[R-pkg-devel] Accessing R's linked PCRE library from inside a package

4 messages · Dirk Eddelbuettel, Oliver Keyes

#
Hey all,

I'm trying to incorporate PCRE-compliant regular expressions into C
code in an R package.
From digging around in R's source code, it appears that R (pretty
much?) guarantees the presence of either a system-level PCRE library,
or an R-internal one.[0] Is this exposed (or grabbable) via the R C
API in any way?

Thanks!

Best,
Oliver

[0] https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/main/grep.c#L75-L79
#
On 10 August 2016 at 18:15, Oliver Keyes wrote:
| I'm trying to incorporate PCRE-compliant regular expressions into C
| code in an R package.
| 
| From digging around in R's source code, it appears that R (pretty
| much?) guarantees the presence of either a system-level PCRE library,
| or an R-internal one.[0] Is this exposed (or grabbable) via the R C
| API in any way?

The key thing to realize here is that R does indeed provide an environment.
And, at least where I like to work, I get this right off the bat:

    edd@max:/tmp$ grep lpcre /etc/R/*
    /etc/R/Makeconf:LIBS =  -lpcre -llzma -lbz2 -lz -lrt -ldl -lm
    edd@max:/tmp$

So pcre plus a bunch of compression libraries (lzma, bz2, z) and more are
essentially "there for the taking", if built as a shared library.

An existence proof is below; it is based on the 2nd Google hit I got for
'libpcre example' and has the advantage of being shorter than the first hit.

I first created a baseline. The example, as given and then repaired, gets us:

    edd@max:/tmp$ ./ex_pcre
     0: From:regular.expressions@example.com
     1: regular.expressions
     2: example.com
     0: From:exddd@43434.com
     1: exddd
     2: 43434.com
     0: From:7853456@exgem.com
     1: 7853456
     2: exgem.com
    edd@max:/tmp$

Turning that into something callable from R took about another minute. It
looks like this:

-----------------------------------------------------------------------------
// modified (and repaired) example from http://stackoverflow.com/a/1421923/143305
#include <cstring>      // strlen
#include <pcre.h>
#include <Rcpp.h>

// [[Rcpp::export()]]
void foo() {
    const char *error;
    int   erroffset;
    pcre *re;
    int   rc;
    int   ovector[100];

    const char *regex = "From:([^@]+)@([^\r]+)";
    char str[]  = "From:regular.expressions@example.com\r\n"\
                  "From:exddd@43434.com\r\n"\
                  "From:7853456@exgem.com\r\n";

    re = pcre_compile (regex,          /* the pattern */
                       PCRE_MULTILINE,
                       &error,         /* for error message */
                       &erroffset,     /* for error offset */
                       0);             /* use default character tables */
    if (!re) Rcpp::stop("pcre_compile failed (offset: %d), %s\n", erroffset, error);

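    unsigned int offset = 0;
    unsigned int len    = strlen(str);
    // pcre_exec wants the number of ints in ovector, not its size in bytes
    const int ovecsize  = sizeof(ovector) / sizeof(ovector[0]);
    while (offset < len && (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, ovecsize)) >= 0) {
        for(int i = 0; i < rc; ++i) {
            Rprintf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
        }
        offset = ovector[1];    // resume right after the end of this match
    }
    pcre_free(re);              // release the compiled pattern
}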

/*** R
foo()
*/
-----------------------------------------------------------------------------

and, lo and behold, produces the same output demonstrating that, yes,
Veronica, we do get pcre for free:

    R> library(Rcpp)
    R> sourceCpp("/tmp/oliver.cpp")
    
    R> foo()
     0: From:regular.expressions@example.com
     1: regular.expressions
     2: example.com
     0: From:exddd@43434.com
     1: exddd
     2: 43434.com
     0: From:7853456@exgem.com
     1: 7853456
     2: exgem.com
    R> 

Your package will probably want a litmus test in configure to see if this
really holds on the platform it is currently being built on.
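
A minimal sketch of what such a litmus test might look like, assuming a
POSIX shell with mktemp available. The function name have_pcre and the file
name conftest.c are made up for illustration; the probe simply tries to
compile and link a tiny program that includes pcre.h and calls
pcre_version().

```shell
#!/bin/sh
# Hypothetical helper: have_pcre COMPILER [EXTRA_FLAGS...]
# Returns 0 if a program using pcre.h compiles and links with -lpcre.
have_pcre() {
    cc="$1"; shift
    tmpdir=$(mktemp -d) || return 1
    cat > "$tmpdir/conftest.c" <<'EOF'
#include <pcre.h>
int main(void) { return pcre_version() == 0; }
EOF
    # "$@" carries any extra flags (e.g. -I or -L paths) the platform needs
    if "$cc" "$@" -o "$tmpdir/conftest" "$tmpdir/conftest.c" -lpcre >/dev/null 2>&1
    then status=0
    else status=1
    fi
    rm -rf "$tmpdir"
    return $status
}

# In a package's configure script the compiler would normally come from R:
#   CC=$("${R_HOME}/bin/R" CMD config CC)
#   have_pcre $CC || { echo "PCRE headers/library not usable" >&2; exit 1; }
```

Leaving $CC unquoted in the call is deliberate here: R may report something
like "gcc -std=gnu99", and word splitting hands the extra words to the probe
as flags.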

Dirk
#
Neat; thanks Dirk! Will be interesting to see if I can get that finagled
on Windows when I get back to Boston.

Best,
Oliver
#
On 10 August 2016 at 19:35, Oliver Keyes wrote:
| Neat; thanks Dirk! Will be interesting to see if I can get that finagled on
| Windows when I get back to Boston.

Come to think about it, there is a bit of good fortune in my use as I

 - don't need to bother with include flags as pcre is a 'system library',
   so headers are found
   
 - don't need to worry about version skew as there is only one pcre version
   on the system, and the header and library match

which will not generally be true in other, less developer-friendly environs.
But hey, that's your problem and not mine :)  
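
For those less friendly environs, one possible approach (a sketch only, not
something tested in this thread) is to ask the pcre-config script that ships
with PCRE for its --cflags and --libs, and fall back to a plain -lpcre when
the script is absent. The function name pcre_makevars is hypothetical; the
two lines it prints are meant to be substituted into src/Makevars.

```shell
#!/bin/sh
# Hypothetical helper: pcre_makevars [TOOL]
# Prints PKG_CPPFLAGS and PKG_LIBS lines for PCRE. TOOL defaults to
# pcre-config; when it is not found we assume the headers sit on the
# default include path, as in the setup described above.
pcre_makevars() {
    tool="${1:-pcre-config}"
    if command -v "$tool" >/dev/null 2>&1; then
        cppflags=$("$tool" --cflags)
        libs=$("$tool" --libs)
    else
        cppflags=""
        libs="-lpcre"
    fi
    printf 'PKG_CPPFLAGS = %s\nPKG_LIBS = %s\n' "$cppflags" "$libs"
}

# Example: pcre_makevars > src/Makevars
```

Taking header and library flags from the same pcre-config also helps with
the version skew point, since both then refer to one installation.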

Dirk