
Versions of PCRE, documenting what grep etc do.

5 messages · Brian Ripley, Dirk Eddelbuettel, Kurt Hornik

#
A couple of weeks back there was some discussion about documenting the 
regular expressions as used in R.   Several years ago the problem was that 
this was OS-dependent, and to plug that problem we incorporated regexp 
code from a version of GNU grep, later updated to grep-2.4.2 in R 1.2.0.

I have been looking at documenting what grep(perl=TRUE) does, and we
have a similar problem in that the current PCRE, 4.4, implements rather
more of Perl's regexps than 3.9 (which is what R 1.8.0 uses if the OS does
not supply PCRE).  RH8.0 has PCRE 3.9, and whichever version of Debian is
on franz has PCRE 3.4.

I could add a configure check for PCRE >= 4.0, and I think probably should 
do that.  However, my inclination is to always use the version of PCRE in 
the R sources  and thereby ensure that all builds of R have the same 
version, the one I will document.  Comments, please.
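
As a rough illustration of the ">= 4.0" gate discussed above, here is a minimal Python sketch. It is not R's configure code: the helper names parse_version and meets_minimum are hypothetical, and the only data used are the version numbers quoted in this thread. The point it shows is that dotted versions should be compared component by component rather than as strings or floats.

```python
# Hypothetical helpers for illustration only; not part of R's configure script.

def parse_version(text):
    """Turn a dotted version string such as '4.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, minimum="4.0"):
    """Return True if version `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '4' compares equal to '4.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# PCRE versions mentioned in this thread.
seen = [
    ("current PCRE release", "4.4"),
    ("fallback in R 1.8.0", "3.9"),
    ("RH8.0", "3.9"),
    ("Debian on franz", "3.4"),
]

for label, version in seen:
    verdict = "ok" if meets_minimum(version) else "too old"
    print(f"{label}: PCRE {version} -> {verdict}")
```

Comparing integer tuples avoids the trap where a float or string comparison would rank a hypothetical "4.10" below "4.4".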

For PCRE 4.4 there is a long man page that I will use as a basis for the 
documentation.  I am inclined just to include either a text or PDF version 
of the man page -- any preferences for which form?


For the non-Perl regexps it is harder, as I am unsure exactly what
patterns the GNU regex we have accepts.  (From a problem which occurred
with some Sweave regexps, I think it accepts more than it is intended
to.)  One fairly good documentation source is the GNU grep man page: does anyone
know a better one?   I had thought of writing a regexp.Rd help page to 
which grep.Rd could refer.

None of this is imminent (I am too busy) but is intended for the next 
minor release (which may be called 1.9.0 or 2.0.0, I gather).

Brian
#
On Fri, Oct 24, 2003 at 07:46:41AM +0100, Prof Brian Ripley wrote:
FWIW the current line of R (>= 1.8.0) in Debian unstable has 

    Depends: [....] libpcre3 (>= 4.0) [...]
    
by virtue of the fact that the pcre libraries in Debian unstable are
currently at version 4.3. 

Dirk
#
I think we should in any case allow maintainers of binary packages on
platforms with advanced package management systems to force the use of
shared libraries the system can provide.  (So the binary maintainers
would need to verify that the system package provides the right libs and
headers.)

Not sure about the default: we typically try to use available system
resources, unless this is bound to cause problems, and regex was of the
latter type, afaicr.

Depends on where you would put the docs, I think.  Btw, where can 4.4 be
found?

That would be great.  Linux has a regex(7) purported to be "taken from
Henry Spencer's regex package", which might be used as a start.  The old
GNU regex .tar.gz has a texinfo file, but does not help for what we
need, I think.

[I recently looked for available regexp docs, but was not too
successful.]

Too bad :-(

Best
-k
#
On Fri, 24 Oct 2003, Kurt Hornik wrote:

            
With a configure check for >= 4.0 I am reasonably happy to have
--without-pcre as the default and allow --with-pcre at people's peril.

At the ftp site mentioned on ?grep, at least earlier this week.

The GNU grep 2.4.2 man page and texinfo file give me enough, except I
don't understand them well enough.  (What is said about extended vs basic
expressions is unclear at best.)

The Solaris 8 man pages are better and they do document POSIX regexps,
so I will use some of their ideas.

I might try to put regexp.Rd (I have a start) in 1.8.1 then.  But the PCRE
stuff will need to wait for R-devel's release.

Brian
#
I have added a preliminary help page for regex to R-patched which should
help for now, and added a configure test for PCRE >= 4.0 to R-devel.

I will return to this later in 2003.

Brian