Prof Brian Ripley writes:
A couple of weeks back there was some discussion about documenting the
regular expressions as used in R. Several years ago the problem was
that this was OS-dependent, and to plug that problem we incorporated
regexp code from a version of GNU grep, later updated to grep-2.4.2 in
R 1.2.0.
I have been looking at documenting what grep(perl=TRUE) does, and we
have a similar problem in that the current PCRE, 4.4, implements
rather more of Perl's regexps than 3.9 (the version in the R 1.8.0
sources, used if the OS does not supply PCRE). RH8.0 also has PCRE 3.9,
and whichever version of Debian is on franz has PCRE 3.4.
I could add a configure check for PCRE >= 4.0, and I think probably
should do that. However, my inclination is to always use the version
of PCRE in the R sources and thereby ensure that all builds of R have
the same version, the one I will document. Comments, please.
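A rough illustration of the comparison a "PCRE >= 4.0" check would have
to make, written in Python only for brevity (a real configure test would
be shell or C). The function names parse_version and meets_minimum are
hypothetical, and the sketch assumes the version is reported as a string
beginning with "major.minor". The point is that the parts are compared
as numbers, not as text, so that a version such as 4.10 would sort after
4.4.

```python
import re


def parse_version(text):
    """Return (major, minor) from a string that starts with 'major.minor'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text, minimum=(4, 0)):
    """True if the reported version is at least `minimum`, compared numerically."""
    return parse_version(text) >= minimum


if __name__ == "__main__":
    # The three versions mentioned in this thread.
    for version in ("4.4", "3.9", "3.4"):
        print(version, meets_minimum(version))
```

With the versions mentioned above, only 4.4 would pass such a check; 3.9
and 3.4 would fall back to the copy in the R sources.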
I think we should in any case allow maintainers of binary packages on
platforms with advanced package management systems to force the use of
shared libraries the system can provide. (So the binary maintainers
would need to verify that the system package provides the right libs and
headers.)
Not sure about the default: we typically try to use available system
resources, unless this is bound to cause problems, and regex was of the
latter type, afaicr.
For PCRE 4.4 there is a long man page that I will use as a basis for
the documentation. I am inclined just to include either a text or PDF
version of the man page -- any preferences for which form?
Depends on where you would put the docs, I think. Btw, where can 4.4 be
found?
For the non-Perl regexps it is harder, as I am unsure exactly what
patterns the GNU regex we have accepts. (From a problem which
occurred with some Sweave regexps, I think it accepts more than it is
intended to.) One fairly good documentation source is the GNU grep man page:
does anyone know a better one? I had thought of writing a regexp.Rd
help page to which grep.Rd could refer.
That would be great. Linux has a regex(7) man page, purportedly "taken
from Henry Spencer's regex package", which might be used as a start. The
old GNU regex .tar.gz has a texinfo file, but it does not help for what
we need, I think.
[I recently looked for available regexp docs, but was not too
successful.]
None of this is imminent (I am too busy) but is intended for the next
minor release (which may be called 1.9.0 or 2.0.0, I gather).