Regular Expressions

On Thu, 4 Nov 2010, Noah Silverman wrote:

            
Read the link to ?regexp.  It *does* 'indicate the way to capture 
things'.

      The backreference '\N', where 'N = 1 ... 9', matches the substring
      previously matched by the Nth parenthesized subexpression of the
      regular expression.  (This is an extension for extended regular
      expressions: POSIX defines them only for basic ones.)
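To illustrate a backreference used inside the pattern itself, here is a small sketch that flags strings containing a doubled lowercase letter. The vector `words` is made-up test data, not part of the original question.

```r
# Hypothetical test data for illustration only
words <- c("book", "bok", "letter")

# ([[:lower:]]) captures one lowercase letter; \\1 then requires the
# same letter to appear again immediately afterwards
has_double <- grepl("([[:lower:]])\\1", words)
print(has_double)
```

Note that the backslash has to be doubled in the R string, exactly as in the replacement strings used with sub() and gsub().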

and there is an example on the help page for grep():

      ## Double all 'a' or 'b's;  "\" must be escaped, i.e., 'doubled'
      gsub("([ab])", "\\1_\\1_", "abc and ABC")

In your example

x <- "10 Nov 13.00 (PFE1020K13)"
regex <- "(\\d\\d)\\s(\\w\\w\\w).*"
sub(regex, "\\1", x, perl = TRUE)
sub(regex, "\\2", x, perl = TRUE)
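
As a rough sketch of what those two calls give back, the same code is repeated here with the results stored in variables. The names `day`, `mon` and `parts` are only illustrative, and the "|" separator assumes that character does not occur in the captured text.

```r
x <- "10 Nov 13.00 (PFE1020K13)"
regex <- "(\\d\\d)\\s(\\w\\w\\w).*"

# The trailing .* swallows the rest of the string, so the whole input
# is replaced by the chosen group
day <- sub(regex, "\\1", x, perl = TRUE)   # first group: the two digits
mon <- sub(regex, "\\2", x, perl = TRUE)   # second group: three word characters

# Both groups in a single call, then split on the separator
parts <- strsplit(sub(regex, "\\1|\\2", x, perl = TRUE), "|", fixed = TRUE)[[1]]
print(parts)
```

Keep in mind that sub() returns its input unchanged when the pattern does not match, so it can be worth checking with grepl() first.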

A better way to do this would be something like

regex <- "([[:digit:]]{2})[[:space:]]([[:alpha:]]{3}).*"

which is also a POSIX extended regexp.
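
A possible way to get both pieces at once without perl = TRUE is regexec() together with regmatches(), which are available in current versions of R. This is only a sketch: it uses [[:space:]] in place of the \\s used earlier so that the pattern stays within POSIX character classes, and the names `m`, `day` and `mon` are illustrative.

```r
x <- "10 Nov 13.00 (PFE1020K13)"
regex <- "([[:digit:]]{2})[[:space:]]([[:alpha:]]{3}).*"

# regexec() records the positions of the full match and of each
# parenthesized group; regmatches() extracts the corresponding text
m <- regmatches(x, regexec(regex, x))[[1]]

day <- m[2]   # first group
mon <- m[3]   # second group (m[1] is the full match)
print(c(day, mon))
```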