Extracting matched expressions

4 messages · Hadley Wickham, jim holtman, Gabor Grothendieck

#
Hi all,

Is there a tool in base R to extract matched expressions from a
regular expression?  i.e. given the regular expression "(.*?) (.*?)
([ehtr]{5})" is there a way to extract the character vector c("one",
"two", "three") from the string "one two three" ?

Thanks,

Hadley
#
Is this what you want:
+     "\\1 \\2 \\3", x, perl=TRUE)
[1] "one"   "two"   "three"
On Sun, Nov 8, 2009 at 1:51 PM, Hadley Wickham <hadley at rice.edu> wrote:

  
    
#
strapply in the gsubfn package can do that. It applies the indicated
function, here just c, to the back references from the pattern match
and then simplifies the result using the simplify argument. (If you
omit simplify here, it gives a one-element list, as strsplit does.)

library(gsubfn)
pat <- "(.*?) (.*?) ([ehtr]{5})"
strapply("one two three", pat, c, simplify = c)

See home page at: http://gsubfn.googlecode.com
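
For comparison, more recent versions of base R also provide regexec() and regmatches(), which return the full match followed by each captured group. The following is only a sketch under that assumption: it needs R 3.3.0 or later for the perl = TRUE argument to regexec(), and the variable names m and groups are illustrative.

```r
x <- "one two three"
pat <- "(.*?) (.*?) ([ehtr]{5})"

# regexec() records the position of the whole match and of each group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(x, regexec(pat, x, perl = TRUE))

# The first element is the whole match, so drop it to keep only the groups.
groups <- m[[1]][-1]
groups
# [1] "one"   "two"   "three"
```

Like strapply without simplify, regmatches() returns a list with one element per input string, hence the m[[1]].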
On Sun, Nov 8, 2009 at 1:51 PM, Hadley Wickham <hadley at rice.edu> wrote:
#
Thanks Jim - it's not elegant, but it works.  Instead of using space
as a delimiter, I used "\u001E" - the Unicode record separator
character (U+001E) - since I figure there's less chance of a clash
with a character in the match.
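
A minimal sketch of the delimiter approach described here, assuming the groups are rewritten with sub() and then split with strsplit(). The names sep, repl, joined and parts are illustrative and not taken from the thread.

```r
x <- "one two three"
pat <- "(.*?) (.*?) ([ehtr]{5})"

# U+001E (record separator) is unlikely to occur in ordinary text.
sep <- "\u001E"

# Build the replacement "\\1<sep>\\2<sep>\\3" for the three groups.
repl <- paste0("\\", 1:3, collapse = sep)

# Rewrite the match as the groups joined by the separator, then split on it.
joined <- sub(pat, repl, x, perl = TRUE)
parts <- strsplit(joined, sep, fixed = TRUE)[[1]]
parts
# [1] "one"   "two"   "three"
```

This relies on the pattern matching the whole string. Any text outside the match is left in place by sub() and would end up attached to the first or last piece.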

Hadley
On Sun, Nov 8, 2009 at 1:40 PM, jim holtman <jholtman at gmail.com> wrote: