Regular expressions in R

4 messages · Sarah Goslee, Joshua Wiley, Michael Griffiths

#
Hi Michael,

You need to take another look at the examples you were given, and at
the help for ?sub():

     The two '*sub' functions differ only in that 'sub' replaces only
     the first occurrence of a 'pattern' whereas 'gsub' replaces all
     occurrences.  If 'replacement' contains backreferences which are
     not defined in 'pattern' the result is undefined (but most often
     the backreference is taken to be '""').
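
To make the difference concrete, here is a minimal sketch. The string x
is a made-up example, not one of the strings from the original question:

```r
## hypothetical input, chosen only to show the difference
x <- "a+b+c"

## sub() stops after the first match
first_only <- sub("\\+", "-", x)

## gsub() keeps going and replaces every match
all_matches <- gsub("\\+", "-", x)

print(first_only)   # "a-b+c"
print(all_matches)  # "a-b-c"
```

So if several terms have to be removed from one string, sub() will only
ever remove the first of them.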

Sarah

On Tue, Nov 15, 2011 at 12:18 PM, Michael Griffiths
<griffiths at upstreamsystems.com> wrote:
#
Hi Michael,

Your strings were long, so I made a smaller example.  Sarah makes a
good point: you want gsub(), not sub().  But when I run your pattern,
it does not match correctly even for a single occurrence.
Try this on for size; you were 99% there:

## simplified cases
form1 <- c('product + action * mean + CTA + help + mean * product')
form2 <- c('product+action*mean+CTA+help+mean*product')

## what I believe your desired output is
'product + CTA + help'
'product+CTA+help'

gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)

## your code (using gsub() instead of sub())
gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)


######## Running on r57586 Windows x64 ########
[1] "product + CTA + help"
[1] "product+CTA+help"
[1] "product ean + CTA + help roduct"
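
The third result goes wrong for two reasons: the "." after "\\*" eats
the space, and the final [[:alnum:]] has no quantifier, so only one
character of the second term is removed.  The leading space before "+"
is also left behind.  As a rough sketch (one possible pattern, not the
only way to do it), optional whitespace plus "+" quantifiers can cover
both the spaced and unspaced strings.  The name pat is just an
illustrative choice:

```r
## same simplified cases as in the thread
form1 <- 'product + action * mean + CTA + help + mean * product'
form2 <- 'product+action*mean+CTA+help+mean*product'

## optional whitespace around "+" and "*", and at least one
## alphanumeric character on each side of the "*"
pat <- "\\s*\\+\\s*[[:alnum:]]+\\s*\\*\\s*[[:alnum:]]+"

res1 <- gsub(pat, "", form1)
res2 <- gsub(pat, "", form2)

print(res1)  # "product + CTA + help"
print(res2)  # "product+CTA+help"
```

This assumes every interaction term is preceded by a "+".  A term at
the very start of the string (for example 'action*mean+product') would
not be matched and would need separate handling.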

Hope this helps,

Josh

On Tue, Nov 15, 2011 at 9:18 AM, Michael Griffiths
<griffiths at upstreamsystems.com> wrote: