Regular expressions in R
Hi Michael,
Your strings were long, so I made a somewhat smaller example. Sarah made
one good point: you want to be using gsub(), not sub(). But when I use
your code, I do not think it works correctly even for a single instance.
Try this on for size; you were 99% there:
## simplified cases
form1 <- c('product + action * mean + CTA + help + mean * product')
form2 <- c('product+action*mean+CTA+help+mean*product')
## what I believe your desired output is
'product + CTA + help'
'product+CTA+help'
gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)
## your code (using gsub() instead of sub())
gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)
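A note on why the original pattern leaves pieces of the second name behind: after the escaped *, its tail .[[:alnum:]] matches exactly two characters (any one character, then one alphanumeric), so only the space and the first letter of the second name are consumed. If one pattern for both the spaced and unspaced forms is wanted, a possible sketch follows. It makes the whitespace optional with \\s* and uses + to require at least one character per name. The variable names (spaced, unspaced, pat, and the res_* results) are made up for illustration, and the pattern assumes that names contain only letters and digits.

```r
## Hypothetical test strings, mirroring form1 and form2 above
spaced   <- "product + action * mean + CTA + help + mean * product"
unspaced <- "product+action*mean+CTA+help+mean*product"

## \\s* allows zero or more whitespace characters around each operator;
## [[:alnum:]]+ requires at least one letter or digit in each name
pat <- "\\s*\\+\\s*[[:alnum:]]+\\s*\\*\\s*[[:alnum:]]+"

## gsub() replaces every match in the string
res_spaced   <- gsub(pat, "", spaced)
res_unspaced <- gsub(pat, "", unspaced)

## sub() replaces only the first match, so the second term survives
res_first_only <- sub(pat, "", spaced)

print(res_spaced)
print(res_unspaced)
print(res_first_only)
```

Note that [[:alnum:]] does not match underscores or dots, so names containing those characters would need a wider character class.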
######## Running on r57586 Windows x64 ########
gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
[1] "product + CTA + help"
gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)
[1] "product+CTA+help"
## your code (using gsub() instead of sub())
gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)
[1] "product ean + CTA + help roduct"

Hope this helps,

Josh

On Tue, Nov 15, 2011 at 9:18 AM, Michael Griffiths
<griffiths at upstreamsystems.com> wrote:
Good afternoon list,
I have the following character strings; one with spaces between the maths
operators and variable names, and one without said spaces.
form<-c('~ Sentence + LEGAL + Intro + Intro / Intro1 + Intro * LEGAL + benefit + benefit / benefit1 + product + action * mean + CTA + help + mean * product')
form<-c('~Sentence+LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action*mean+CTA+help+mean*product')
I would like to remove the following target strings, either:
1. '+ Intro * LEGAL' which is '+ space name space * space name'
2. '+Intro*LEGAL' which is '+ nospace name nospace * nospace name'
Having delved into a variety of sites (e.g.
http://www.zytrax.com/tech/web/regex.htm#search) investigating regular
expressions, I now have a basic grasp, but I am having difficulty removing
ALL of the instances of 1. or 2.
The code below removes just a SINGLE instance of the target string, but I
was expecting it to remove all instances as I have \\*.[[:alnum:]]. I did
try \\*.[[:alnum:]]*, but this did not work.
form<-sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form)
I am obviously still not understanding something. If the list could offer
some guidance I would be most grateful.
Regards
Mike Griffiths
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/