Regular expressions in R
4 messages · Sarah Goslee, Joshua Wiley, Michael Griffiths
Hi Michael,
You need to take another look at the examples you were given, and at
the help for ?sub():
The two '*sub' functions differ only in that 'sub' replaces only
the first occurrence of a 'pattern' whereas 'gsub' replaces all
occurrences. If 'replacement' contains backreferences which are
not defined in 'pattern' the result is undefined (but most often
the backreference is taken to be '""').
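As a small illustration of the difference described in that help text (a sketch only; the toy string and the variable names `first_only` and `all_hits` are made up for this example):

```r
# Toy input, not taken from the original post
s <- "a+b+c"

# sub() stops after the first match of the pattern
first_only <- sub("\\+", " plus ", s)

# gsub() keeps going and replaces every match
all_hits <- gsub("\\+", " plus ", s)

print(first_only)
print(all_hits)
```

Only the first "+" is replaced in `first_only`, while `all_hits` has both replaced.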
Sarah
On Tue, Nov 15, 2011 at 12:18 PM, Michael Griffiths
<griffiths at upstreamsystems.com> wrote:
Good afternoon list,
I have the following character strings; one with spaces between the maths
operators and variable names, and one without said spaces.
form<-c('~ Sentence + LEGAL + Intro + Intro / Intro1 + Intro * LEGAL + benefit + benefit / benefit1 + product + action * mean + CTA + help + mean * product')
form<-c('~Sentence+LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action*mean+CTA+help+mean*product')
I would like to remove the following target strings, either:
1. '+ Intro * LEGAL' which is '+ space name space * space name'
2. '+Intro*LEGAL' which is '+ nospace name nospace * nospace name'
Having delved into a variety of sites (e.g.
http://www.zytrax.com/tech/web/regex.htm#search) investigating regular
expressions I now have a basic grasp, but I am having difficulties removing
ALL of the instances of 1. or 2.
The code below removes just a SINGLE instance of the target string, but I
was expecting it to remove all instances as I have \\*.[[:alnum:]]. I did
try \\*.[[:alnum:]]*, but this did not work.
form<-sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form)
I am obviously still not understanding something. If the list could offer
some guidance I would be most grateful.
Regards
Mike Griffiths
Sarah Goslee http://www.functionaldiversity.org
Hi Michael,
Your strings were long, so I made a smaller example. Sarah made
one good point: you want to be using gsub(), not sub(). But when I use
your code, I do not think it even works precisely for one instance.
Try this on for size; you were 99% there:
## simplified cases
form1 <- c('product + action * mean + CTA + help + mean * product')
form2 <- c('product+action*mean+CTA+help+mean*product')
## what I believe your desired output is
'product + CTA + help'
'product+CTA+help'
gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)
## your code (using gsub() instead of sub())
gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)
######## Running on r57586 Windows x64 ########
gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
[1] "product + CTA + help"
gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)
[1] "product+CTA+help"
## your code (using gsub() instead of sub())
gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)
[1] "product ean + CTA + help roduct"
Hope this helps,
Josh
Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
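A note on the leftover "ean" and "roduct" in the last output of Josh's reply: the final `[[:alnum:]]` in the original pattern has no quantifier, so only one character of the name after the `*` is consumed. The sketch below reproduces that and then tries one combined pattern for both the spaced and unspaced strings. The helper name `drop_products` and the combined pattern are assumptions for illustration, not something proposed in the thread, and the pattern assumes variable names contain only letters and digits.

```r
form1 <- "product + action * mean + CTA + help + mean * product"
form2 <- "product+action*mean+CTA+help+mean*product"

# Original pattern with gsub(): '.' eats the character after '*', and the
# unquantified [[:alnum:]] eats exactly one more, so "ean" and "roduct" remain.
partial <- gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)

# Hypothetical helper: optional whitespace (\\s*) around a required '+' and a
# required '*', and '+' quantifiers so each name must be at least one
# character long and is matched in full.
drop_products <- function(x) {
  gsub("\\s*\\+\\s*[[:alnum:]]+\\s*\\*\\s*[[:alnum:]]+", "", x)
}

print(partial)
print(drop_products(form1))
print(drop_products(form2))
```

Names containing "_" or "." would need a wider character class than `[[:alnum:]]`, so treat this as a starting point rather than a general formula parser.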