text matching
5 messages · Sarah Goslee, David Winsemius, Krishna Kumar +1 more
Hi,
On Mon, Sep 19, 2011 at 6:15 AM, SNV Krishna <krishna at primps.com.sg> wrote:
Hi All, I have a character vector by name tickers
head(tickers,10)
            V1
1  ADARSHPL.BO
2        AGR.V
3          AGU
4       AGU.TO
5     AIMCO.BO
6  ALUFLUOR.BO
7        AMZ.V
8          AVD
9  ANILPROD.BO
10    ARIES.BO
I would like to extract all elements that have ".BO" in them. I tried
grep("\.BO",tickers)
Error: '\.' is an unrecognized escape in character string starting "\."
You need instead:
tickers <- c("A.BO", "BOB", "C.BO")
grep("\\.BO", tickers)
[1] 1 3
tickers[grep("\\.BO", tickers)]
[1] "A.BO" "C.BO"
grep(".BO",tickers)
[1] 1
That's odd; it should have returned many more matches. You may need to check the format of your data.
Could anyone please guide me on this? Many thanks for the help.

Best Regards,
Krishna
Sarah Goslee http://www.functionaldiversity.org
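The escaping point in the reply above can be checked on a toy vector. This is only a sketch: the vector `x` and its contents are made up for illustration and are not from the thread. The R parser consumes one backslash when it reads a string literal, so the regex engine only sees `\.` when the source says `"\\."`. A character class or `fixed = TRUE` should give the same result without any backslashes.

```r
x <- c("A.BO", "JUMBO", "C.BO")  # hypothetical example data

# An unescaped dot matches any character, so "JUMBO" matches too (via "MBO")
any_char <- grep(".BO", x)

# Doubled backslash: the string itself holds \.BO, i.e. a literal dot then BO
escaped <- grep("\\.BO", x)

# Two backslash-free alternatives: a character class, or a fixed (non-regex) match
bracket <- grep("[.]BO", x)
literal <- grep(".BO", x, fixed = TRUE)

print(any_char)
print(escaped)
```

Which of the three spellings to use is mostly a matter of taste; `fixed = TRUE` is probably the simplest when no regex features are needed at all.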
On Sep 19, 2011, at 7:05 AM, Sarah Goslee wrote:
grep(".BO",tickers)
[1] 1
That's odd; it should have returned many more matches. You may need to check the format of your data.
There are two NOT-oddities at work here. First, periods and other special characters need to be doubly escaped when used as literals in search patterns. Second, the vector that needs to be searched is not "tickers" but rather "tickers$V1". grep returned 1 because there is only one element in the list named "tickers", and grep finds that it does contain an instance that matches the pattern (despite the fact that the unescaped pattern is not searching for what the OP thought it was, but rather for a more general pattern).
David.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
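Both points from the message above can be put together in a short sketch. It assumes that `tickers` is a data frame with a single character column `V1`, as the `head()` output suggests, and rebuilds only the ten rows shown in the thread. The `stringsAsFactors = FALSE` argument and the `$` anchor in the last call are additions for illustration, not something the thread specifies.

```r
# Assumed reconstruction of the first ten rows shown by head(tickers, 10)
tickers <- data.frame(
  V1 = c("ADARSHPL.BO", "AGR.V", "AGU", "AGU.TO", "AIMCO.BO",
         "ALUFLUOR.BO", "AMZ.V", "AVD", "ANILPROD.BO", "ARIES.BO"),
  stringsAsFactors = FALSE
)

# Search the column, not the data frame, with the dot escaped
idx <- grep("\\.BO", tickers$V1)

# Subset by index, as in the earlier reply
bo <- tickers$V1[idx]

# Or ask grep for the matching values directly; "$" anchors ".BO" to the end
bo_suffix <- grep("\\.BO$", tickers$V1, value = TRUE)

print(idx)
print(bo)
```

Anchoring with `$` is optional here; it only matters if some ticker could contain ".BO" somewhere other than the end.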
Hi,

I noticed the mistake. The first thing is the double escape, so it should be "\\.BO" instead of "\.BO". The second and more important observation is tickers$V1. Thanks for pointing that out, David, and thank you all for the help.

Best regards,
Krishna

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Monday, September 19, 2011 10:10 PM
To: Sarah Goslee
Cc: SNV Krishna; r-help at r-project.org
Subject: Re: [R] text matching
Hi,

The "str_locate" function in the stringr package may do what you are looking for. Hope this link will help...
http://en.wikibooks.org/wiki/R_Programming/Text_Processing

Taka