text matching

Hi,

On Mon, Sep 19, 2011 at 6:15 AM, SNV Krishna <krishna at primps.com.sg> wrote:
> Hi All,
>
> I have a character vector named tickers
>
> head(tickers,10)
>            V1
> 1  ADARSHPL.BO
> 2        AGR.V
> 3          AGU
> 4       AGU.TO
> 5     AIMCO.BO
> 6  ALUFLUOR.BO
> 7        AMZ.V
> 8          AVD
> 9  ANILPROD.BO
> 10    ARIES.BO
>
> I would like to extract all elements that have ".BO" in them. I tried
>
> grep("\.BO",tickers)
> Error: '\.' is an unrecognized escape in character string starting "\."

You need instead:
tickers <- c("A.BO", "BOB", "C.BO")
grep("\\.BO", tickers)
[1] 1 3
tickers[grep("\\.BO", tickers)]
[1] "A.BO" "C.BO"

> grep(".BO",tickers)
> [1] 1

That's odd; it should have returned many more matches. You may need to
check the format of your data.

> Could anyone please guide me on this? Many thanks for the help.
>
> Best Regards,
>
> Krishna

Sarah Goslee
http://www.functionaldiversity.org
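Sarah's double-escape fix can be written a few equivalent ways. This is a minimal sketch on a made-up plain character vector (not the OP's data frame). The [.] character class, fixed = TRUE, and the raw-string form are alternatives not mentioned in the thread; the raw string assumes R 4.0.0 or later, and the $ anchor assumes the goal is tickers that end in .BO.

```r
# Hypothetical sample: a plain character vector, not the OP's data frame
tickers <- c("ADARSHPL.BO", "AGR.V", "AGU", "AGU.TO", "AIMCO.BO")

# "\\." in R source is the two-character string \. and the regex
# engine then reads \. as a literal period
bo_regex <- grep("\\.BO", tickers, value = TRUE)

# A character class [.] also matches a literal period, no backslash needed
bo_class <- grep("[.]BO", tickers, value = TRUE)

# fixed = TRUE switches off regex interpretation entirely
bo_fixed <- grep(".BO", tickers, fixed = TRUE, value = TRUE)

# Raw string (R >= 4.0.0): r"(\.BO)" is the same string as "\\.BO"
bo_raw <- grep(r"(\.BO)", tickers, value = TRUE)

# Anchoring with $ keeps only tickers that END in .BO (assumed intent)
bo_suffix <- tickers[grepl("\\.BO$", tickers)]

print(bo_regex)
#> [1] "ADARSHPL.BO" "AIMCO.BO"
```

fixed = TRUE is the simplest choice when the pattern is a plain substring; the regex forms are only needed when anchors or other regex features are wanted.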

> Hi,
>
> On Mon, Sep 19, 2011 at 6:15 AM, SNV Krishna <krishna at primps.com.sg> wrote:
>> Hi All,
>>
>> I have a character vector named tickers
>>
>> head(tickers,10)
>>            V1
>> 1  ADARSHPL.BO
>> 2        AGR.V
>> 3          AGU
>> 4       AGU.TO
>> 5     AIMCO.BO
>> 6  ALUFLUOR.BO
>> 7        AMZ.V
>> 8          AVD
>> 9  ANILPROD.BO
>> 10    ARIES.BO
>>
>> I would like to extract all elements that have ".BO" in them. I tried
>>
>> grep("\.BO",tickers)
>> Error: '\.' is an unrecognized escape in character string starting "\."
> You need instead:
> tickers <- c("A.BO", "BOB", "C.BO")
> grep("\\.BO", tickers)
> [1] 1 3
> tickers[grep("\\.BO", tickers)]
> [1] "A.BO" "C.BO"
>
>> grep(".BO",tickers)
>> [1] 1
> That's odd; it should have returned many more matches. You may need to
> check the format of your data.
There are two NOT-oddities at work here. Periods and other special
characters need to be doubly escaped when used as literals in search
patterns, and the vector that needs to be searched is not "tickers"
but rather "tickers$V1".

That result arises because there is only one element in the list named
"tickers", and grep finds that this one element does contain a match for
the pattern. (This is despite the fact that it is not searching what the
OP thought he was searching, and that an unescaped "." matches any
character, which makes ".BO" a more general pattern than intended.)
David.

>
>> Could anyone please guide me on this? Many thanks for the help.
>>
>> Best Regards,
>>
>> Krishna
>>
>
>
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
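David's two points can be reproduced with a hypothetical one-column data frame rebuilt from the head() output above; the real object may differ (more rows, or a factor column, which the thread does not show). The sketch shows that grep() on the whole data frame can only report index 1, while searching tickers$V1 returns one index per matching ticker. "JUMBO" is a made-up string used only to show how an unescaped period widens the pattern.

```r
# Hypothetical reconstruction of the OP's object from the head() output:
# a one-column data frame, not a character vector
tickers <- data.frame(
  V1 = c("ADARSHPL.BO", "AGR.V", "AGU", "AGU.TO", "AIMCO.BO",
         "ALUFLUOR.BO", "AMZ.V", "AVD", "ANILPROD.BO", "ARIES.BO"),
  stringsAsFactors = FALSE
)

class(tickers)   # "data.frame"
length(tickers)  # 1: a data frame is a list with one element per column

# grep() converts the data frame to one string per column,
# so with a single column it can only ever report index 1
whole_df <- grep(".BO", tickers)

# Searching the column itself gives one index per matching ticker
bo_idx <- grep("\\.BO", tickers$V1)
bo_tickers <- tickers$V1[bo_idx]
print(bo_tickers)

# An unescaped "." matches any character, so ".BO" is the more general
# pattern; "JUMBO" is a made-up string used only for illustration
grepl(".BO", "JUMBO")    # TRUE, because "MBO" matches
grepl("\\.BO", "JUMBO")  # FALSE, because there is no literal period
```

Running class() or str() on an unfamiliar object before searching it is a quick way to do the "check the format of your data" step Sarah suggested.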
Hi,

I see the mistakes. First, the period needs a double escape, so the pattern
should be "\\.BO" instead of "\.BO". Second, and more important, the vector to
search is tickers$V1, not tickers. Thanks for pointing that out, David, and
thank you all for the help.

Best regards,

Krishna

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Monday, September 19, 2011 10:10 PM
To: Sarah Goslee
Cc: SNV Krishna; r-help at r-project.org
Subject: Re: [R] text matching

Hi,

The str_locate function in the stringr package may do what you are looking for.
I hope this link helps:

http://en.wikibooks.org/wiki/R_Programming/Text_Processing

Taka
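Taka's suggestion, str_locate(), reports where a pattern occurs rather than only which elements match. As a sketch that needs no extra package, base R's regexpr() can build a similar start/end table; the stringr call in the final comment is only the assumed equivalent and is not run here, and the sample vector is made up.

```r
# Hypothetical sample vector
tickers <- c("ADARSHPL.BO", "AGR.V", "AGU", "AGU.TO", "AIMCO.BO")

# regexpr() returns the start of the first match (-1 if none) and
# stores each match length in the "match.length" attribute
pos <- regexpr(".BO", tickers, fixed = TRUE)
hit <- as.integer(pos) > 0
start <- ifelse(hit, as.integer(pos), NA_integer_)
end <- start + attr(pos, "match.length") - 1L  # NA stays NA

# Two-column start/end matrix, shaped like str_locate() output
loc <- cbind(start = start, end = end)
bo_tickers <- tickers[!is.na(loc[, "start"])]
print(loc)

# With stringr installed, the assumed equivalent would be:
# stringr::str_locate(tickers, stringr::fixed(".BO"))
```

For the original question (which tickers contain ".BO"), grep() or grepl() is enough; position information is only needed if the suffix has to be cut out or replaced.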
Hi All,

I have a character vector named tickers

head(tickers,10)
           V1
1  ADARSHPL.BO
2        AGR.V
3          AGU
4       AGU.TO
5     AIMCO.BO
6  ALUFLUOR.BO
7        AMZ.V
8          AVD
9  ANILPROD.BO
10    ARIES.BO

I would like to extract all elements that have ".BO" in them. I tried

grep("\.BO",tickers)
Error: '\.' is an unrecognized escape in character string starting "\."

grep(".BO",tickers)
[1] 1

Could anyone please guide me on this? Many thanks for the help.

Best Regards,

Krishna

