
Must be obvious but not to me: problem with regular expression

3 messages · Ptit_Bleu, Duncan Murdoch, Uwe Ligges

#
Hi,

I have a vector called nfichiers containing 138 file names whose extension is
.P0, .P1, ... up to .P8.
The processing script differs depending on whether the extension is .P0 or one
of .P1 to .P8.

Example file names:
[128] "Output0.P0"       
[129] "Output0.P1"       
[130] "Output0.P2"       
[131] "Output01102007.P0"
[132] "Output01102007.P1"
[133] "Output01102007.P2"
[134] "Output01102007.P3"
[135] "Output01102007.P4"


To extract the names of the files with the .P0 extension, I wrote:
nfichiers[grep(".P0", nfichiers)]
For the other extensions:
nfichiers[grep(".P[^0]", nfichiers)]

But the second call returns 138 elements, which is the length of the initial
vector, although I have 130 files with the .P0 extension.

So I tried the same thing "manually" with a small vector:
[1] "aa.P0" "bb.P0" "cc.P1" "dd.P2"
[1] "cc.P1" "dd.P2"

It works!

Does anyone have an idea how to solve this small problem?
Thanks in advance,
Ptit Bleu.
#
On 12/17/2007 9:34 AM, Ptit_Bleu wrote:
One problem above is that "." is special in regular expressions.  I'd 
also suggest adding $ at the end, to force the match to the end of the 
string.  That is, code as

grep("\\.P0$", nfichiers)

and

grep("\\.P[^0]$", nfichiers)

I don't know what false matches you were seeing, but this should 
eliminate some.
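
To illustrate both points (the unescaped "." and the missing "$"), here is a minimal sketch. The vector x is hypothetical: its first four names come from the small test in the original post, while "mapP0x.P3" and "dump.P0.bak" are invented here only to provoke false matches.

```r
# Hypothetical file names; the last two are made up for illustration.
x <- c("aa.P0", "bb.P0", "cc.P1", "dd.P2", "mapP0x.P3", "dump.P0.bak")

# Unescaped "." matches any character, and without "$" the pattern
# may match anywhere in the string.
loose_p0 <- x[grep(".P0", x)]
# "aa.P0" "bb.P0" "mapP0x.P3" "dump.P0.bak"

# Escaped dot plus end-of-string anchor: only true .P0 extensions.
strict_p0 <- x[grep("\\.P0$", x)]
# "aa.P0" "bb.P0"

# Same idea for every extension .P followed by one character other than 0.
strict_other <- x[grep("\\.P[^0]$", x)]
# "cc.P1" "dd.P2" "mapP0x.P3"
```

If the extensions really are limited to P1 through P8, the class [1-8] would be a tighter choice than [^0], which accepts any single character except 0.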

Duncan Murdoch
#
Ptit_Bleu wrote:
I guess you want
     grep("\\.P0$", nfichiers)
Otherwise a string such as "XP0X" would match as well.

And for the others:
   grep("\\.P[^0]$", nfichiers)
With ".P[^0]", a string such as "XPXX" would match as well, because the
pattern looks for a P that is preceded by any character and followed by any
character other than 0, anywhere in the string.
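
The two counterexamples above can be checked directly with grepl(), which returns one logical value per string. This is only a sketch: the vector tests is a hypothetical name, combining the two counterexamples with two file names from the original post.

```r
# "XP0X" and "XPXX" are the counterexamples; the other two are real names
# from the original post.
tests <- c("XP0X", "XPXX", "Output0.P0", "Output0.P1")

loose_p0     <- grepl(".P0", tests)        # TRUE FALSE TRUE FALSE
loose_other  <- grepl(".P[^0]", tests)     # FALSE TRUE FALSE TRUE
strict_p0    <- grepl("\\.P0$", tests)     # FALSE FALSE TRUE FALSE
strict_other <- grepl("\\.P[^0]$", tests)  # FALSE FALSE FALSE TRUE

# A logical vector can index the names directly, without grep():
tests[strict_p0]     # "Output0.P0"
tests[strict_other]  # "Output0.P1"
```

Only the escaped and anchored patterns reject both counterexamples while still accepting the genuine file names.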

Uwe Ligges