regular expression for selection

9 messages · Baptiste Auguie, Jim Lemon, Uwe Ligges +2 more

#
Dear all

I am again (as usual) lost in regular expression use for selection. Here 
are my data:
c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp", 
"138516_10g_50ml_50c_250utes1_m54.00_s1.imp", 
"138516_10g_50ml_50c_250utes1_m55.00_s1.imp", 
"138516_10g_50ml_50c_250utes1_m56.00_s1.imp", 
"138516_10g_50ml_50c_250utes1_m57.00_s1.imp", 
"138516_10g_50ml_50c_250utes1_m58.00_s1.imp", 
"138516_10g_50ml_50c_250utes1_m59.00_s1.imp")

I want to select only the values "m" followed by numbers from 53 to 59.

I used

sub("m5.", "", mena)

which correctly matches those m53 - m59 values but, contrary to my 
expectation, it replaced the matched values with the specified replacement, 
in this case an empty string. 

What should I use if I want to get rid of everything but m53-m59 in those 
strings?

Regards
Petr
#
Hi,

Try grepl instead of sub,

mena[grepl("m5.", mena)]

HTH,

baptiste
On 14 November 2011 21:45, Petr PIKAL <petr.pikal at precheza.cz> wrote:
#
On 11/14/2011 07:45 PM, Petr PIKAL wrote:
Hi Petr,
How about:

grep("m5",mena)

Jim
#
Hi
It does not select those "m5?" strings from those character vectors. I 
need as an output a vector

m53, m54, m55, m56, m57, m58, m59

Regards
Petr
#
Hi
It gives numeric values, which tell me that there is a match in each 
string, but as a result I need only the

m53-m59 substrings.

Regards
Petr
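
For readers following along, here is a minimal sketch of what the two suggestions so far return. It uses a shortened copy of the mena vector from the first message; the names idx, hit and whole are only illustrative.

```r
# Shortened copy of the data from the original post
mena <- c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
          "138516_10g_50ml_50c_250utes1_m54.00_s1.imp",
          "138516_10g_50ml_50c_250utes1_m55.00_s1.imp")

# grep() returns the positions of the elements that contain a match
idx <- grep("m5", mena)

# grepl() returns one logical value per element
hit <- grepl("m5.", mena)

# Both approaches can return the matching elements, but always whole strings
whole <- grep("m5", mena, value = TRUE)

print(idx)    # 1 2 3
print(hit)    # TRUE TRUE TRUE
print(whole)  # the full file names, not "m53" etc.
```

So both functions answer "which elements match?" rather than "which part of the element matched?", which is why neither yields the m53-m59 substrings.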
#
On 14.11.2011 10:22, Petr PIKAL wrote:
gsub(".*_(m5.).*", "\\1", mena)

Uwe Ligges
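
A small sketch of the call above applied to the data from the first message. The second pattern, with the character class [3-9], is not from the thread; it is an assumed tightening for the stated range 53 to 59.

```r
mena <- c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
          "138516_10g_50ml_50c_250utes1_m54.00_s1.imp",
          "138516_10g_50ml_50c_250utes1_m59.00_s1.imp")

# The pattern covers the whole string; only the parenthesised part is kept
res <- gsub(".*_(m5.).*", "\\1", mena)
print(res)  # "m53" "m54" "m59"

# Assumed stricter variant: the third character must be a digit from 3 to 9
res_strict <- gsub(".*_(m5[3-9]).*", "\\1", mena)
print(res_strict)  # "m53" "m54" "m59"
```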
#
Does

library( stringr )
str_extract( mena, "m5[0-9]" )

achieve what you are looking for?

Rgds,
Rainer
On Monday 14 November 2011 10:22:09 Petr PIKAL wrote:
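
If installing stringr is not an option, a base R sketch along the same lines could use regexpr() together with regmatches(). This is an alternative to the suggestion above, not part of it; the extra element "other_file.imp" is made up to show the behaviour on a non-match.

```r
mena <- c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
          "138516_10g_50ml_50c_250utes1_m54.00_s1.imp")

# regexpr() locates the first match in each element,
# regmatches() then pulls out the matched text
out <- regmatches(mena, regexpr("m5[0-9]", mena))
print(out)  # "m53" "m54"

# Elements without a match are dropped from the result
x <- c(mena, "other_file.imp")  # hypothetical non-matching name
out2 <- regmatches(x, regexpr("m5[0-9]", x))
print(length(out2))  # 2
```

Note the difference in length: str_extract() keeps one result per input element and returns NA where nothing matches, while this base R form returns only the matches.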
#
Hi

Thank you. It is a pure magic, something taught in Unseen University.

This is what I once got as help for selecting only the letters from a 
character vector:
 [1] "61A"     "62C/27"  "65A/27"  "66C/29"  "69A/29"  "70C/31"  "73A/31" 
 [8] "74C/33"  "77A/33"  "81A/35"  "82C/37"  "85A/37"  "86C/39"  "89A/39" 
[15] "90C/41"  "93A/41"  "94C/43"  "97A/43"  "98C/45"  "101A/45" "102C/47"
[22] "105A/47" "106C/49" "109A/49" "110C/51" "113A/51"
[1] "A" "C" "A" "C" "A" "C" "A" "C" "A" "A" "C" "A" "C" "A" "C" "A" "C"
[18] "A" "C" "A" "C" "A" "C" "A" "C" "A"

Therefore I expected that

sub("m5.", "\\1", mena) or sub("m5.", "", mena)

selects what I wanted. But it was not the case.

Please can you correct me when I try to evaluate your solution?

gsub(".*_(m5.).*", "\\1", mena)

or

gsub(".*(m5.).*", "\\1", mena)

.* matches any characters
() negation? or matching selection for back reference?

Finally the expression matches the whole string and evaluates what is matched 
by the parenthesised part. That part is returned by the backreference.

Is this evaluation correct?

Regards
Petr
#
On 14.11.2011 11:27, Petr PIKAL wrote:
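Yes, .* matches any sequence of characters.
The latter: the parentheses mark a group for back reference. See books 
about regular expressions. I think it is also mentioned in ?regexp and 
with an example in ?gsub.
Indeed, the whole string is matched, where \\1 is the first backreference.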

Best,
Uwe
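
To make the explanation above concrete, here is a minimal sketch contrasting the original attempt with the backreference version on a single string from the thread. The input "no_match.imp" is a made-up example.

```r
x <- "138516_10g_50ml_50c_250utes1_m54.00_s1.imp"

# Pattern matches only "m54"; replacing it with "" deletes exactly that part
deleted <- sub("m5.", "", x)
print(deleted)  # "138516_10g_50ml_50c_250utes1_.00_s1.imp"

# Pattern matches the whole string; \\1 puts back only the group (m5.)
kept <- sub(".*(m5.).*", "\\1", x)
print(kept)  # "m54"

# Caveat: a string with no match is returned unchanged, not as NA or ""
unchanged <- sub(".*(m5.).*", "\\1", "no_match.imp")  # hypothetical input
print(unchanged)  # "no_match.imp"
```

In other words, sub() and gsub() always replace what the pattern matches, so extracting a piece means matching everything and writing back only the group.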