regular expression for selection
Hi, thank you. It is pure magic, something taught at Unseen University. This is what I got as help for selecting only the letters from the elements of a character vector.
vzor
[1] "61A" "62C/27" "65A/27" "66C/29" "69A/29" "70C/31" "73A/31"
[8] "74C/33" "77A/33" "81A/35" "82C/37" "85A/37" "86C/39" "89A/39"
[15] "90C/41" "93A/41" "94C/43" "97A/43" "98C/45" "101A/45" "102C/47"
[22] "105A/47" "106C/49" "109A/49" "110C/51" "113A/51"
gsub("[^A-z]", "", vzor)
[1] "A" "C" "A" "C" "A" "C" "A" "C" "A" "A" "C" "A" "C" "A" "C" "A" "C"
[18] "A" "C" "A" "C" "A" "C" "A" "C" "A"
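A side note on the character class, as a hedged sketch: in ASCII order the range A-z also spans the characters between "Z" and "a" (such as "[", "_" and "^"), so those may survive the substitution, and ?regex warns that ranges are locale-dependent anyway. The POSIX class [:alpha:] or the explicit A-Za-z is the safer spelling. The vector x below is made up for illustration and is not part of the original data.

```r
# Hypothetical test strings (not from the thread), one with "_" and "["
x <- c("61A", "62C/27", "a_b[1]")

# Delete everything that is not a letter
letters_posix <- gsub("[^[:alpha:]]", "", x)

# Same idea, restricted to plain ASCII letters
letters_ascii <- gsub("[^A-Za-z]", "", x)

print(letters_posix)
print(letters_ascii)
```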
Therefore I expected that
sub("m5.", "\\1", mena) or sub("m5.", "", mena)
would select what I wanted, but that was not the case.
Please correct me if I misread your solution:
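For what it is worth, a small sketch of why this happens: sub() replaces only the part of the string that the pattern matches and leaves everything else in place, so an empty replacement deletes the match rather than selecting it. With no parenthesised group in the pattern, "\\1" also has nothing to refer to. The single string x is copied from mena.

```r
# One element taken from the mena vector in the thread
x <- "138516_10g_50ml_50c_250utes1_m54.00_s1.imp"

# Only the matched "m54" is replaced; the rest of the string is kept
removed <- sub("m5.", "", x)
print(removed)
# [1] "138516_10g_50ml_50c_250utes1_.00_s1.imp"
```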
gsub(".*_(m5.).*", "\\1", mena)
or
gsub(".*(m5.).*", "\\1", mena)
.* matches any characters.
() negation? Or does it mark the matched part for a backreference?
Finally, the expression matches the whole string, and the whole match is replaced by the part matched inside the parentheses, which the backreference "\\1" returns.
Is my reading correct?
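A minimal sketch that may help to check this reading: parentheses do not negate anything, they form a capture group, and "\\1" in the replacement stands for whatever group 1 matched. Because the leading and trailing .* make the pattern cover the whole string, the whole string is replaced by the captured piece. The string no_match is a made-up example showing that an input without a match comes back unchanged.

```r
# One element taken from the mena vector in the thread
x <- "138516_10g_50ml_50c_250utes1_m54.00_s1.imp"

# .*      everything before the underscore that precedes "m5"
# _(m5.)  an underscore, then group 1: "m5" plus one more character
# .*      the rest of the string
captured <- sub(".*_(m5.).*", "\\1", x)
print(captured)
# [1] "m54"

# Hypothetical input with no "m5": nothing matches, nothing is replaced
no_match <- sub(".*_(m5.).*", "\\1", "no_match_here")
print(no_match)
```

Since the pattern consumes the whole string in one match, sub() and gsub() give the same result here.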
Regards
Petr
On 14.11.2011 10:22, Petr PIKAL wrote:
Hi
On 11/14/2011 07:45 PM, Petr PIKAL wrote:
Dear all, I am again (as usual) lost in regular expression use for selection. Here are my data:
dput(mena)
c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
"138516_10g_50ml_50c_250utes1_m54.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m55.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m56.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m57.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m58.00_s1.imp",
"138516_10g_50ml_50c_250utes1_m59.00_s1.imp")
I want to select only the values "m" followed by numbers from 53 to 59.
I used
sub("m5.", "", mena)
which correctly matches the m53 - m59 values but, contrary to my expectation, replaces the matched values with the specified replacement, in this case an empty string. What should I use if I want to get rid of everything but m53-m59 in those strings?
Hi Petr,
How about:
grep("m5",mena)
It gives numeric values, which tell me that there is a match in each string, but as a result I need only the m53-m59 substrings.
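As a possible alternative (a sketch, not something proposed in the thread): grep() reports which elements match, while regexpr() together with regmatches() pulls out the matched substring itself. The pattern "m5[3-9]" is my own tightening to the 53 to 59 range mentioned in the question, and mena is shortened to two of the original elements.

```r
# Two elements taken from the mena vector in the thread
mena <- c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
          "138516_10g_50ml_50c_250utes1_m59.00_s1.imp")

# grep() returns the indices of the matching elements
idx <- grep("m5", mena)

# regexpr() locates the first match in each string;
# regmatches() extracts exactly that matched text
hits <- regexpr("m5[3-9]", mena)
found <- regmatches(mena, hits)

print(idx)
print(found)
```

Note that regmatches() silently drops elements without a match, so the result can be shorter than the input.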
gsub(".*_(m5.).*", "\\1", mena)
Uwe Ligges
Regards Petr
Jim
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.