From: Gabor Grothendieck <ggrothendieck at gmail.com>
Subject: Re: [R] help with regexp
To: "Jannis" <bt_jannis at yahoo.de>
CC: r-help at stat.math.ethz.ch
Date: Wednesday, 5 October 2011, 15:13
On Wed, Oct 5, 2011 at 7:56 AM, Jannis <bt_jannis at yahoo.de> wrote:
Dear list members,
I am stuck with using regular expressions.
Imagine I have a vector of character strings like:
test <- c('filename_1_def.pdf',
How could I use regular expressions to extract only the
'def'/'abc' parts of these strings?
An attempt of mine yielded no results:
testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE, value = TRUE)
I seem to be missing some important concept here.
Until now I have always used nested sub() calls like:
testresults <- sub('.pdf$', '', sub('^filename_[[:digit:]]_', '', test))
but this tends to become cumbersome, and I was
wondering whether there is a more elegant way to do this.
Here are a couple of solutions:
# remove everything up to the last _ as well as everything from the first . onwards
gsub(".*_|[.].*", "", test)
# extract everything that is not a _ provided it is immediately followed by .
library(gsubfn)
strapply(test, "([^_]+)[.]", simplify = TRUE)
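As a further sketch in base R: grep() with value = TRUE returns the whole elements that match, not the matched substring, so it cannot do the extraction even once the input vector is passed. Pairing regexpr() with regmatches() does return just the matched part, and a single sub() with a capture group avoids the nesting. The two-element test vector below is an assumption (only its first element appears in the post), and the dot before pdf is escaped here so that it matches a literal dot.

```r
# Hypothetical input: only the first element appears in the original post;
# the second is assumed from the 'def'/'abc' mentioned in the question.
test <- c("filename_1_def.pdf", "filename_2_abc.pdf")

# regexpr() locates the first match in each string and regmatches() extracts it.
# Lookbehind and lookahead require perl = TRUE.
m <- regexpr("(?<=filename_[[:digit:]]_).{1,3}(?=\\.pdf)", test, perl = TRUE)
result <- regmatches(test, m)

# Alternative without lookarounds: one sub() call that keeps only the
# captured group between the prefix and the extension.
result2 <- sub("^filename_[[:digit:]]_(.*)\\.pdf$", "\\1", test)

print(result)
print(result2)
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input, whereas sub() returns non-matching strings unchanged.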
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com