
help with regexp

5 messages · Jannis, Albert-Jan Roskam, Eik Vettorazzi +1 more

#
Dear list members,


I am stuck with using regular expressions.


Imagine I have a vector of character strings like:

test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

How could I use regular expressions to extract only the 'def'/'abc' parts of these strings?


My own attempt yielded no results:

testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE, value = TRUE)

I seem to be missing some important concept here. Until now I have always used nested sub() calls like:

testresults <- sub('\\.pdf$', '', sub('^filename_[[:digit:]]_', '', test))


but this tends to become cumbersome, and I was wondering whether there is a more elegant way to do this.



Thanks for any help

Jannis
#
Hi Jannis,
Just use backreferences in gsub; see the 'replacement' argument in ?gsub.

test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
# capture the letters between the last _ and .pdf, then keep only that group
gsub(".*_([A-Za-z]+)\\.pdf", "\\1", test)
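One caveat with the backreference approach is that sub() and gsub() return a string unchanged when the pattern does not match it. The following is only a sketch of one way to handle that, assuming non-matching names should become NA; the third file name and the variable names `pattern` and `tag` are made up for illustration.

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf', 'notes.txt')

# Hypothetical pattern: anchored at both ends, with one capture group
# for the letters between the last _ and the .pdf extension.
pattern <- "^.*_([A-Za-z]+)\\.pdf$"

# sub() leaves non-matching strings untouched, so test for a match first
# and mark everything else as NA.
tag <- ifelse(grepl(pattern, test), sub(pattern, "\\1", test), NA_character_)
print(tag)
```

Anchoring with ^ and $ means the whole file name has to fit the pattern, which makes unexpected names easier to spot.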

hth.

On 05.10.2011 at 13:56, Jannis wrote:
#
On Wed, Oct 5, 2011 at 7:56 AM, Jannis <bt_jannis at yahoo.de> wrote:
Here are a couple of solutions:

# remove everything up to the last _ as well as everything from the . onwards
gsub(".*_|[.].*", "", test)

# extract the run of non-_ characters that is immediately followed by a .
library(gsubfn)
strapply(test, "([^_]+)[.]", simplify = TRUE)
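
For readers without gsubfn, the same "extract rather than delete" idea can probably be sketched in base R, assuming a version of R that provides regmatches(). grep() only selects the elements that match, whereas regexpr() or regexec() combined with regmatches() returns the matched text itself, which is what the lookaround pattern from the original question needs. The variable names below are illustrative only.

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# Lookarounds: match 1 to 3 characters that are preceded by
# 'filename_<digit>_' and followed by a literal '.pdf' (escaped dot).
m1 <- regexpr('(?<=filename_[[:digit:]]_).{1,3}(?=\\.pdf)', test, perl = TRUE)
# Note: regmatches() with regexpr() drops elements that have no match.
by_lookaround <- regmatches(test, m1)

# Capture group: regexec() records the whole match and each group,
# so the second entry of each result is the captured text.
m2 <- regexec('([^_]+)[.]', test)
by_group <- sapply(regmatches(test, m2), `[`, 2)

print(by_lookaround)
print(by_group)
```

The regexec() variant keeps one result per input string, so it may be the safer choice when some names are not expected to match.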
1 day later
#
Thanks to all who replied! With all these possible solutions it will be hard to find the best one :-).

--- Gabor Grothendieck <ggrothendieck at gmail.com> wrote on Wed, 5.10.2011: