Dear list members,
I am stuck with regular expressions.
Imagine I have a vector of character strings like:
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
How can I use regular expressions to extract only the 'def'/'abc' parts of these strings?
My own attempt did not work:
testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=\\.pdf)', test, perl = TRUE, value = TRUE)
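The concept being missed is probably that grep() with value = TRUE returns the whole elements of its input that match, never just the matched part. Here is a minimal sketch that keeps the same look-behind/look-ahead pattern, with the dot escaped, but extracts only the matched text using base R's regexpr() and regmatches():

```r
# Example input from the question
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# Look-behind for "filename_<digit>_", then 1 to 3 characters,
# then look-ahead for a literal ".pdf" (the dot is escaped here)
pattern <- "(?<=filename_[[:digit:]]_).{1,3}(?=\\.pdf)"

# regexpr() records where the first match in each element starts and
# how long it is; regmatches() uses that to cut out just the matched text
m <- regexpr(pattern, test, perl = TRUE)
testresults <- regmatches(test, m)
testresults
# returns c("def", "abc")
```

Note that regmatches() silently drops elements that have no match, so the result can be shorter than the input.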
I seem to be missing some important concept here. Until now I have always used nested sub() calls like:
testresults <- sub('\\.pdf$', '', sub('^filename_[[:digit:]]_', '', test))
but this tends to become cumbersome, and I was wondering whether there is a more elegant way to do this.
Thanks for any help,
Jannis
help with regexp
5 messages · Jannis, Albert-Jan Roskam, Eik Vettorazzi +1 more
Hi Jannis,
just use backreferences in gsub; see ?gsub, in particular the description of the replacement argument.
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
# [[:alpha:]] rather than [A-z]: in ASCII the range A-z also covers [ \ ] ^ _ and `
gsub(".*_([[:alpha:]]+)\\.pdf", "\\1", test)
hth.
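A slightly stricter sketch of the same backreference approach, assuming every file name follows the pattern filename_<number>_<letters>.pdf. The ^ and $ anchors tie the pattern to the whole string, and sub() is enough because only one replacement per element is needed:

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# ([[:alpha:]]+) is capture group 1; "\\1" in the replacement refers
# back to it, so each whole file name is replaced by the captured letters.
# [[:digit:]]+ also allows multi-digit numbers such as filename_12_xyz.pdf
pattern <- "^filename_[[:digit:]]+_([[:alpha:]]+)\\.pdf$"
ext <- sub(pattern, "\\1", test)
# returns c("def", "abc")

# Elements that do not match are returned unchanged, not dropped
ext2 <- sub(pattern, "\\1", c(test, "notes.txt"))
# returns c("def", "abc", "notes.txt")
```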
[In reply to Jannis, 05.10.2011 13:56]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790
--
Mandatory disclosures under the German Act on Electronic Commercial Registers and Cooperative Registers and on the Company Register (EHUG): Universitätsklinikum Hamburg-Eppendorf; public-law corporation; place of jurisdiction: Hamburg; board members: Prof. Dr. Guido Sauter (deputy chair), Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus
[In reply to Jannis <bt_jannis at yahoo.de>, Wed, Oct 5, 2011 at 7:56 AM]
Here are a couple of solutions:
# remove everything up to and including the last _, as well as everything from the . onwards
gsub(".*_|[.].*", "", test)
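To see why this single gsub() call can replace the nested sub() calls from the question, here is a sketch that applies each half of the alternation separately and then both at once:

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# Left alternative alone: greedy .* removes everything up to the last "_"
step1 <- gsub(".*_", "", test)
# returns c("def.pdf", "abc.pdf")

# Right alternative alone: removes from the "." to the end of the string
step2 <- gsub("[.].*", "", step1)
# returns c("def", "abc")

# Joined with |, gsub() removes both parts in a single call
combined <- gsub(".*_|[.].*", "", test)
identical(combined, step2)
# returns TRUE
```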
# extract the run of non-_ characters that is immediately followed by a .
library(gsubfn)
strapply(test, "([^_]+)[.]", simplify = TRUE)
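If the extra dependency on gsubfn is unwanted, here is a base-R sketch that uses the same capture pattern with regexec() and regmatches(). One difference: regexec() reports only the first match in each string, while strapply() collects every match. These file names contain only one match, so the results agree:

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# regexec() records the whole match plus each parenthesised group;
# regmatches() turns that into a list holding, per element,
# c(full match, group 1)
m <- regmatches(test, regexec("([^_]+)[.]", test))
m[[1]]
# returns c("def.", "def")

# Keep the first capture group of each element
# (an element without a match would give NA here)
ext <- vapply(m, function(g) g[2], character(1))
# returns c("def", "abc")
```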
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
1 day later
Thanks to all who replied! With all these possible solutions it will be hard to find the best one :-).
[In reply to Gabor Grothendieck <ggrothendieck at gmail.com>, Re: [R] help with regexp, Wednesday, 5 October 2011, 15:13 (To: "Jannis" <bt_jannis at yahoo.de>, CC: r-help at stat.math.ethz.ch)]