Dear all--
I am still cutting my teeth on R and I am fighting with regexpr() as
well as portability between Unix and Windoz. I need to extract barcodes from
filenames (the barcode sits between a double and a single underscore) as well
as the directory in which the file resides. Here is the solution I came up
with:
aFileName <-
  "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
# locate "__<digits>_" and everything up to the last "/"
t <- regexpr("__\\d*_", aFileName, perl = TRUE)
t.dir <- regexpr("^.*/", aFileName, perl = TRUE)
# drop the leading "__" and the trailing "_" from the match
base.name <- substr(aFileName, t + 2, t - 2 + attr(t, "match.length"))
base.dir <- substr(aFileName, t.dir, attr(t.dir, "match.length"))
My questions are:
1) Is there a more elegant way to deal with regular expressions (read here:
easier, more like Perl style)?
2) I have a portability problem when I extract base.dir: Windoz uses
'\' instead of '/' to separate directories.
Any suggestions/comments
Many Tx
Marco Blanchette, Ph.D.
mblanche at uclink.berkeley.edu
Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204
Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
regexpr and portability issue
3 messages · Marco Blanchette, Gabor Grothendieck, Brian Ripley
Try this. The regular expression says to match
- anything
- followed by a double underscore
- followed by one or more digits
- followed by an underscore
- followed by anything.
The digits have been parenthesized so that they can be referred to by
the backreference "\\1" in the replacement. Also, to get the directory,
use the R function dirname() rather than regular expressions.
base.name <- sub(".*__([[:digit:]]+)_.*", "\\1", aFileName)
base.dir <- dirname(aFileName)
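
The approach above can be sketched as a small helper that works on several filenames at once. One caveat worth keeping in mind: sub() returns its input unchanged when the pattern does not match, so this sketch checks with grepl() first and returns NA for such names. The function name extract_barcode and the second filename are hypothetical, made up for illustration.

```r
# Hypothetical helper (not from the original thread): pull the run of
# digits that sits between "__" and the next "_" in each filename.
extract_barcode <- function(paths) {
  pattern <- ".*__([[:digit:]]+)_.*"
  # sub() leaves non-matching strings untouched, so flag those as NA.
  ifelse(grepl(pattern, paths), sub(pattern, "\\1", paths), NA_character_)
}

files <- c(
  "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt",
  "/Users/marco/Desktop/diagnosticAnalysis/test/notes.txt"  # made-up name
)

barcodes <- extract_barcode(files)  # barcode for the first file, NA for the second
dirs <- dirname(files)              # directory part of each path
```

Because sub(), grepl() and dirname() are all vectorized, the same call handles a single filename or a whole directory listing.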
On Tue, 2 Aug 2005, Marco Blanchette wrote:
> My questions are:
> 1) Is there a more elegant way to deal with regular expressions (read here:
> more easier, more like perl style).
Yes, use sub and backreferences. An example from the R sources doing
something similar:
wfile <- sub("/chm/([^/]*)$", "", file)
thispkg <- sub(".*/([^/]*)/chm/([^/]*)$", "\\1", file)
However, R does have functions basename() and dirname() to do this!
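
A minimal sketch of the basename()/dirname() route, applied to the filename from the question. The variable names file.part and rebuilt are introduced here for illustration only. Note that dirname() drops the trailing separator, whereas the regexpr() version in the question keeps it.

```r
aFileName <-
  "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"

base.dir <- dirname(aFileName)    # everything before the last separator
file.part <- basename(aFileName)  # "MA__251329410021_S01_A01.txt"

# Working on the basename keeps underscores in directory names from
# interfering with the barcode pattern.
base.name <- sub("^.*__([[:digit:]]+)_.*$", "\\1", file.part)

# file.path() rebuilds the full path from its pieces, using "/".
rebuilt <- file.path(base.dir, file.part)
```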
> 2) I have a portability problem when I extract the base.dir Windoz is using
> '\' instead of '/' to separate directories.
That is misinformation: Windows (sic) accepts either / or \ (see the
rw-FAQ and the R FAQ). Use chartr("\\", "/", path) to map \ to /.
The `portability problem' appears to be of your own making -- take heart
that R itself manages to manipulate filepaths portably.
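
As an illustration of the chartr() suggestion, here is a short sketch that normalizes a backslash-separated path before taking it apart. The Windows-style path and the names winPath and unixStyle are made up for this example; only the barcode filename comes from the question.

```r
# A made-up Windows-style path; each "\\" in the R source is one backslash.
winPath <- "C:\\data\\test\\MA__251329410021_S01_A01.txt"

# chartr() translates characters one for one: every "\" becomes "/".
unixStyle <- chartr("\\", "/", winPath)

# With forward slashes in place, the same code works on any platform.
base.dir <- dirname(unixStyle)
base.name <- sub(".*__([[:digit:]]+)_.*", "\\1", basename(unixStyle))
```

Converting once at the point where a path enters the program means the rest of the code only ever sees "/" as the separator.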
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595