regular expression
4 messages · Fred G, Gabor Grothendieck, David Winsemius +1 more
On Wed, Feb 29, 2012 at 2:24 PM, Fred G <bayespokerguy at gmail.com> wrote:
Computer Friends,

with the following example lines:

[107] "98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1"
[108] "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1"

I want to be able to isolate the number of months of survival for each row. Is there a regular expression that can find the first instance of a ";" and delete everything in front of it, then find the second instance of a ";" and delete everything behind it? In Python there is a function line.find(); I would be grateful to hear the R equivalent, or any better alternative for getting the number of months of survival stored as a variable.
This extracts the number that immediately precedes a semicolon, i.e. the months of survival:
# sample data
Lines <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
"99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")
library(gsubfn)
strapply(Lines, "(\\d+);", as.numeric, simplify = TRUE)
# We can also get all numeric fields in case that is of interest:
strapply(Lines, "\\d+", as.numeric, simplify = rbind)
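If installing gsubfn is not an option, the same number can probably be pulled out with base R alone. The following is only a sketch under that assumption: it relies on regexpr() with perl = TRUE (needed for the lookahead) and regmatches(), and the variable names m and months are made up for illustration.

```r
# sample data, as above
Lines <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
           "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# find the first run of digits that is directly followed by ";"
# (the lookahead (?=;) requires perl = TRUE)
m <- regexpr("\\d+(?=;)", Lines, perl = TRUE)

# extract the matched text and convert it to numeric
months <- as.numeric(regmatches(Lines, m))
months
```

Note that regmatches() silently drops lines with no match, so the result can be shorter than the input if some lines lack a "digits;" field.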
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
On Feb 29, 2012, at 2:24 PM, Fred G wrote:
Computer Friends, with the following example lines:
Modified to be correct R code. Please emulate my example in the future.
inp <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
         "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")
I want to be able to isolate the number of months of survival for each row. Is there a regular expression that can find the first instance of a ";" and delete everything in front of it, then find the second instance of a ";" and delete everything behind it? In Python there is a function line.find(); I would be grateful to hear the R equivalent, or any better alternative for getting the number of months of survival stored as a variable.
You can either use regex methods (noting that the "?" is necessary to
defeat the default greedy nature of regex matching):
> sub( ";.+$", "", sub("^.+?;", "", inp) )
[1] " Surv(months): 6" " Surv(months): 21"
... or you can read these as lines and pass the results to read.table
with sep =";".
> read.table(text=inp, sep=";", stringsAsFactors=FALSE)[ ,2]
[1] " Surv(months): 6" " Surv(months): 21"
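Both results above are still character strings with a label attached. A minimal base R sketch of going the rest of the way to a numeric variable follows, together with a rough analogue of Python's line.find(). The names field, months and first_semi are hypothetical, and the sketch assumes every line has at least two ";"-separated fields.

```r
inp <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
         "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# split each line on a literal ";" and keep the second field
field <- vapply(strsplit(inp, ";", fixed = TRUE), `[`, character(1), 2)

# drop everything up to the last ":" (and following spaces), then convert
months <- as.numeric(sub("^.*: *", "", field))
months

# rough analogue of line.find(";"): position of the first ";" in each line
# (positions are 1-based in R, and -1 means "not found")
first_semi <- as.integer(regexpr(";", inp, fixed = TRUE))
first_semi
```

gregexpr() works the same way but returns the positions of all matches, which is what you would want for locating the second ";".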
Please learn to post in plain text.
David Winsemius, MD West Hartford, CT