regular expression
On Wed, Feb 29, 2012 at 2:24 PM, Fred G <bayespokerguy at gmail.com> wrote:
Computer Friends,

With the following example lines:

[107] "98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1"
[108] "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1"

I want to isolate the number of months of survival for each row. Is there a regular expression that can find the first instance of a ";" and delete everything in front of it, then find the second instance of a ";" and delete everything behind it? In Python there is a function line.find(); I would be grateful to hear the R equivalent, or any better alternative for getting the number of months of survival stored as a variable.
This extracts the survival months, i.e. the number immediately followed by a semicolon:
# sample data
Lines <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
"99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")
library(gsubfn)
strapply(Lines, "(\\d+);", as.numeric, simplify = TRUE)
# We can also get all numeric fields in case that is of interest:
strapply(Lines, "\\d+", as.numeric, simplify = rbind)
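If you would rather not load a package, here is a rough base R sketch of the same idea. It assumes every line has the "Surv(months): <number>;" layout shown in the sample data; the names months, months2, fields, second and first_semicolon are just illustrative. regexpr() is probably the closest base R analogue of Python's line.find(), although R positions are 1-based.

```r
# sample data (same as above)
Lines <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
           "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# position of the first ";" in each line (1-based, -1 if there is none)
first_semicolon <- as.integer(regexpr(";", Lines, fixed = TRUE))

# capture the digits that follow "Surv(months):" and drop everything else
months <- as.numeric(sub(".*Surv\\(months\\): *([0-9]+);.*", "\\1", Lines))

# alternatively, split on ";" and take the second field, as described in the question,
# then strip the label up to and including the colon
fields <- strsplit(Lines, ";", fixed = TRUE)
second <- sapply(fields, `[`, 2)
months2 <- as.numeric(sub(".*: *", "", second))
```

Both months and months2 are plain numeric vectors with one element per line, so they can be assigned to a data frame column directly. Note that sub() returns the input unchanged when the pattern does not match, so a line with a different layout would give NA (with a warning) from as.numeric() rather than an error.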
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com