The previous elegant solutions required the gsubfn package.
Nothing wrong with that, of course, but I'm always curious whether a
relatively simple base R solution can also be found, as these are often (but not
always!) much faster. And anyway, it seems to be in the spirit of your query
to try such a solution. So here is one base R approach that I believe works.
I'll break it up into 2 steps so you can see what's going on.
## Using your example...
## First replace every character that cannot be part of a number
## (anything other than a digit, E, ., + or -) with a space
z <- gsub("[^[:digit:]E.+-]", " ", input)
z
[1] "         "
[2] "            1.3770E-03               3.4644E-07"
[3] "            1.9412E-04               4.8840E-08"
[4] ""
[5] "          "
[6] "              1.3770E-03                3.4644E-07"
[7] "              1.9412E-04                4.8840E-08"
## Now it can be scanned into a numeric vector via
z <- scan(textConnection(z), what = 0)
[1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
1.9412e-04 4.8840e-08
########
I believe this strategy is reasonably general, but I haven't checked it
carefully and would appreciate folks pointing out where it trips up (e.g.
perhaps with NA's).
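
One place where I would expect it to trip up is text that contains a stray
"E", ".", "+" or "-", since those characters survive the gsub() step. The
sketch below is only an illustration: the second input line is made up, the
names input2, tokens, num_pat and vals are hypothetical, and it assumes a
version of R in which scan() accepts a text= argument. It first shows the
leftover tokens, then tries matching whole numeric tokens with gregexpr()
and regmatches() instead.

```r
## Made-up input: the second line is invented to show a failure case
input2 <- c("  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
            "Elapsed time - see log")

## The character-class approach keeps the stray "E" and "-" as tokens
z2 <- gsub("[^[:digit:]E.+-]", " ", input2)
tokens <- scan(text = z2, what = "", quiet = TRUE)
tokens   # the last two tokens are "E" and "-", which are not numbers

## Alternative sketch: match whole numeric tokens only
## (optional sign, digits with optional decimal point, optional exponent)
num_pat <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"
vals <- as.numeric(unlist(regmatches(input2, gregexpr(num_pat, input2))))
vals
```

This pattern is not bulletproof either: it would also pick up digits that
are embedded in labels (say, a tag like <ax2>), so it should be checked
against the real log before being relied on.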
Best,
Bert Gunter
Genentech Nonclinical Biostatistics
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of baptiste auguie
Sent: Wednesday, November 18, 2009 3:57 AM
To: r-help
Subject: [R] parsing numeric values
Dear list,
I'm seeking advice to extract some numeric values from a log file
created by an external program. Consider the following example,
input <-
readLines(textConnection(
"some text
  <ax> =    1.3770E-03     <bx> =    3.4644E-07
  <ay> =    1.9412E-04     <by> =    4.8840E-08
other text
  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"))
## this is what I want
results <- c(as.numeric(strsplit(grep("<ax>", input, value = TRUE), " ")[[1]][8]),
             as.numeric(strsplit(grep("<ay>", input, value = TRUE), " ")[[1]][8]),
             as.numeric(strsplit(grep("<aax>", input, value = TRUE), " ")[[1]][9]),
             as.numeric(strsplit(grep("<aay>", input, value = TRUE), " ")[[1]][9])
             )
## [1] 0.00137700 0.00019412 0.00137700 0.00019412
The use of strsplit is not ideal here as there is a different number
of space characters in the lines containing <ax> and <aax> for
instance (hence the indices 8 and 9 respectively).
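
The dependence on the number of spaces can probably be avoided by
splitting on runs of spaces rather than on single spaces. A minimal
sketch, assuming the lines always have the "<name> = value" layout shown
above (the variables line and fields are just illustrative names):

```r
## One line in the same layout as the log file
line <- "  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07"

## Drop leading spaces, then split on one or more spaces
fields <- strsplit(sub("^ +", "", line), " +")[[1]]
fields

## The first value is now the third field, whatever the spacing
as.numeric(fields[3])
```

With this splitting the first value sits in the third field for both the
<ax> and the <aax> lines.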
I tried to use gsubfn for a cleaner construct,
strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
but I can't seem to find the correct regular expression to deal with
the exponent.
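
For what it's worth, here is a base R sketch of a pattern that also
covers the exponent. It is untested against the real log file; the names
num, pat, hits and ax are hypothetical, and the input is shortened to
three lines. The capture group allows an optional sign, digits with a
decimal point, and an optional E-notation exponent.

```r
input <- c("some text",
           "  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
           "  <ay> =    1.9412E-04     <by> =    4.8840E-08")

## Number with an optional exponent, e.g. 1.3770E-03 or 0.5
num <- "([-+]?[0-9.]+([eE][-+]?[0-9]+)?)"
pat <- paste0("<ax> += +", num)

## Keep only the lines that contain the tag, then pull out the capture
hits <- grep(pat, input, value = TRUE)
ax <- as.numeric(sub(paste0(".*", pat, ".*"), "\\1", hits))
ax
```

The same exponent piece could presumably be dropped into the strapply
pattern as well, though that is not verified here.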
Any tips are welcome!
Best regards,
baptiste