Here is a slight variation:
> read.table(textConnection(grep("<aa?[xy]>", input, value = TRUE)),
+     colClasses = c("NULL", "NULL", "numeric"))
          V3         V6
1 0.00137700 3.4644e-07
2 0.00019412 4.8840e-08
3 0.00137700 3.4644e-07
4 0.00019412 4.8840e-08
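This works because read.table() recycles an unnamed colClasses vector across all columns, and a "NULL" class drops that column. Each matching line splits on whitespace into six fields (tag, =, value, tag, =, value), so the length-3 vector becomes NULL, NULL, numeric, NULL, NULL, numeric and only V3 and V6 survive. A minimal self-contained sketch, assuming the two lines below are a representative (hypothetical) excerpt of the log:

```r
# Two lines in the same layout as the log file (hypothetical excerpt)
lines <- c("  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
           "  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07")

# Six whitespace-separated fields per line; the length-3 colClasses
# vector is recycled to NULL, NULL, numeric, NULL, NULL, numeric
df <- read.table(text = lines, colClasses = c("NULL", "NULL", "numeric"))

names(df)  # only the kept columns, V3 and V6
df$V3      # the <ax>/<aax> values
```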
On Wed, Nov 18, 2009 at 1:54 PM, baptiste auguie
<baptiste.auguie at googlemail.com> wrote:
Hi,
Thanks for the alternative approach. However, I should have made my
example more complete: other lines may also contain numeric values
that I'm not interested in. Below is an updated problem, with my
current solution:
tc <- textConnection(
"some text
  <ax> =    1.3770E-03     <bx> =    3.4644E-07
  <ay> =    1.9412E-04     <by> =    4.8840E-08
other text
  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
  <aay>  =    1.9412E-04     <bby> =    4.8840E-08
lots of other material, including numeric values
 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
 12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
etc...")
input <- readLines(tc)
close(tc)
## I want to retrieve the values for
## <ax>, <ay>, <aax> and <aay> only
library(gsubfn)  # provides strapply()
results <- c(
  strapply(input, "<ax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
           simplify = rbind),
  strapply(input, "<ay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
           simplify = rbind),
  strapply(input, "<aax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
           simplify = rbind),
  strapply(input, "<aay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
           simplify = rbind))
results
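The four strapply() calls differ only in the tag, so a single pattern with an optional second "a" can cover all of them. A base-R sketch of the same idea using grep() and sub() with a capture group (a swapped-in technique, not part of gsubfn), assuming each matching line carries exactly one <ax>/<ay>/<aax>/<aay> tag. The values come back in line order, whereas c() of the four calls groups them by tag; for this input the two orders coincide.

```r
# Sample log lines (same layout as the example above)
input <- c("some text",
           "  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
           "  <ay> =    1.9412E-04     <by> =    4.8840E-08",
           "other text",
           "  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07",
           "  <aay>  =    1.9412E-04     <bby> =    4.8840E-08",
           " 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5")

# One pattern for all four tags: <ax>, <ay>, <aax>, <aay>
tag <- "<aa?[xy]>"
# Capture the mantissa and signed exponent that follow "tag ="
pattern <- paste0(".*", tag, " += +([0-9]+\\.[0-9]+E[-+]?[0-9]+).*")

hits <- grep(tag, input, value = TRUE)  # drops the unrelated numeric line
results <- as.numeric(sub(pattern, "\\1", hits))
results
```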
Using the suggested base R solution, I've come up with this variation:
z <- gsub("[^[:digit:]E.+-]", " ",
          grep("<ax>|<ay>|<aax>|<aay>", input, value = TRUE))
test <- scan(textConnection(z), what = 0)
test[seq(1, length(test), by = 2)]
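Each selected line holds an a-value followed by a b-value, so the scanned vector alternates between the two and test[seq(1, length(test), by = 2)] keeps the odd positions. Filling a two-column matrix row by row keeps both values aligned with their line. A sketch, assuming exactly two numbers per selected line; the column names "a" and "b" are illustrative only.

```r
# Log lines (same layout as the example above)
input <- c("some text",
           "  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
           "  <ay> =    1.9412E-04     <by> =    4.8840E-08",
           "  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07",
           "  <aay>  =    1.9412E-04     <bby> =    4.8840E-08")

# Blank out everything except digits, E, '.', '+' and '-', then scan
z <- gsub("[^[:digit:]E.+-]", " ",
          grep("<ax>|<ay>|<aax>|<aay>", input, value = TRUE))
test <- scan(text = z, what = 0, quiet = TRUE)

# Assumes two numbers per line: one matrix row per line
m <- matrix(test, ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("a", "b")))  # illustrative names
m
```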
Thanks again,
baptiste
2009/11/18 Bert Gunter <gunter.berton at gene.com>:
The previous elegant solutions required the use of the gsubfn package.
Nothing wrong with that, of course, but I'm always curious whether
relatively simple base R solutions can still be found, as they are often
(but not always!) much faster. And anyway, trying such a solution seems to
be in the spirit of your query. So here is one base R approach that I
believe works. I'll break it into two steps so you can see what's going on.
## Using your example...
## First replace everything but the number with spaces
z <- gsub("[^[:digit:]E.+-]", " ", input)
z
[1] "         "
[2] "            1.3770E-03               3.4644E-07"
[3] "            1.9412E-04               4.8840E-08"
[4] ""
[5] "          "
[6] "              1.3770E-03                3.4644E-07"
[7] "              1.9412E-04                4.8840E-08"
## Now it can be scanned into a numeric vector via
z <- scan(textConnection(z), what = 0)
z
[1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
[7] 1.9412e-04 4.8840e-08
########
I believe this strategy is reasonably general, but I haven't checked it
carefully and would appreciate folks pointing out where it trips up
(perhaps with NAs, for example).
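Two ways the replace-then-scan strategy can trip up are sketched below; the helper name extract_numbers is hypothetical. A missing value written as NA consists of letters, so gsub() blanks it out and it silently disappears, shifting the positions of the remaining values. Conversely, a stray capital E (or a lone +, - or .) in ordinary text survives the substitution, and scan() then stops with an error because that token is not a number.

```r
# Bert's strategy wrapped in a hypothetical helper
extract_numbers <- function(lines) {
  z <- gsub("[^[:digit:]E.+-]", " ", lines)
  scan(text = z, what = 0, quiet = TRUE)
}

# "NA" is blanked like any other letters: one value comes back, not two
na_case <- extract_numbers("  <ax> =    NA     <bx> =    3.4644E-07")

# The capital E of "End" survives gsub() and is not a valid number
bad_case <- tryCatch(extract_numbers("End of run"),
                     error = function(e) conditionMessage(e))
na_case
bad_case
```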
Best,
Bert Gunter
Genentech Nonclinical Biostatistics
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of baptiste auguie
Sent: Wednesday, November 18, 2009 3:57 AM
To: r-help
Subject: [R] parsing numeric values
Dear list,
I'm seeking advice to extract some numeric values from a log file
created by an external program. Consider the following example,
input <- readLines(textConnection(
"some text
  <ax> =    1.3770E-03     <bx> =    3.4644E-07
  <ay> =    1.9412E-04     <by> =    4.8840E-08
other text
  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"))
## this is what I want
results <- c(as.numeric(strsplit(grep("<ax>", input, value = TRUE), " ")[[1]][8]),
             as.numeric(strsplit(grep("<ay>", input, value = TRUE), " ")[[1]][8]),
             as.numeric(strsplit(grep("<aax>", input, value = TRUE), " ")[[1]][9]),
             as.numeric(strsplit(grep("<aay>", input, value = TRUE), " ")[[1]][9]))
## [1] 0.00137700 0.00019412 0.00137700 0.00019412
The use of strsplit is not ideal here, because the number of space
characters differs between, for instance, the lines containing <ax>
and <aax> (hence the indices 8 and 9, respectively).
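One way around the shifting indices while keeping the strsplit() idea: trim the surrounding blanks and split on runs of whitespace rather than on single spaces, so the value is always the third field for both <ax> and <aax>. A sketch, assuming the tag, the = sign and the value are always separated by at least one blank.

```r
lines <- c("  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
           "  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07")

# Split on one-or-more whitespace characters after trimming both ends
fields <- strsplit(trimws(lines), "[[:space:]]+")

# The first value is field 3 on every line, regardless of padding
vals <- vapply(fields, function(f) as.numeric(f[3]), numeric(1))
vals
```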
I tried to use gsubfn for a cleaner construct:
strapply(input, "<ax> += +([0-9.]+)", c, simplify = rbind, combine = as.numeric)
but I can't seem to find the correct regular expression to deal with
the exponent.
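The character class [0-9.]+ stops at the E, so only the mantissa is captured. Appending an optional exponent group (E or e, an optional sign, then digits) picks up the whole number. A base-R sketch of the pattern on its own, using regmatches() and regexpr() rather than gsubfn, on a few sample strings in the style of the log:

```r
x <- c("1.3770E-03", "4.8840E-08", "123E5", "0.00137")

# Mantissa only: the match ends just before the E
mantissa_only <- regmatches(x, regexpr("[0-9.]+", x))

# Mantissa plus optional exponent: [eE], optional sign, digits
full <- regmatches(x, regexpr("[0-9.]+([eE][-+]?[0-9]+)?", x))

mantissa_only
as.numeric(full)
```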
Any tips are welcome!
Best regards,
baptiste