
parsing numeric values

The previous elegant solutions required the use of the gsubfn package.
Nothing wrong with that, of course, but I'm always curious whether a
relatively simple base R solution can still be found, as these are often (but
not always!) much faster. In any case, trying one seems to be in the spirit of
your query. So here is a base R approach that I believe works.
I'll break it up into 2 steps so you can see what's going on.

## Using your example...
## First replace everything but the number with spaces
[1] "         "                                         
[2] "            1.3770E-03               3.4644E-07"   
[3] "            1.9412E-04               4.8840E-08"   
[4] ""                                                  
[5] "          "                                        
[6] "              1.3770E-03                3.4644E-07"
[7] "              1.9412E-04                4.8840E-08"

## Now it can be scanned to a numeric via
Read 8 items
[1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08
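
The two commands themselves are missing from this copy of the message, which
shows only their output. Below is a minimal sketch that reproduces that
output, assuming the first step is a gsub() over a negated character class and
the second a scan() of the result. The pattern and the names `cleaned`, `vals`
and `a_vals` are guesses made for illustration, not necessarily what was
originally posted.

```r
# Example input from the original question, as a character vector of lines
input <- c(
  "some text",
  "  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
  "  <ay> =    1.9412E-04     <by> =    4.8840E-08",
  "",
  "other text",
  "  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07",
  "  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"
)

# Step 1 (assumed pattern): blank out every character that cannot be part
# of a number written like 1.3770E-03
cleaned <- gsub("[^0-9.E+-]", " ", input)

# Step 2: read the whitespace-separated fields as numerics
vals <- scan(text = cleaned, quiet = TRUE)

# The values alternate a, b, a, b, ..., so the odd positions hold the
# <a..> values asked for in the original question
a_vals <- vals[c(TRUE, FALSE)]
```

A pattern like this one relies on the labels being as plain as in the example:
a label containing a digit, a dot, a sign or a capital E would leave stray
characters behind, and scan() would then either fail or pick up spurious
values.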

########
I believe this strategy is reasonably general, but I haven't checked it
carefully and would appreciate folks pointing out where it trips up (e.g.
perhaps with NA's).

Best,

Bert Gunter
Genentech Nonclinical Biostatistics
 
 -----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of baptiste auguie
Sent: Wednesday, November 18, 2009 3:57 AM
To: r-help
Subject: [R] parsing numeric values

Dear list,

I'm seeking advice to extract some numeric values from a log file
created by an external program. Consider the following example,

input <-
readLines(textConnection(
"some text
  <ax> =    1.3770E-03     <bx> =    3.4644E-07
  <ay> =    1.9412E-04     <by> =    4.8840E-08

other text
  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"))

## this is what I want
results <- c(as.numeric(strsplit(grep("<ax>", input, value=TRUE), " ")[[1]][8]),
             as.numeric(strsplit(grep("<ay>", input, value=TRUE), " ")[[1]][8]),
             as.numeric(strsplit(grep("<aax>", input, value=TRUE), " ")[[1]][9]),
             as.numeric(strsplit(grep("<aay>", input, value=TRUE), " ")[[1]][9])
             )

## [1] 0.00137700 0.00019412 0.00137700 0.00019412

The use of strsplit is not ideal here as there is a different number
of space characters in the lines containing <ax> and <aax> for
instance (hence the indices 8 and 9 respectively).
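
One way around the shifting indices, offered as a sketch rather than a tested
recipe: split on runs of spaces and equals signs instead of on single spaces,
so the value always lands in the same field. The label pattern `<a+[xy]>` and
the names `hits`, `fields` and `results` are assumptions made for this example.

```r
# Example input, as a character vector of lines
input <- c(
  "some text",
  "  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
  "  <ay> =    1.9412E-04     <by> =    4.8840E-08",
  "",
  "other text",
  "  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07",
  "  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"
)

# Keep only the lines that carry an <a..x> or <a..y> label
hits <- grep("<a+[xy]>", input, value = TRUE)

# Drop leading blanks, then split on runs of spaces and '=' signs so that
# each line yields: label, value, label, value
fields <- strsplit(sub("^ +", "", hits), "[ =]+")

# The value belonging to the first label is always field 2
results <- vapply(fields, function(f) as.numeric(f[2]), numeric(1))
```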

I tried to use gsubfn for a cleaner construct,

strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)

but I can't seem to find the correct regular expression to deal with
the exponent.
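
For the exponent, one possibility is to follow the mantissa with an optional
group made of `E`, an optional sign and the exponent digits. The sketch below
uses base R's sub() rather than strapply() so that it runs without extra
packages; the pattern and the names `num`, `pat`, `line` and `ax` are
illustrative assumptions.

```r
# Example input, as a character vector of lines
input <- c(
  "some text",
  "  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
  "  <ay> =    1.9412E-04     <by> =    4.8840E-08",
  "",
  "other text",
  "  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07",
  "  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"
)

# Mantissa, then an optional exponent such as E-03
num <- "[-+]?[0-9.]+(E[-+]?[0-9]+)?"

# Capture the number that follows the <ax> label and discard the rest
pat <- paste0("^.*<ax> += +(", num, ").*$")
line <- grep("<ax>", input, value = TRUE)
ax <- as.numeric(sub(pat, "\\1", line))
```

The first backreference is the whole number because the exponent sits in a
nested, second group.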


Any tips are welcome!


Best regards,

baptiste

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.