sscanf equivalent
On Fri, 7 Oct 2005, Prof Brian Ripley wrote:
On Fri, 7 Oct 2005, Paul Roebuck wrote:
I have a data file from which I need to read portions of
data but data location/quantity can change from file to file.
I wrote some code and have a working solution but it seems
wasteful to have to do it this way. Here's the contrived
incomplete code.
datalines <- readLines(datafile.pathname)
# marker will appear on line preceding and following
# actual data
offset.data <- grep("marker", datalines)
datalines <- NULL
# grab first column of each assoc dataline
data <- scan(datafile.pathname,
             what = numeric(0),
             skip = offset.data[1],
             nlines = offset.data[2] - offset.data[1] - 1,
             flush = TRUE,
             multi.line = FALSE,
             quiet = TRUE)
# output is vector of values
I originally wrote code to parse the data from 'datalines'
using sub and strsplit, but it was far slower and more
complex than using scan. What I want is a way to call
something like scan on data already in memory instead of
on a filename.
Why not use a text connection?
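For readers following along, here is a minimal sketch of what the text connection suggestion could look like. The contents of 'datalines' below are made up for illustration (the real file is not shown in this thread); the scan arguments are the ones from the original post.

```r
# Hypothetical stand-in for readLines(datafile.pathname); the real data
# file is not shown in the thread.
datalines <- c("header", "marker", "1.5 10", "2.5 20", "3.5 30",
               "marker", "footer")

# marker appears on the line preceding and following the actual data
offset.data <- grep("marker", datalines)

# Wrap the in-memory lines in a connection and scan it like a file.
tconn <- textConnection(datalines)
first.col <- scan(tconn,
                  what = numeric(0),
                  skip = offset.data[1],
                  nlines = offset.data[2] - offset.data[1] - 1,
                  flush = TRUE,
                  multi.line = FALSE,
                  quiet = TRUE)
close(tconn)
```

The file is read only once this way, and 'first.col' holds the first field of each line between the two markers.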
I tried that, but the result was far slower than the method above.

R> file.info(datafile.pathname)$size
[1] 944850
R> system.time(datalines<-readLines(datafile.pathname), TRUE)[3]
[1] 0.59
R> length(datalines)
[1] 67931
R> system.time(tconn<-textConnection(datalines), TRUE)[3]
[1] 52.97

Once a textConnection object was created, the scan invocation
using it took less than half the time of the corresponding
filename-based invocation. The problem is that the filename-based
scan was only taking a second to begin with. And since grep
doesn't accept a textConnection as an argument, I still need the
otherwise unused 'datalines' variable and its associated memory.
Even if grep did accept one, creating the connection without the
intermediate variable took even longer.

R> system.time(tconn<-textConnection(readLines(datafile.pathname)), TRUE)[3]
[1] 66.61

Any other thoughts?

# R version 2.1.1, 2005-06-20, powerpc-apple-darwin7.9.0

----------------------------------------------------------
SIGSIG -- signature too long (core dumped)
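One possible way around the timing above, offered as a sketch rather than a measured result: build the text connection over only the lines between the markers instead of all 67931 lines. This assumes the setup cost of textConnection grows with the number of lines it wraps, which the timings suggest but do not prove. The sample 'datalines' is again made up for illustration.

```r
# Hypothetical stand-in for readLines(datafile.pathname).
datalines <- c("header", "marker", "1.5 10", "2.5 20", "3.5 30",
               "marker", "footer")
offset.data <- grep("marker", datalines)

# Keep only the lines strictly between the two markers.
# seq(..., length.out = n) yields an empty index when n is 0,
# i.e. when the markers are on adjacent lines.
n.data <- offset.data[2] - offset.data[1] - 1
block <- datalines[seq(offset.data[1] + 1, length.out = n.data)]

# The connection now wraps just the data lines, so skip/nlines are
# no longer needed.
tconn <- textConnection(block)
first.col <- scan(tconn, what = numeric(0), flush = TRUE, quiet = TRUE)
close(tconn)

# Current versions of R also give scan a 'text' argument, which avoids
# managing the connection by hand (this may not exist in R 2.1.1).
first.col2 <- scan(text = block, what = numeric(0), flush = TRUE,
                   quiet = TRUE)
```

Since grep has already run on 'datalines', the variable is no longer "otherwise unused": it supplies both the marker positions and the data block.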