Reading and coalescing many datafiles.

In my experience, storing all the data files in a list and then calling
'do.call("rbind", ...)' once is much better than 'rbind'-ing on the fly.
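
A minimal sketch of that pattern, for illustration only. The reader 'read_one' and the three temporary files are hypothetical stand-ins for my_read_function and the real data files; the column names are made up.

```r
# Hypothetical reader standing in for my_read_function (assumption).
read_one <- function(path) read.table(path, header = TRUE)

# Create three small throwaway files so the example is self-contained.
paths <- tempfile(c("a", "b", "c"))
for (i in seq_along(paths)) {
  write.table(data.frame(id = i, value = i * 10), paths[i], row.names = FALSE)
}

# Read every file into a list first, then combine with a single rbind call.
pieces   <- lapply(paths, read_one)
combined <- do.call("rbind", pieces)

unlink(paths)  # clean up the temporary files
```

The point of the design is that 'rbind' sees all the pieces at once, so the result is allocated once rather than regrown for every file.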

-roger
Greetings.

I've got some analysis problems I'm trying to solve, the raw data for which
are accumulated in a bunch of time-and-date-based files.

/some/path/2005-01-02-00-00-02

etc.

The best 'read all these files' method I've seen in the r-help archives comes
down to 

dat <- NULL    # start empty; rbind(NULL, x) just returns x
for (df in my_list_of_filenames)
    {
          dat <- rbind(dat, my_read_function(df))
    }

which, unpleasantly, is O(N^2) w.r.t. the number of files.
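
To make the quadratic cost concrete, here is a rough counting model rather than a benchmark. It assumes every file contributes the same number of rows and that each 'rbind' copies the whole accumulated result; the function names are made up for this sketch.

```r
# Rows copied when appending N pieces of n rows each, one rbind per piece:
# step k builds a result of k * n rows, so the total is n * N * (N + 1) / 2.
rows_copied_incremental <- function(N, n) sum(seq_len(N) * n)

# A single rbind over the whole list copies each row once.
rows_copied_single <- function(N, n) N * n

rows_copied_incremental(1000, 100)  # 50050000
rows_copied_single(1000, 100)       # 1e+05
```

Under those assumptions, 1000 files of 100 rows each cost about 500 times more copying with the loop than with one combined call.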

I'm fiddling with other idioms to accomplish the same goal.  Best I've come up
with so far, after extensive reference to the mailing list archives, is

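my_read_function.many <- function(filenames)
  {
    # Silently skip files that do not exist.
    filenames <- filenames[file.exists(filenames)]
    # Read every file, then combine them in a single rbind call.
    rv <- do.call("rbind", lapply(filenames, my_read_function))
    # Reset row names to 1..n; rv is NULL when no files were found.
    if (!is.null(rv)) rownames(rv) <- NULL
    rv
  }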

I'd love to have some stupid omission pointed out.

- Allen S. Rout

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
