Reading large files quickly
At the moment I'm just reading the large file to see how fast it goes. Eventually, if I can get the read time down, I'll write out a processed version. Thanks for suggesting scan(); I'll try it.

Rob
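A minimal sketch of what that scan() loop could look like, assuming a whitespace-delimited file of numeric columns; the filename, column names, output file, and chunk size are placeholder assumptions to adapt:

    con <- file("bigfile.txt", open = "r")
    repeat {
      ## 'what' fixes the column types up front; nmax caps records per pass
      chunk <- scan(con, what = list(x = numeric(), y = numeric()),
                    nmax = 1e6, quiet = TRUE)
      if (length(chunk$x) == 0) break
      ## append each processed chunk to the output as it is read
      write.table(as.data.frame(chunk), "bigfile-out.txt",
                  append = TRUE, row.names = FALSE, col.names = FALSE)
    }
    close(con)

Keeping the connection open across calls is what makes the chunking work; each scan() picks up where the previous one stopped.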
jim holtman wrote:
Since you are reading it in chunks, I assume that you are writing out each segment as you read it in. How are you writing it out to save it? Does the time you are quoting cover both the reading and the writing? If so, can you break down how long each operation takes? How do you plan to use the data? Is it all numeric? Are you keeping it in a data frame? Have you considered using scan() to read in the data and specify what the columns are? If you would like more help, the answers to these questions will make it easier to give.

On Sat, May 9, 2009 at 10:09 PM, Rob Steele <freenx.10.robsteele at xoxy.net> wrote:
Thanks guys, good suggestions. To clarify, I'm running on a fast multi-core server with 16 GB RAM under 64-bit CentOS 5 and R 2.8.1. Paging shouldn't be an issue since I'm reading in chunks and not trying to store the whole file in memory at once. Thanks again.

Rob Steele wrote:
I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The Unix command wc, by contrast, processes the same file in three minutes. Is there a faster way to read files in R? Thanks!
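Since the file is fixed width, one alternative sketch is to read blocks of lines from an open connection and split the fields with substring(), avoiding read.fwf()'s per-line overhead; the filename, field positions, and block size below are hypothetical:

    con <- file("bigfile.txt", open = "r")
    repeat {
      lines <- readLines(con, n = 100000)  # tune n toward ~100 MB per block
      if (length(lines) == 0) break
      ## hypothetical layout: field 1 in chars 1-10, field 2 in chars 11-20
      f1 <- as.numeric(substring(lines, 1, 10))
      f2 <- as.numeric(substring(lines, 11, 20))
      ## process f1/f2 here
    }
    close(con)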