Reading large files quickly
Rob Steele wrote:
I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The unix command wc by contrast processes the same file in three minutes. Is there a faster way to read files in R?
I use statist to convert the fixed-width data file into a CSV file,
because read.table() is considerably faster than read.fwf(). For example:
system("statist --na-string NA --xcols collist big.txt big.csv")
bigdf <- read.table(file = "big.csv", header = TRUE, as.is = TRUE)
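Once the data are in delimited form, the read.table() call itself can be made cheaper. The R help page for read.table notes that specifying colClasses uses less memory, that giving nrows (even as a mild over-estimate) helps memory usage, and that comment.char = "" is appreciably faster than the default. The sketch below shows those arguments on a small temporary file; the column names and types are hypothetical, so for big.csv you would substitute its real columns and an upper bound on its row count.

```r
# Build a tiny whitespace-separated file standing in for the converted data
# (hypothetical columns "id" and "city").
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:3, city = c("Recife", "Sobral", "Crato")),
            tmp, row.names = FALSE, quote = FALSE)

# Declaring column types skips type guessing; nrows caps the rows read;
# comment.char = "" turns off comment scanning.
bigdf <- read.table(tmp, header = TRUE, as.is = TRUE,
                    colClasses = c("integer", "character"),
                    nrows = 3, comment.char = "")
str(bigdf)
```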
The file collist is a text file whose lines contain the following
information:
variable begin end
where "variable" is the column name, and "begin" and "end" are integers
giving the positions in each line of big.txt where that column begins and ends.
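To make the collist format concrete, here is a small pure-R sketch of the same idea: read a collist-style table and cut each line at the given positions with substring(). This is not how statist works internally and will not match its speed on a 3.5 GB file; it only illustrates the mapping. The collist entries and data lines are hypothetical, and it assumes positions are 1-based and inclusive.

```r
# Hypothetical collist: "variable begin end" on each line, no header.
collist_text <- c("id 1 3", "age 4 5", "city 6 12")
cols <- read.table(text = collist_text,
                   col.names = c("variable", "begin", "end"),
                   stringsAsFactors = FALSE)

# Two hypothetical fixed-width records standing in for big.txt
# (columns: id = 1-3, age = 4-5, city = 6-12).
lines <- c("00134Recife ",
           "00251Sobral ")

# Cut every line at each column's begin/end positions and trim padding.
fields <- lapply(seq_len(nrow(cols)), function(i)
  trimws(substring(lines, cols$begin[i], cols$end[i])))
names(fields) <- cols$variable

# Assemble a data frame and let R guess each column's type.
df <- as.data.frame(fields, stringsAsFactors = FALSE)
df[] <- lapply(df, type.convert, as.is = TRUE)
df
```

Writing df out with write.csv() would give a delimited file comparable to the one statist produces.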
Statist can be downloaded from: http://statist.wald.intevation.org/
Jakson Aquino
Social Sciences Department
Federal University of Ceará, Brazil