Message-ID: <4A062BA5.5090900@gmail.com>
Date: 2009-05-10T01:19:33Z
From: Jakson A. Aquino
Subject: Reading large files quickly
In-Reply-To: <gu4apd$en0$1@ger.gmane.org>
Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
> The unix command wc by contrast processes the same file in three
> minutes. Is there a faster way to read files in R?
I use statist to convert the fixed-width data file into a csv file,
because read.table() is considerably faster than read.fwf(). For example:
system("statist --na-string NA --xcols collist big.txt big.csv")
bigdf <- read.table(file = "big.csv", header = TRUE, as.is = TRUE)
The file collist is a text file whose lines contain the following
information:
variable begin end
where "variable" is the column name, and "begin" and "end" are integers
giving the character positions in big.txt where the column begins and ends.
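If you already have the column widths you were passing to read.fwf(), the
collist lines can be generated rather than typed by hand. The following is
only a rough sketch: make_collist() is a hypothetical helper of mine, the
column names and widths are made up, and I am assuming that positions are
counted from 1 and that "end" is inclusive, so please check that against the
statist documentation before relying on it.

```r
# Hypothetical helper: turn read.fwf()-style widths into
# "variable begin end" lines.
# Assumption: positions are 1-based and "end" is inclusive.
make_collist <- function(names, widths) {
  widths <- as.integer(widths)  # integers avoid output such as "1e+05"
  stopifnot(length(names) == length(widths), all(widths > 0))
  end <- cumsum(widths)         # last character of each column
  begin <- end - widths + 1L    # first character of each column
  paste(names, begin, end)
}

# Made-up layout, for illustration only
collist_lines <- make_collist(c("id", "age", "income"), c(8, 3, 10))
cat(collist_lines, sep = "\n")
```

Calling writeLines(collist_lines, "collist") would then save the lines under
the file name used in the system() call above.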
Statist can be downloaded from: http://statist.wald.intevation.org/
--
Jakson Aquino
Social Sciences Department
Federal University of Ceará, Brazil