Message-ID: <4A062BA5.5090900@gmail.com>
Date: 2009-05-10T01:19:33Z
From: Jakson A. Aquino
Subject: Reading large files quickly
In-Reply-To: <gu4apd$en0$1@ger.gmane.org>

Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>  The unix command wc by contrast processes the same file in three
> minutes.  Is there a faster way to read files in R?

I use statist to convert the fixed-width data file into a CSV file
because read.table() is considerably faster than read.fwf(). For example:

system("statist --na-string NA --xcols collist big.txt big.csv")
bigdf <- read.table(file = "big.csv", header = TRUE, as.is = TRUE)

The file collist is a text file whose lines contain the following
information:

variable begin end

where "variable" is the column name, and "begin" and "end" are integers
indicating the positions in big.txt where that column begins and ends.
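For illustration, a collist describing a file whose first columns were,
say, an id in positions 1-6, an age in 7-9, and an income in 10-18 might
look like this (the variable names and positions here are invented for
the example, not taken from any real file):

id 1 6
age 7 9
income 10 18

Each line then tells statist how to slice one fixed-width column out of
big.txt when writing big.csv.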

Statist can be downloaded from: http://statist.wald.intevation.org/

-- 
Jakson Aquino
Social Sciences Department
Federal University of Ceará, Brazil