Message-ID: <gu9ho5$njn$1@ger.gmane.org>
Date: 2009-05-11T15:54:44Z
From: Rob Steele
Subject: Reading large files quickly; resolved
In-Reply-To: <gu4apd$en0$1@ger.gmane.org>

Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
> The Unix command wc, by contrast, processes the same file in three
> minutes.  Is there a faster way to read files in R?
> 
> Thanks!
> 

readChar() is fast.  I use strsplit(..., fixed = TRUE) to separate the
input data into lines and then use substr() to separate the lines into
fields.  I do a little light processing and write the result back out
with writeChar().  The whole thing takes thirty minutes where read.fwf()
took nearly two hours just to read the data.
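
A minimal sketch of that approach follows. It is not the original script.
The field positions (columns 1-5 and 6-10), the comma-separated output,
the chunk size, and the names split_fields() and process_file() are all
assumptions made for illustration. The trimws() call stands in for the
"light processing". A chunk can end in the middle of a line, so the
partial line is carried over and prepended to the next chunk.

```r
# Sketch only: field layout, output format, and function names are made up.
split_fields <- function(lines) {
  # Hypothetical layout: columns 1-5 hold an id, columns 6-10 a value.
  id    <- substr(lines, 1, 5)
  value <- substr(lines, 6, 10)
  paste(trimws(id), trimws(value), sep = ",")
}

process_file <- function(infile, outfile, chunk_bytes = 100e6) {
  con_in  <- file(infile, "rb")
  con_out <- file(outfile, "wb")
  on.exit({ close(con_in); close(con_out) })

  # Write a batch of complete lines as one string; eos = NULL suppresses
  # the trailing nul byte that writeChar() would otherwise append.
  emit <- function(lines) {
    if (length(lines) > 0L) {
      out <- paste0(paste(split_fields(lines), collapse = "\n"), "\n")
      writeChar(out, con_out, eos = NULL)
    }
  }

  leftover <- ""  # partial line carried over from the previous chunk
  repeat {
    chunk <- readChar(con_in, as.integer(chunk_bytes), useBytes = TRUE)
    if (length(chunk) == 0L || !nzchar(chunk)) break  # end of file
    text  <- paste0(leftover, chunk)
    lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
    if (endsWith(text, "\n")) {
      leftover <- ""
    } else {
      # The last piece is an incomplete line; hold it for the next chunk.
      leftover <- lines[length(lines)]
      lines    <- lines[-length(lines)]
    }
    emit(lines)
  }
  if (nzchar(leftover)) emit(leftover)  # file did not end with a newline
  invisible(NULL)
}
```

Called as process_file("in.dat", "out.csv") with hypothetical file names.
This assumes single-byte (ASCII) data and "\n" line endings; multi-byte
text could be split mid-character at a chunk boundary.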

Thanks for the help!