
big data?

Correcting a typo (400 MB, not GB; thanks to David Winsemius for 
reporting it).  Spencer


###############


       Thanks to all who replied.  For the record, I will summarize here 
what I tried and what I learned:


       Mike Harwood suggested the ff package.  David Winsemius suggested 
data.table and colbycol.  Peter Langfelder suggested sqldf.


       sqldf::read.csv.sql allowed me to write an SQL command to read a 
column or a subset of the rows of a 400 MB tab-delimited file in roughly 
a minute on a 2.3 GHz dual-core machine running Windows 7 with 8 GB RAM. 
It also read a column of a 1.3 GB file in 4 minutes.  The documentation 
made it easy to get what I wanted with a minimum of effort.
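
       For reference, a minimal sketch of that kind of call (the file 
name and column name below are made up; note that read.csv.sql exposes 
the input file as a table named `file` in the SQL statement):

```r
library(sqldf)

## Read a single column from a tab-delimited file
one_col <- read.csv.sql("big_file.tsv",
                        sql = "select some_column from file",
                        sep = "\t")

## Read only the rows matching a condition
some_rows <- read.csv.sql("big_file.tsv",
                          sql = "select * from file where some_column > 100",
                          sep = "\t")
```

Because the filtering happens in SQLite rather than in R, only the 
requested column or rows ever occupy R's memory.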


       If I needed to work with these data regularly, I might experiment 
with colbycol and ff: the documentation suggested that these packages 
might give quicker answers to routine tasks after some preprocessing. 
Of course, I could also do the preprocessing manually with sqldf.


       Thanks, again.
       Spencer
On 8/6/2014 9:39 AM, Mike Harwood wrote: