big data?
The read.table.ffdf function in the ff package can read delimited files and store them on disk as individual columns. The ffbase package provides additional data management and analytic functionality. I have used these packages on 15 GB files of 18 million rows and 250 columns.
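A minimal sketch of the ff approach, assuming the ff package is installed. The file, column names, and column classes here are made up for illustration; a real file would need its own colClasses. Note that ff stores text columns as factors, not character vectors.

```r
library(ff)  # provides read.table.ffdf and the on-disk ffdf class

# Write a tiny tab-delimited file so the example runs end to end
# (hypothetical data standing in for the real large file).
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:6, grp = rep(c("a", "b"), 3), x = (1:6) / 2),
            path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file; header, sep and colClasses are passed on to read.table.
# Each column ends up in its own file on disk rather than in RAM.
big <- read.table.ffdf(file = path, header = TRUE, sep = "\t",
                       colClasses = c("integer", "factor", "numeric"))

dim(big)               # dimensions, without loading the data into memory
mean_x <- mean(big$x[])  # [] pulls a single column into RAM as a plain vector
```

Only the column being worked on has to fit in memory at any one time, which is what makes this workable for files larger than RAM.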
On Tuesday, August 5, 2014 1:39:03 PM UTC-5, David Winsemius wrote:
On Aug 5, 2014, at 10:20 AM, Spencer Graves wrote:
What tools do you like for working with tab delimited text files up
to 1.5 GB (under Windows 7 with 8 GB RAM)?

?data.table::fread
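A minimal sketch of the fread suggestion, assuming the data.table package is installed. The file and column names are hypothetical. The select argument reads only the columns that are needed, which can cut memory use considerably on a wide file.

```r
library(data.table)

# Tiny tab-delimited file standing in for the real 1.5 GB one.
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:6, grp = rep(c("a", "b"), 3), x = (1:6) / 2),
            path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read only two of the three columns.
dt <- fread(path, sep = "\t", select = c("grp", "x"))

# Summarise by group once the data is in memory.
by_grp <- dt[, .(mean_x = mean(x)), by = grp]
```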
Standard tools for smaller data sometimes grab all the available
RAM, after which CPU usage drops to 3% ;-)
The "bigmemory" project won the 2010 John Chambers Award but "is
not available (for R version 3.1.0)".
findFn("big data", 999) downloaded 961 links in 437 packages. Those
include tools for PostgreSQL and other formats, but I couldn't find anything for large tab delimited text files.
Absent a better idea, I plan to write a function getField to
extract a specific field from the data, then use that to split the data
into 4 smaller files, which I think should be small enough that I can do
what I want.
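A minimal sketch of that splitting idea in base R. The helper name split_by_field, its arguments, and the sample data are hypothetical, not the getField function mentioned above. It assumes a header row, no tabs inside quoted fields, no empty values in the chosen field, and field values that are safe to use in file names.

```r
# Hypothetical helper: split a tab-delimited file into one file per value
# of a chosen field, holding at most chunk_lines lines in memory at a time.
split_by_field <- function(infile, field, outdir, chunk_lines = 100000L) {
  con <- file(infile, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  col <- match(field, strsplit(header, "\t", fixed = TRUE)[[1]])
  if (is.na(col)) stop("field not found in header: ", field)
  seen <- character(0)
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    # Extract the chosen field from every line in this chunk.
    key <- vapply(strsplit(lines, "\t", fixed = TRUE), `[`, character(1), col)
    for (k in unique(key)) {
      out <- file.path(outdir, paste0(field, "_", k, ".txt"))
      first <- !(k %in% seen)
      # Start a fresh file the first time a value appears, append afterwards.
      con_out <- file(out, open = if (first) "w" else "a")
      if (first) {
        writeLines(header, con_out)
        seen <- c(seen, k)
      }
      writeLines(lines[key == k], con_out)
      close(con_out)
    }
  }
  invisible(seen)
}

# Usage on a small made-up file, with a chunk size smaller than the file
# so that the append path is exercised.
infile <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:6, grp = rep(c("a", "b"), 3)),
            infile, sep = "\t", row.names = FALSE, quote = FALSE)
outdir <- tempfile("split_")
dir.create(outdir)
groups <- split_by_field(infile, "grp", outdir, chunk_lines = 4L)
part_a <- read.delim(file.path(outdir, "grp_a.txt"))
```

Each output file keeps the original header, so the pieces can be read back with read.delim one at a time.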
There is the colbycol package, with which I have no experience, but I
understand it is designed to partition data into column-sized objects.
#--- from its help file-----
cbc.get.col {colbycol} R Documentation
Reads a single column from the original file into memory
Description
Function cbc.read.table reads a file, stores it column by column in disk
file and creates a colbycol object. Function cbc.get.col queries this object
and returns a single column.
Thanks,
Spencer
______________________________________________
R-h... at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA