Message-ID: <d0f80125-b358-431d-b5b5-79631223d591@googlegroups.com>
Date: 2014-08-06T16:39:41Z
From: Mike Harwood
Subject: big data?
In-Reply-To: <34476B18-46F4-4035-B7E9-FC3B83D6643B@comcast.net>
The read.table.ffdf function in the ff package can read delimited files
and store them on disk as individual columns. The ffbase package provides
additional data management and analytic functionality. I have used these
packages on 15 GB files of 18 million rows and 250 columns.
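
A minimal sketch of that workflow, assuming the ff package is installed.
The temporary file, the column names (id, grp, x) and the chunk sizes are
made up for illustration; for a real file you would substitute your own
path and column classes. Text columns are declared as factors because ff
does not store character vectors.

```r
# Sketch only: file, column names, and chunk sizes are illustrative assumptions.
library(ff)

# Write a small tab-delimited file to stand in for the real multi-GB one.
tmp <- tempfile(fileext = ".txt")
set.seed(1)
write.table(
  data.frame(id  = 1:1000,
             grp = sample(letters[1:4], 1000, replace = TRUE),
             x   = rnorm(1000)),
  file = tmp, sep = "\t", row.names = FALSE, quote = FALSE
)

# Read the file in chunks; each column is stored in its own file on disk.
big <- read.table.ffdf(
  file = tmp, header = TRUE, sep = "\t",
  colClasses = c("integer", "factor", "numeric"),
  first.rows = 100,   # rows read in the first chunk
  next.rows  = 500    # rows read in each later chunk
)

dim(big)              # dimensions, without loading the data into RAM

# Pull a single column into RAM only when it is needed.
x_in_ram <- big$x[]
mean(x_in_ram)
```

Keeping first.rows small lets the column types be settled cheaply, while a
larger next.rows cuts the number of passes over the file.
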
On Tuesday, August 5, 2014 1:39:03 PM UTC-5, David Winsemius wrote:
>
>
> On Aug 5, 2014, at 10:20 AM, Spencer Graves wrote:
>
> > What tools do you like for working with tab delimited text files up
> > to 1.5 GB (under Windows 7 with 8 GB RAM)?
>
> ?data.table::fread
>
> > Standard tools for smaller data sometimes grab all the available
> > RAM, after which CPU usage drops to 3% ;-)
> >
> >
> > The "bigmemory" project won the 2010 John Chambers Award but "is
> > not available (for R version 3.1.0)".
> >
> >
> > findFn("big data", 999) downloaded 961 links in 437 packages. That
> > contains tools for data PostgreSQL and other formats, but I couldn't find
> > anything for large tab delimited text files.
> >
> >
> > Absent a better idea, I plan to write a function getField to
> > extract a specific field from the data, then use that to split the data
> > into 4 smaller files, which I think should be small enough that I can do
> > what I want.
>
> There is the colbycol package with which I have no experience, but I
> understand it is designed to partition data into column sized objects.
> #--- from its help file-----
> cbc.get.col {colbycol} R Documentation
> Reads a single column from the original file into memory
>
> Description
>
> Function cbc.read.table reads a file, stores it column by column in disk
> file and creates a colbycol object. Function cbc.get.col queries this object
> and returns a single column.
>
> > Thanks,
> > Spencer
> >
> > ______________________________________________
> > R-h... at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>