
Processing large datasets

Hi,
On Wed, May 25, 2011 at 10:18 AM, Roman Naumenko <roman at bestroman.com> wrote:
[snip]
Yeah, I know -- I only mentioned it in the context of manipulating
data.frame-like objects -- sorry if I wasn't clear.

If you've got data.frame-like data that you can store in RAM
AND you find yourself wanting to do some summary calculations over
different subgroups of it, you might find that data.table is a quicker
way to get that done -- the larger your data.frame/table, the more
noticeable the speedup.
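To make that concrete, here's a minimal sketch (with made-up column
names "grp" and "value") of the kind of grouped summary I mean -- the
data.table call and a base-R equivalent that gets slow as the data grows:

```r
library(data.table)

## Hypothetical example data: one grouping column, one value column
dt <- data.table(grp   = sample(letters, 1e6, replace = TRUE),
                 value = rnorm(1e6))

## Grouped mean with data.table -- fast even on large tables
dt[, mean(value), by = "grp"]

## Roughly equivalent base-R call, typically much slower at scale
aggregate(value ~ grp, data = as.data.frame(dt), FUN = mean)
```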

To give you an idea of the scenarios I'm talking about, other
packages you'd use to do the same would be plyr and sqldf.
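For instance, the same grouped summary as above (again with made-up
"grp"/"value" columns) would look like this in those two packages:

```r
library(plyr)
library(sqldf)

## Hypothetical example data
df <- data.frame(grp   = sample(letters, 1e5, replace = TRUE),
                 value = rnorm(1e5))

## plyr: split-apply-combine over the grouping column
ddply(df, "grp", summarise, avg = mean(value))

## sqldf: the same summary expressed as SQL (runs via SQLite)
sqldf("SELECT grp, AVG(value) AS avg FROM df GROUP BY grp")
```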

For out-of-memory datasets, you're in a different realm -- hence the
HPC Task view link.
Cool.

I've had some luck using the bigmemory package (and friends) in the
past as well.
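The basic idea there, sketched below with made-up dimensions and file
names, is a file-backed matrix that lives on disk but is indexed like
an ordinary R matrix, so it isn't limited by RAM:

```r
library(bigmemory)

## Hypothetical file-backed matrix; the data stays on disk
x <- filebacked.big.matrix(nrow = 1e7, ncol = 3, type = "double",
                           backingfile    = "big.bin",
                           descriptorfile = "big.desc")

x[1, ] <- c(1, 2, 3)  # ordinary matrix-style indexing
x[1, ]
```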

-steve