
Processing large datasets

Hi,
On Wed, May 25, 2011 at 10:18 AM, Roman Naumenko <roman at bestroman.com> wrote:
[snip]
Yeah, I know -- I only mentioned it in the context of manipulating
data.frame-like objects -- sorry if I wasn't clear.

If you've got data.frame-like data that you can store in RAM
AND you find yourself wanting to do some summary calculations over
different subgroups of it, you might find that data.table is a quicker
way to get that done -- the larger your data.frame/table, the more
noticeable the speedup.
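To make that concrete, here's a minimal sketch (with made-up column
names "grp" and "value") of the kind of grouped summary I mean -- the
data.table call and a base-R equivalent that gets slow as the data grows:

```r
library(data.table)

## Hypothetical example data: one grouping column, one value column
dt <- data.table(grp   = sample(letters, 1e6, replace = TRUE),
                 value = rnorm(1e6))

## Grouped mean with data.table -- fast even on large tables
dt[, mean(value), by = "grp"]

## Roughly equivalent base-R call, typically much slower at scale
aggregate(value ~ grp, data = as.data.frame(dt), FUN = mean)
```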

To give you an idea of the scenarios I'm talking about, other
packages you'd use to do the same would be plyr and sqldf.
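For instance, the same grouped summary as above (again with made-up
"grp"/"value" columns) would look like this in those two packages:

```r
library(plyr)
library(sqldf)

## Hypothetical example data
df <- data.frame(grp   = sample(letters, 1e5, replace = TRUE),
                 value = rnorm(1e5))

## plyr: split-apply-combine over the grouping column
ddply(df, "grp", summarise, avg = mean(value))

## sqldf: the same summary expressed as SQL (runs via SQLite)
sqldf("SELECT grp, AVG(value) AS avg FROM df GROUP BY grp")
```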

For out-of-memory datasets, you're in a different realm -- hence the
HPC Task view link.
Cool.

I've had some luck using the bigmemory package (and friends) in the
past as well.
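The basic idea there, sketched below with made-up dimensions and file
names, is a file-backed matrix that lives on disk but is indexed like
an ordinary R matrix, so it isn't limited by RAM:

```r
library(bigmemory)

## Hypothetical file-backed matrix; the data stays on disk
x <- filebacked.big.matrix(nrow = 1e7, ncol = 3, type = "double",
                           backingfile    = "big.bin",
                           descriptorfile = "big.desc")

x[1, ] <- c(1, 2, 3)  # ordinary matrix-style indexing
x[1, ]
```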

-steve