Memory/data -last time I promise

On Tue, 24 Jul 2001, Micheall Taylor wrote:

[...]

That's quite possible.  A `14Mb dataset' is not too helpful to us.  You
seem to have one character variable (ca 2 chars) and 9 numeric variables
per record.  That's ca 75 bytes per record.  An actual experiment using
object.size() gives 88 bytes (there are row names too).  So at 70Mb, that
is about 0.8M rows.  If that's not right, the data are not being read in
correctly.
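The per-record arithmetic above can be checked with a small experiment along these lines (the column names and sizes here are made up for illustration; exact byte counts vary by R version and platform):

```r
# Sketch: estimate storage per row for a data frame with one character
# column and nine numeric columns, as described in the post.
n  <- 1000
df <- data.frame(id = rep("ab", n),
                 matrix(rnorm(n * 9), nrow = n))

# object.size() includes overhead such as row names; divide by the row
# count for a rough bytes-per-record figure.
bytes_per_row <- as.numeric(object.size(df)) / n
bytes_per_row
```

Multiplying that figure by the claimed file size (70Mb / ~88 bytes) gives the ~0.8M-row estimate in the text.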

The main problem I see is that your machine seems unable to allocate more
than about 450Mb to R, and it has surprisingly little swap space.  (This
512Mb Linux machine has 1Gb of swap allocated, and happily allocates 800Mb
to R when needed.)
[...]

Probably not.  R does require objects to be stored in memory.

As a serious statistical question: what can you usefully do with 8M rows
on 9 continuous variables?  Why would a 1% sample not be already far more
than enough?  My group regularly works with datasets in the 100s of Mb,
but normally we either sample or we summarize in groups for further
analysis.  Our latest dataset is a 1.2Gb Oracle table, but it has
structure (it's 60 experiments for a start).
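Taking a 1% sample in R is a one-liner; a minimal sketch (where `big` is a stand-in for the full dataset, built small here so the example is self-contained):

```r
# Sketch: analyse a 1% random sample of the rows instead of all of them.
# 'big' is a placeholder for the full 8M-row dataset.
big <- data.frame(x = rnorm(10000), y = rnorm(10000))

set.seed(1)                                   # reproducible sample
sub <- big[sample(nrow(big), nrow(big) %/% 100), ]
nrow(sub)
```

For most summary statistics on 9 continuous variables, estimates from a sample of this size are already very precise.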

[...]

BTW, rbind is inefficient, and adding a piece at a time is the least
efficient way to use it.  rbind(full1, full2, ..., full10) would be
better.  Allocating the full object and assigning to sub-sections would
be better still.
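The three strategies can be sketched as follows, using three made-up pieces full1..full3 in place of the ten in the post:

```r
# Stand-ins for the pieces being combined.
full1 <- data.frame(x = 1:3)
full2 <- data.frame(x = 4:6)
full3 <- data.frame(x = 7:9)

# Worst: grow the result one piece at a time.  Each rbind() copies the
# whole accumulated object, so the total work is quadratic in the size.
res <- full1
for (p in list(full2, full3)) res <- rbind(res, p)

# Better: a single rbind() call over all the pieces at once.
res2 <- do.call(rbind, list(full1, full2, full3))

# Best: allocate the full object once, then assign into sub-sections,
# so nothing is copied repeatedly.
res3 <- data.frame(x = integer(9))
res3$x[1:3] <- full1$x
res3$x[4:6] <- full2$x
res3$x[7:9] <- full3$x
```

All three produce the same column of values; only the amount of copying differs, which is what dominates the cost on large data.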