Using huge datasets

From: Liaw, Andy

A matrix of that size takes up just over 320 MB to store in memory.  I'd
imagine you can probably do it with 2 GB of physical RAM (assuming your
columns are all numeric variables; i.e., no factors).
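For a rough sense of where a figure like that comes from: R stores a numeric
matrix as doubles, 8 bytes per cell.  The dimensions below are purely
hypothetical (the original posting's sizes aren't shown here), chosen just to
illustrate the arithmetic:

```r
## Back-of-envelope memory cost of a numeric (double) matrix:
## 8 bytes per cell.  E.g., 4 million rows x 10 columns
## (hypothetical dimensions) comes to about 320 MB:
4e6 * 10 * 8 / 1e6   # bytes -> MB
```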

However, a better way than the brute-force, one-shot approach may be to read
the data in chunks and do the prediction piece by piece.  You can use
scan(), or a file connection with file()/readLines()/close(), to do this
fairly easily.
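The chunked approach might look something like the sketch below.  It assumes a
fitted model `fit` and a CSV file "bigdata.csv" with a header row whose column
names match the model's predictors; both names, the file format, and the chunk
size are assumptions for illustration, not part of the original message:

```r
## Chunk-wise prediction sketch: read 10,000 rows at a time from an
## open connection, predict on each chunk, and collect the results.
con <- file("bigdata.csv", open = "r")
hdr <- strsplit(readLines(con, n = 1), ",")[[1]]   # header: column names
preds <- list()
repeat {
    lines <- readLines(con, n = 10000)             # next chunk of rows
    if (length(lines) == 0) break                  # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = hdr)
    preds[[length(preds) + 1]] <- predict(fit, newdata = chunk)
}
close(con)
all.preds <- unlist(preds)                         # one vector of predictions
```

Because the connection stays open across iterations, each readLines() call
picks up where the last one left off, so peak memory is bounded by the chunk
size rather than the full dataset.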

My understanding of how (most) clusters work is that you still need at least
one node that can accommodate the memory load of the monolithic R process, so
a cluster is probably not much help here.  (I could very well be wrong about
this; if so, I'd be grateful for a correction.)

HTH,
Andy