Q: R 2.2.1: Memory Management Issues? - R-devel

Thu, Jan 5, 2006 2:18 PM

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20060105/e3eab45e/attachment.pl

Simon Urbanek

Thu, Jan 5, 2006 4:12 PM

Karen,

On Jan 5, 2006, at 5:18 PM, <Karen.Green at sanofi-aventis.com> wrote:

That is 1.1GB of RAM to allocate for a single vector(!). As you
stated yourself, the total upper limit is 2GB, so you cannot even fit
two of those in memory. There is not much you can do with it even if
it is allocated.

I suspect that memory is the least of your problems. Did you even try
to run EMclust on a small subsample? I suspect that if you did, you
would find that what you are trying to do is not likely to terminate
within days...

Because that is not the only 1GB vector that is allocated. Your
"15GB/defragmented" is irrelevant - if anything, look at how much
virtual memory is set up in your system's preferences.

Well, a toy example of 17000x2 needs 2.3GB and it's unlikely to
terminate anytime soon, so I'd rather call it shooting with the wrong
gun. Maybe you should consider a different approach to your problem -
possibly ask on the Bioconductor list, because people there have more
experience with large data, and this is not really a technical
question about R, but rather about how to apply statistical methods.
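
To put rough numbers on the 17000-row figure, here is a minimal sketch
of the arithmetic. It assumes (this is not stated anywhere in the
thread) that the dominant allocation holds one 8-byte double per pair
of observations, i.e. n(n-1)/2 doubles. Python is used only as a
calculator, and the function names are made up for illustration.

```python
BYTES_PER_DOUBLE = 8  # an R numeric vector stores 8-byte doubles


def pairwise_doubles(n):
    """Number of unordered pairs among n observations: n(n-1)/2."""
    return n * (n - 1) // 2


def pairwise_bytes(n):
    """Bytes for one double per pair (ASSUMED storage scheme, see above)."""
    return pairwise_doubles(n) * BYTES_PER_DOUBLE


def gib(nbytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return nbytes / 2**30


n = 17000
one_vector = pairwise_bytes(n)

print(pairwise_doubles(n))          # 144491500 doubles
print(round(gib(one_vector), 2))    # 1.08 GiB for a single vector
print(2 * one_vector > 2 * 2**30)   # True: two of them exceed a 2GB limit
```

Under that assumption a single vector for n = 17000 is about 1.16e9
bytes and two of them about 2.3e9 bytes. The cost grows with the
square of n, so doubling the number of rows roughly quadruples it.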

Any reasonable unix will do - technically (64-bit versions  
preferably, but in your case even 32-bit would do). Again, I don't  
think memory is your only problem here, though.

Cheers,
Simon

Brian Ripley

Fri, Jan 6, 2006 12:44 AM

On Thu, 5 Jan 2006, Simon Urbanek wrote:

Just in case people missed this (Simon as a MacOS user has no reason to 
know this), the Windows limit is in fact 3Gb if you tell your OS to allow 
it.  (How is in the quoted rw-FAQ, Q2.9, and from 2.2.1 R will 
automatically notice this whereas earlier versions needed to be told.)

However, there is another problem with a 32-bit OS:  you can only fit
two 1.1Gb objects in a 3Gb address space if they are in specific
positions, and fragmentation is often a big problem.
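
The effect of position can be illustrated with a toy model (a sketch
only: sizes are in whole megabytes, with 3000 and 1100 standing in for
3Gb and 1.1Gb, and alignment, DLL mappings and everything else a real
process has in its address space are ignored; the function names are
hypothetical). Each large object needs one contiguous free gap.

```python
def free_gaps(space, used):
    """Free (start, length) gaps in [0, space), given used (start, length) blocks."""
    gaps, cursor = [], 0
    for start, length in sorted(used):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < space:
        gaps.append((cursor, space - cursor))
    return gaps


def blocks_that_fit(gaps, size):
    """How many contiguous blocks of `size` fit, each entirely inside one gap."""
    return sum(length // size for _, length in gaps)


SPACE = 3000   # address space in MB (stand-in for 3Gb)
BIG = 1100     # one large object in MB (stand-in for 1.1Gb)

empty = free_gaps(SPACE, [])
at_edge = free_gaps(SPACE, [(0, 10)])       # 10 MB allocation at the start
in_middle = free_gaps(SPACE, [(1000, 10)])  # same 10 MB, badly placed

print(blocks_that_fit(empty, BIG))      # 2
print(blocks_that_fit(at_edge, BIG))    # 2
print(blocks_that_fit(in_middle, BIG))  # 1
```

In the last two cases the total free space is the same 2990 MB; only
the position of one small allocation decides whether the second large
object can be placed.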

I believe a 64-bit OS with 4Gb of RAM would handle such problems much 
more comfortably.  The alternative is to find (or write) more efficient 
mixture-fitting software than mclust.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595