Q: R 2.2.1: Memory Management Issues?
On Jan 5, 2006, at 7:33 PM, <Karen.Green at sanofi-aventis.com> wrote:
The empirically derived limit on my machine (under R 1.9.1) was approximately 7500 data points. I have been able to successfully run the script that uses package MCLUST on several hundred smaller data sets, and I had even written a work-around for the case of more than 9600 data points: it first orders the points by their value, then takes a sample (e.g. every other point, or 1 point every n points) to bring the number under 9600. No problems with the computations were observed, but you are correct that a deconvolution on a larger dataset of 9600 points takes almost 30 minutes. However, for our purposes we do not have many datasets over 9600 points, so the time is not a major constraint.

Unfortunately, my management does not like using a work-around and really wants to operate on the larger data sets. I was told either to find a way to make it work on the larger data sets or to avoid R and find another solution.
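The sort-then-sample work-around Karen describes can be sketched in a few lines of R; `thin_points` and its default cap of 9600 are illustrative names and values of my own, not anything from MCLUST:

```r
## Sketch of the sort-then-sample work-around described above.
## thin_points() and its default cap are illustrative, not part of MCLUST.
thin_points <- function(x, cap = 9600) {
  x <- sort(x)                      # order the points by their value
  if (length(x) <= cap) return(x)
  k <- ceiling(length(x) / cap)     # keep 1 point every k points
  x[seq(1, length(x), by = k)]      # e.g. every other point when k = 2
}

set.seed(1)
length(thin_points(rnorm(20000)))   # comes in under the 9600 cap
```

The thinned vector preserves the overall shape of the ordered sample, which is presumably why the deconvolution results were unaffected.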
Well, sure, if your only concern is memory then moving to Unix will give you several hundred more data points you can use. I would recommend a 64-bit Unix preferably, because then there is practically no software limit on the size of virtual memory. Nevertheless, there is still a limit of ca. 4GB for a single vector, so that should give you around 32500 rows that mclust can handle as-is (I don't want to see the runtime, though ;)). For anything beyond that you'll really have to think about another approach.

Cheers,
Simon