Message-ID: <E4275319-FE4E-44AC-9695-980D0090C5BA@r-project.org>
Date: 2006-01-06T01:38:31Z
From: Simon Urbanek
Subject: Q: R 2.2.1:  Memory Management Issues?
In-Reply-To: <3CB6B0EE6816B142859B24E77724A8B8F87DDE@sccsmxsusr05.pharma.aventis.com>

On Jan 5, 2006, at 7:33 PM, Karen.Green at sanofi-aventis.com wrote:

> The empirically derived limit on my machine (under R 1.9.1) was approximately 7500 data points. I have been able to successfully run the script that uses package MCLUST on several hundred smaller data sets.
>
> I even had written a work-around for the case of greater than 9600 data points. My work-around first orders the points by their value, then takes a sample (e.g. every other point, or 1 point every n points) in order to bring the number under 9600. No problems with the computations were observed, but you are correct that a deconvolution on that larger dataset of 9600 takes almost 30 minutes. However, for our purposes, we do not have many datasets over 9600, so the time is not a major constraint.
>
> Unfortunately, my management does not like using a work-around and really wants to operate on the larger data sets. I was told to find a way to make it operate on the larger data sets, or avoid using R and find another solution.

Well, sure, if your only concern is memory, then moving to unix will give you several hundred more data points to work with. I would recommend a 64-bit unix, preferably, because then there is practically no software limit on the size of virtual memory. Nevertheless, there is still a limit of ca. 4GB for a single vector, so that should give you around 32500 rows that mclust can handle as-is (I don't want to see the runtime, though ;)). For anything else you'll really have to think about another approach.
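A rough sketch of where a figure like 32500 could come from, under an assumption of mine that the email does not spell out: if mclust's hierarchical initialization stores the lower triangle of an n-by-n pairwise matrix as one vector of n*(n-1)/2 doubles, then a ca. 4GB single-vector cap bounds n near 2^15:

```python
# Back-of-the-envelope check (my assumption, not stated in the email):
# one R vector of n*(n-1)/2 doubles (8 bytes each) must fit under ~4 GB.

BYTES_PER_DOUBLE = 8
VECTOR_CAP = 4 * 1024**3  # ca. 4 GB cap on a single vector

def max_rows(cap=VECTOR_CAP, bytes_per=BYTES_PER_DOUBLE):
    """Largest n with n*(n-1)/2 * bytes_per <= cap."""
    n = int((2 * cap / bytes_per) ** 0.5)  # initial estimate from n^2/2 * bytes ~ cap
    while n * (n - 1) // 2 * bytes_per > cap:
        n -= 1
    while (n + 1) * n // 2 * bytes_per <= cap:
        n += 1
    return n

print(max_rows())  # -> 32768, consistent with "around 32500 rows"
```

This is only arithmetic under the stated assumption; the exact ceiling would depend on what mclust actually allocates.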

Cheers,
Simon