Fanny Clustering

Sergio Della Franca wrote:
Well, if you would like to increase the memory of MY computer... you are
welcome to do so... but I doubt it would be of any use to you ;-)

You don't tell us how much RAM you currently have, which platform you
use, etc. The general approach is to use a computer with more RAM, up to
the limit that a 32-bit system allows for R, and then to switch to a
64-bit version under Linux if you need even more RAM.

The other proposed solution is not stupid. With 70,000 cases, you have a
fairly large dataset. You don't tell us how many groups you expect from
your clustering, but it is often better to use a few tens or a few
hundred representative cases for each group, no more. In supervised
classification, it is easier to build such a training set with a
relatively balanced number of items in each group, because the target
classes are known a priori from the manual classification provided.

With unsupervised classification, you could either try a purely random
subsampling, or select your subsample based on similarity according to a
given distance measure. I did something like that using a Mahalanobis
distance, MDS (multidimensional scaling), and then stratified
subsampling inside a regular grid placed on top of the MDS plot.
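
The grid step described above can be sketched as follows. This is only a
minimal illustration in Python, not the original code: it assumes the
2-D MDS coordinates already exist as a list of (x, y) pairs, and the
function name grid_stratified_subsample and its parameters n_bins and
per_cell are hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def grid_stratified_subsample(points, n_bins=10, per_cell=5, seed=0):
    """Return sorted indices of a subsample spread over a regular grid.

    points   : list of (x, y) pairs, assumed to be 2-D MDS coordinates.
    n_bins   : number of grid cells along each axis (illustrative default).
    per_cell : maximum number of cases kept in each occupied cell.
    """
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1.0 when all points share a coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    # Assign every case to the grid cell that contains it.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        # Clamp so the maximum value lands in the last cell.
        cx = min(int((x - x_min) / x_span * n_bins), n_bins - 1)
        cy = min(int((y - y_min) / y_span * n_bins), n_bins - 1)
        cells[(cx, cy)].append(i)

    # Draw at most per_cell cases at random from each occupied cell.
    chosen = []
    for members in cells.values():
        chosen.extend(rng.sample(members, min(per_cell, len(members))))
    return sorted(chosen)


if __name__ == "__main__":
    demo = random.Random(1)
    # Stand-in for MDS coordinates: one dense blob and one sparse blob.
    coords = [(demo.gauss(0, 1), demo.gauss(0, 1)) for _ in range(5000)]
    coords += [(demo.gauss(6, 1), demo.gauss(6, 1)) for _ in range(200)]
    keep = grid_stratified_subsample(coords, n_bins=10, per_cell=5)
    print(len(coords), "cases reduced to", len(keep))
```

Because each cell contributes at most per_cell cases, dense regions of
the plot are thinned heavily while sparse regions are kept almost whole,
which is what distinguishes this from a purely random subsample.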

Otherwise, I am not a specialist in unsupervised classification, and
other people here may have better suggestions.

Best,

Philippe Grosjean