Gap statistic

Dear All,

I need to calculate the optimal number of clusters for a classification based on a large number of observations (tens of thousands).
Tibshirani et al. proposed the gap statistic for this purpose. I tried the R code developed by R. Jörnsten, but R hangs with this amount of data.
Is any other (optimised) code available?
Any help would be appreciated, including suggestions of alternative methods for selecting an optimal number of clusters from large datasets.

Thanks, 


Néstor Fernández, PhD.

Department of Ecological Modelling
UFZ - Centre for Environmental Research
PF 500136, DE-04301, Leipzig, Germany.
Tel: +49 341-2352034
E-mail: nestor.fernandez at ufz.de