
estimating number of clusters ("Null or more")

Hi,

there are at least two ways to estimate the number of clusters in R.
In library(cluster), you can use the information that comes with the
silhouette plot. This is a bit difficult to figure out from the help pages
(it got better in the recent version, I think); the relevant help pages
are those of pam, pam.object and partition.object.
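
A minimal sketch of the silhouette approach, not a definitive recipe. The toy data, the range of k and the variable names are my own assumptions for illustration; the only package-specific piece is the avg.width entry of the silinfo component documented in pam.object. Note that the silhouette is only defined for k >= 2, so this cannot by itself answer "one cluster or more".

```r
library(cluster)

set.seed(1)
# Toy data (an assumption for illustration): two Gaussian groups in 2-D
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

ks <- 2:6
# Average silhouette width of the pam partition for each candidate k
avg_width <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)
names(avg_width) <- ks

# Take the k with the largest average silhouette width
best_k <- ks[which.max(avg_width)]
print(avg_width)
print(best_k)
```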

EMclust of library mclust decides about an optimal number of mixture
components using the BIC.
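
A rough sketch of the BIC-based choice, under the assumption that you have a current mclust release: as far as I am aware, the top-level function there is Mclust, which takes over the role EMclust has in the version described above. The toy data and the range G = 1:6 are again only illustrative. Since G = 1 is among the candidates, a single component is a possible outcome here.

```r
library(mclust)

set.seed(1)
# Toy data (an assumption for illustration): two Gaussian groups in 2-D
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

# Fit Gaussian mixtures with 1 to 6 components; the model and the
# number of components are selected by BIC
fit <- Mclust(x, G = 1:6)

print(fit$G)          # selected number of mixture components
print(fit$modelName)  # selected covariance parameterisation
```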

As far as I know, there is no direct answer to the problem of testing
homogeneity vs. clustering in R. There are lots of theoretical difficulties
and there is no "standard routine" for this, neither in R nor
elsewhere. I would suggest inventing a null model that describes your data
as homogeneous and estimating the distribution of a suitable clustering
statistic (such as the silhouette avg.width in pam, the BIC, the average
distance of the points to their kth nearest neighbor, or the ratio between
the 25% largest and smallest distances in the dataset) by Monte
Carlo/parametric bootstrap. Perhaps I say this too quickly; it's
non-trivial, and at the very least you have to design the simulation so that
rejection/acceptance is not a consequence of different scaling of the data
and the null model.
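
To make the parametric bootstrap idea concrete, here is a minimal sketch under strong assumptions of my own: the null model is a single Gaussian with the sample mean and covariance of the data (so that data and null model are on the same scale), and the statistic is the best average silhouette width over k = 2..5. The helper names sil_stat and sim_null, the toy data and B = 99 are hypothetical choices, not a standard routine.

```r
library(cluster)

# Clustering statistic: best average silhouette width over a range of k
sil_stat <- function(x, ks = 2:5) {
  max(sapply(ks, function(k) pam(x, k)$silinfo$avg.width))
}

# One draw from the assumed null model: a single Gaussian with the
# sample mean and covariance of x, so its scale matches the data
sim_null <- function(x) {
  n <- nrow(x)
  p <- ncol(x)
  z <- matrix(rnorm(n * p), n, p)
  # z %*% chol(S) has covariance S; then shift to the sample mean
  sweep(z %*% chol(cov(x)), 2, colMeans(x), "+")
}

set.seed(1)
# Toy data (an assumption for illustration): two Gaussian groups in 2-D
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

B <- 99
obs <- sil_stat(x)
# Distribution of the statistic under the null model
null_stats <- replicate(B, sil_stat(sim_null(x)))

# Monte Carlo p-value: how often the null gives a statistic
# at least as large as the observed one
p_value <- (1 + sum(null_stats >= obs)) / (B + 1)
print(obs)
print(p_value)
```

A small p_value speaks against this particular null model, not against homogeneity in general; a different homogeneous model (e.g. uniform on some region) may give a different answer.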

Hope that helps,
Christian
On Thu, 24 Apr 2003, Khamenia, Valery wrote: