Dear Christian,

first of all, thank you for your answer. I am going to read through the pages you pointed me to. Meanwhile, I'd like to note that it would probably be a good idea to put 2-3 lines of R code demonstrating such a simple need somewhere in the docs of the `cluster' package. E.g.

x <- rnorm(500)
...
# output means we have rather 1 cluster

x <- c(rnorm(500), rnorm(500)+5)
...
# output means we have rather 2 or more clusters

It would be nice not only for me.
EMclust of library mclust decides about an optimal number of mixture components using the BIC.
It is not clear to me whether one can use BIC without a statement about the family of distributions. Indeed, BIC is based on the likelihood, and what should the likelihood be if the only adequate statement about the distribution is the ECDF itself?
As far as I know, there is no direct answer to the problem of testing homogeneity vs. clustering in R. There are lots of theoretical difficulties and there is no "standard routine" to do this, neither in R nor elsewhere.
I am not looking for the Holy Grail, or I hope so :-) In particular, I believe some entropy-based criterion should fully satisfy me here. BIC might also be good if it can be applied to an ECDF.
I would suggest inventing a null model that describes your data as homogeneous and estimating the distribution of a suitable clustering statistic (such as the average silhouette width in pam, BIC, the average distance of the points to their kth nearest neighbor, or the ratio between the 25% largest and smallest distances in the dataset) by Monte Carlo/parametric bootstrap. Perhaps I say this too quickly;
a bit compressed, but something is clear anyway :-)
it's non-trivial and at least you have to design the simulation so that rejection/acceptance is not a consequence of different scaling of data and null model.
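The simulation described above could be sketched roughly as follows in R. This is only an illustration under stated assumptions: the null model is a single normal distribution fitted to the data (same mean and standard deviation, so that data and null model share the same scale), the statistic is the average silhouette width of a 2-cluster pam() fit, and `sil_width` and `homogeneity_pvalue` are hypothetical helper names, not functions of the `cluster' package.

```r
# Sketch only: Monte Carlo test of "homogeneous" vs. "clustered" data,
# using the average silhouette width of a 2-medoid pam() fit as the statistic.
library(cluster)  # provides pam()

# Statistic: average silhouette width for k = 2 (an illustrative choice).
sil_width <- function(x) pam(matrix(x, ncol = 1), k = 2)$silinfo$avg.width

# Hypothetical helper: Monte Carlo p-value under a fitted normal null model.
# Null samples are drawn with the mean and sd of the data, so a rejection
# is not caused by a scale mismatch between data and null model.
homogeneity_pvalue <- function(x, n_sim = 99) {
  observed  <- sil_width(x)
  simulated <- replicate(n_sim, sil_width(rnorm(length(x), mean(x), sd(x))))
  # Share of statistics (null draws plus the observed one) that are
  # at least as large as the observed statistic.
  (1 + sum(simulated >= observed)) / (n_sim + 1)
}

set.seed(1)
p_one <- homogeneity_pvalue(rnorm(200))                     # homogeneous sample
p_two <- homogeneity_pvalue(c(rnorm(100), rnorm(100) + 5))  # two shifted groups
```

A small p-value speaks against the homogeneous null model. The normal null is itself an assumption; a different null (uniform, or a resampling scheme) may suit other data better, and any of the other statistics mentioned above could replace the silhouette width.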
not clear here :-)

thanks again,
Valery A.Khamenya