Dynamic clustering?
Hello,
Ralf B wrote:
> Are there R packages that allow for dynamic clustering, i.e. where the
> number of clusters is not predefined? I have a list of numbers that falls
> into either 2 clusters or just 1. Here is an example that should be
> clustered into two clusters:
>
>   two <- c(1,2,3,2,3,1,2,3,400,300,400)
>
> and here is one that contains only one cluster and would therefore not
> need to be clustered at all:
>
>   one <- c(400,402,405, 401,410,415, 407,412)
>
> Given a sufficiently large amount of data, a statistical test or an effect
> size should be able to determine whether it makes sense to divide a data
> set, i.e. whether there are two groups that differ well enough. I am not
> familiar with the underlying techniques in kmeans, but I know that it
> blindly divides both data sets based on the predefined number of clusters.
> Are there any more sophisticated methods that allow me to determine the
> number of clusters in a data set based on statistical tests or effect
> sizes?
Caveat: I have very little experience with clustering methods, but maybe this could get you started:

http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set

If you only want to make 2 clusters when the means of the data are an order of magnitude apart or more, that's easy enough to do without a statistical test.

For your examples above, I naively tried some functions in the mclust package, which I've never used before:

  library(mclust)
  # fit models with 1 and 2 components, then report the number chosen by BIC
  mclustModel(one, mclustBIC(one, G=1:2))$G  # gives 1
  mclustModel(two, mclustBIC(two, G=1:2))$G  # gives 2

You'll have to decide for yourself whether this is appropriate for your data... or whether I'm even using these functions correctly. :)
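
For the "order of magnitude" idea mentioned above, a minimal sketch in base R might look like the following. It is not a statistical test: it lets kmeans split the data into two groups and then keeps the split only if the larger group mean is at least 10 times the smaller one. The helper name n_clusters and the default ratio of 10 are assumptions made up for illustration, and the sketch assumes strictly positive data (otherwise the ratio is meaningless).

```r
# Hypothetical helper: choose between 1 and 2 clusters for positive 1-D data.
# The threshold ratio = 10 ("an order of magnitude") is an assumed default.
n_clusters <- function(x, ratio = 10) {
  # kmeans always returns two groups here, whether or not they are meaningful
  fit <- kmeans(x, centers = 2, nstart = 25)
  centers <- sort(as.numeric(fit$centers))
  # keep the split only if the two group means differ by the given factor
  if (centers[2] / centers[1] >= ratio) 2L else 1L
}

one <- c(400, 402, 405, 401, 410, 415, 407, 412)
two <- c(1, 2, 3, 2, 3, 1, 2, 3, 400, 300, 400)

set.seed(1)  # kmeans picks random starting centers
n_clusters(one)
n_clusters(two)
```

With the two example vectors this returns 1 for one and 2 for two. For anything less clear-cut than these examples, a model-based criterion such as the BIC comparison in mclust is probably the safer choice.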