CLARA and determining the right number of clusters

Hi there,
I'm not sure whether you feel that you have to look at a single variable at 
a time, but the whole thing should work for more than one as well.
This kind of problem concerns your data and your application and cannot be 
solved on such a mailing list. Perhaps you should go for professional 
advice about your particular data. Quite obviously, if you restrict the 
number of clusters to be at most 10, you cannot find eleven. How strongly 
you "think that there should not be more than 8-9 clusters", and how good 
your arguments against 11 are, nobody on this list can decide.
The general problem is that there is no unique statistical definition of 
what a "true cluster" is, so whether your dataset contains 5 or 11 
clusters (or any other number) depends on what you want to call a 
"cluster".
I don't know which plot you refer to, but you may want to have a look at 
the Kaufman and Rousseeuw book cited on the help page.
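
To make the point about the search range concrete, here is a rough sketch 
in Python. It is not CLARA itself: it uses a plain alternating k-medoids on 
one-dimensional toy data and picks k by average silhouette width, which is 
only one of several possible criteria. The data, the function names 
(k_medoids, avg_silhouette, best_k) and the deterministic starting medoids 
are all assumptions made for this illustration. Whatever the data look 
like, the selected k can only ever be one of the values you allowed.

```python
import random


def dist(a, b):
    """Absolute difference; the toy data are one-dimensional."""
    return abs(a - b)


def k_medoids(points, k, max_iter=100):
    """Plain alternating k-medoids (a stand-in, not CLARA's sampling scheme)."""
    pts = sorted(points)
    n = len(pts)
    # Deterministic start: medoids spread evenly over the sorted data.
    medoids = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Move each medoid to the member with the smallest total distance.
        updated = [
            min(c, key=lambda m: sum(dist(m, q) for q in c)) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == medoids:
            break
        medoids = updated
    return [c for c in clusters if c]


def avg_silhouette(clusters):
    """Average silhouette width over all points (0 for singleton clusters)."""
    if len(clusters) < 2:
        return 0.0
    widths = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                widths.append(0.0)
                continue
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            denom = max(a, b)
            widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


def best_k(points, k_values):
    """Return the k in k_values with the largest average silhouette width."""
    scores = {k: avg_silhouette(k_medoids(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data (made up): three well-separated groups of 30 points each.
random.seed(0)
points = [random.gauss(c, 0.5) for c in (0.0, 10.0, 20.0) for _ in range(30)]

narrow_k, _ = best_k(points, [2])            # only k = 2 is allowed
wide_k, wide_scores = best_k(points, range(2, 7))  # k = 2..6 are allowed
print("search restricted to k = 2:", narrow_k)
print("search over k = 2..6:     ", wide_k)
```

The restricted search has no choice but to report k = 2, even though the 
toy data were generated from three groups. A different criterion or a 
different notion of "cluster" could also rank the candidate values 
differently, which is the reason the choice cannot be settled from the 
numbers alone.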

Best wishes,
Christian
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche