CLARA and determining the right number of clusters

4 messages · pacomet, Christian Hennig

#
Hi there,

Generally, finding the right number of clusters is a difficult problem and 
depends heavily on the cluster concept needed for the particular 
application.
No outcome of any automatic method should be taken for granted.

Having said that, I guess that something like the example given in
(replacing pam by clara) should work with clara, too.
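
The example being referred to is not reproduced in this message, so the following is only a minimal sketch of one common approach, assuming the idea is to compare average silhouette widths across candidate numbers of clusters. The dataset (xclara, shipped with the cluster package), the candidate range 2:10, and the samples setting are illustrative assumptions, not part of the original post.

```r
library(cluster)

# Assumption: xclara (a demo dataset in the cluster package) stands in
# for the poster's data, which is not shown in the thread.
data(xclara)

# Assumption: candidate numbers of clusters; the upper bound caps what
# can be found.
ks <- 2:10

# For each k, run clara and record the average silhouette width of its
# best sample (component silinfo$avg.width of the clara object).
asw <- sapply(ks, function(k) {
  clara(xclara, k, samples = 50)$silinfo$avg.width
})

# The k with the largest average silhouette width is a candidate only,
# not a definitive answer.
k.best <- ks[which.max(asw)]
print(data.frame(k = ks, avg.sil.width = round(asw, 3)))
print(k.best)
```

Note that the silhouette information reported by clara is computed on its best subsample rather than on the full dataset, so the values are only approximate, and the chosen k can never lie outside the range in ks.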

Regards,
Christian
On Tue, 30 Sep 2008, pacomet wrote:

            
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
#
Hi there,
I'm not sure whether you feel that you have to look at a single variable at 
a time, but the whole thing should work for more than one as well.
This kind of problem concerns your data and your application and cannot be 
solved on such a mailing list. Perhaps you should go for professional 
advice about your particular data. Quite obviously, if you restrict the 
number of clusters to be at most 10, you cannot find eleven. How strongly 
you "think that there should not be more than 8-9 clusters" and how good 
your arguments against 11 are is something nobody on this list can decide.
The general problem is that there is no unique statistical definition of 
what a "true cluster" is, and whether your dataset contains 5 or 11 
clusters (or any other number) depends on what you want to call a 
"cluster".
I don't know which plot you are referring to, but you may want to have a 
look at the Kaufman and Rousseeuw book quoted on the help page.

Best wishes,
Christian