-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Erik Iverson
Sent: Wednesday, May 05, 2010 2:33 PM
To: Ralf B
Cc: r-help at r-project.org
Subject: Re: [R] Dynamic clustering?
Hello,
Ralf B wrote:
Are there R packages that allow for dynamic clustering, i.e., where the
number of clusters is not predefined? I have a list of numbers that
falls into either 2 clusters or just 1. Here is an example of one that
should be clustered into two clusters:
two <- c(1,2,3,2,3,1,2,3,400,300,400)
and here is one that contains only one cluster and would therefore not
need to be clustered at all:
one <- c(400,402,405, 401,410,415, 407,412)
Given a sufficiently large amount of data, a statistical test or an
effect size should be able to determine whether a data set should be
divided, i.e., whether there are two groups that differ enough. I am
not familiar with the techniques underlying kmeans, but I know that it
blindly divides both data sets into the predefined number of clusters.
Are there more sophisticated methods that would let me determine the
number of clusters in a data set based on statistical tests or effect
sizes?
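The description of kmeans above can be checked directly. This is a minimal sketch using `kmeans()` from base R's stats package; the seed and `nstart = 10` are arbitrary choices, not taken from the thread. Asked for two centers, kmeans splits even the vector `one` into two groups, because it never tests whether a split is warranted. The between-to-total sum-of-squares ratio is printed only as one possible crude effect size; it is an assumption here, not a significance test.

```r
# kmeans() from base R's stats package always returns exactly `centers`
# clusters; it does not check whether splitting the data is justified.
one <- c(400, 402, 405, 401, 410, 415, 407, 412)
two <- c(1, 2, 3, 2, 3, 1, 2, 3, 400, 300, 400)

set.seed(1)  # arbitrary seed: kmeans picks random starting centers
km_one <- kmeans(one, centers = 2, nstart = 10)
km_two <- kmeans(two, centers = 2, nstart = 10)

# Both vectors are forced into two groups
print(table(km_one$cluster))
print(table(km_two$cluster))

# Share of the total sum of squares explained by the split: a crude
# effect size, but any cutoff applied to it is still the user's choice
print(km_one$betweenss / km_one$totss)
print(km_two$betweenss / km_two$totss)
```

The ratio is much higher for `two` than for `one`, but nothing in kmeans says where between those values "two clusters" should begin.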
<<<snip>>>
Ralf,
There is no procedure in R or any other stat package that can make these kinds of decisions without a whole lot more specification of the problem. You give two examples above. What would you want done with
c(380, 400, 402, 405, 401, 410, 415, 407, 412), or
c(350, 400, 402, 405, 401, 410, 415, 407, 412), or
c(300, 400, 402, 405, 401, 410, 415, 407, 412), or
c(100, 400, 402, 405, 401, 410, 415, 407, 412), or
...
i.e. what difference counts as big enough or variable enough or ...?
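The question of what counts as "big enough" can be made concrete with a toy rule. The sketch below assumes one particular criterion: split at the largest gap between sorted values if that gap is at least `min_ratio` pooled within-group standard deviations. The helper `split_at_largest_gap` and the cutoff `min_ratio` are hypothetical illustrations, not functions from any R package, and the verdict on the vectors above depends entirely on the cutoff chosen.

```r
# Hypothetical helper, not from any R package: split a numeric vector at its
# largest gap only if that gap is at least `min_ratio` times the pooled
# within-group standard deviation. `min_ratio` is an arbitrary cutoff that
# the analyst must pick; the data cannot pick it.
split_at_largest_gap <- function(x, min_ratio = 3) {
  n <- length(x)
  if (n < 3 || length(unique(x)) < 2) {
    return(rep(1L, n))               # too few distinct values to consider a split
  }
  s <- sort(x)
  i <- which.max(diff(s))            # largest gap lies between s[i] and s[i + 1]
  lower <- s[seq_len(i)]
  upper <- s[(i + 1):n]
  # pooled within-group SD, with n - 2 degrees of freedom for two group means
  resid <- c(lower - mean(lower), upper - mean(upper))
  pooled_sd <- sqrt(sum(resid^2) / (n - 2))
  gap <- s[i + 1] - s[i]
  if (gap >= min_ratio * pooled_sd) {
    ifelse(x > s[i], 2L, 1L)         # 1 = lower group, 2 = upper group
  } else {
    rep(1L, n)                       # gap not "big enough": keep one cluster
  }
}

one <- c(400, 402, 405, 401, 410, 415, 407, 412)
two <- c(1, 2, 3, 2, 3, 1, 2, 3, 400, 300, 400)
others <- list(c(380, one), c(350, one), c(300, one), c(100, one))

# Number of clusters found for each vector above, under two cutoffs
print(sapply(others, function(v) length(unique(split_at_largest_gap(v, 3)))))
print(sapply(others, function(v) length(unique(split_at_largest_gap(v, 5)))))
# 1 2 2 2
```

With `min_ratio = 3` all four vectors are split; with `min_ratio = 5` the one starting at 380 is not. The rule only moves the judgment call into the cutoff; it does not remove it.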
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204