kmeans clustering
On Mon, 14 Apr 2003, pingzhao wrote:
Hi, I am using kmeans to cluster a dataset. I test this example:
data<-matrix(scan("data100.txt"),100,37,byrow=T)
(my dataset is 100 rows and 37 columns--clustering rows)
> c1<-kmeans(data,3,20)
> c1
$cluster
  [1] 1 1 1 1 1 1 1 3 3 3 1 3 1 3 3 1 1 1 1 3 1 3 3 1 1 1 3 3 1 1 3 1 1 1 1 3 3
 [38] 3 1 1 1 3 1 1 1 1 3 3 3 1 1 1 1 1 1 3 1 3 1 1 3 1 1 1 1 3 1 1 1 1 1 1 3 1
 [75] 1 3 1 3 1 1 1 1 3 1 1 1 1 1 3 1 1 3 1 1 3 3 1 2 1 1

$withinss
[1] 1037.5987    0.0000  666.9701

$size
[1] 68  1 31
c4<-kmeans(data,3,20)
$withinss
[1]   0.0000 865.7628 851.1214

$size
[1]  1 54 45

Can anyone tell me why the results differ so much when I run the same command, 'kmeans(data,3,20)', several times on the same dataset with the same parameters?
The help page could tell you:
centers: Either the number of clusters or a set of initial cluster
centers. If the first, a random set of rows in `x' are chosen
as the initial centers.
At the very least, the labellings of the clusters are arbitrary, but
K-means also usually has multiple local minima, so runs started from
different random centers can converge to different solutions.
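
A minimal sketch of how runs can be made repeatable and less sensitive to the starting centers. It assumes a current version of R in which kmeans() accepts an nstart argument (check ?kmeans for your installation). The matrix x is synthetic stand-in data with the same shape as the poster's, not the actual data100.txt.

```r
# Synthetic 100 x 37 matrix standing in for the poster's data (assumption).
set.seed(1)
x <- matrix(rnorm(100 * 37), nrow = 100, ncol = 37)

# Fixing the random seed fixes the randomly chosen initial centers,
# so two runs give the same clustering.
set.seed(42)
a <- kmeans(x, centers = 3, iter.max = 20)
set.seed(42)
b <- kmeans(x, centers = 3, iter.max = 20)
identical(a$cluster, b$cluster)

# nstart tries several random starts and keeps the solution with the
# smallest total within-cluster sum of squares.
set.seed(42)
best <- kmeans(x, centers = 3, iter.max = 20, nstart = 25)
sum(best$withinss)
best$size
```

Even between equally good solutions the cluster numbers 1, 2, 3 may be permuted, so compare runs by their sizes and within-cluster sums of squares rather than by the raw labels.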
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595