Hi all,
I am trying to run the k-means cluster analysis using the function kmeans
in the package cluster.
The data are:
x = c(-0.26, -0.23, -0.05, -0.20, 0.30, -0.84, -0.10, -0.12, 0.10, -0.31,
      -0.19, 0.18, -0.26, -0.23, -0.37, -0.23)
I got two different solutions when I ran this function a few times:
kmeans(x, centers=2)
The first solution gives the following:
$cluster
[1] 2 2 1 2 1 2 2 2 1 2 2 1 2 2 2 2
$centers
[,1]
1 0.1325000
2 -0.2783333
$withinss
[1] 0.0646750 0.4033667
$size
[1] 4 12
The second solution gives the following:
$cluster
[1] 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1
$centers
[,1]
1 -0.1313333
2 -0.8400000
$withinss
[1] 0.5035733 0.0000000
$size
[1] 15 1
I don't understand why this is happening, or how to choose between the
two solutions. Also, how can I ensure a consistent solution across
runs? Thanks a lot!
- Jacqueline
k-means cluster analysis
3 messages · Ngayee J Law, Sundar Dorai-Raj, Brian Ripley
From the help page for `kmeans':
centers: Either the number of clusters or a set of initial cluster
centers. If the first, a random set of rows in `x' are chosen
as the initial centers.
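Because the starting centers are drawn at random, separate runs can settle on different local optima, which is what the two posted solutions look like. One common way to choose between them is to prefer the smaller total within-cluster sum of squares: from the posted $withinss values, 0.0646750 + 0.4033667 = 0.4680417 for the first solution against 0.5035733 + 0 = 0.5035733 for the second. As a rough sketch (the names `wss`, `total` and `best` are made up for illustration), one-dimensional data with two clusters can also be checked exhaustively, on the assumption that the best split is a single cut point in the sorted values:

```r
x <- c(-0.26, -0.23, -0.05, -0.20, 0.30, -0.84, -0.10, -0.12, 0.10, -0.31,
       -0.19, 0.18, -0.26, -0.23, -0.37, -0.23)

xs <- sort(x)
n  <- length(xs)

# Within-cluster sum of squares of one group of values
# (hypothetical helper, not part of any package).
wss <- function(v) sum((v - mean(v))^2)

# Try every cut point: the lowest i values form one cluster, the rest the other.
cuts  <- 1:(n - 1)
total <- sapply(cuts, function(i) wss(xs[1:i]) + wss(xs[(i + 1):n]))

best <- cuts[which.min(total)]
best         # number of values in the lower cluster
min(total)   # smallest total within-cluster sum of squares
```

Here the smallest total (about 0.468) comes from putting the 12 lowest values in one cluster and the 4 highest in the other, which matches the first solution; cutting off -0.84 on its own gives about 0.504, the second solution.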
If you want the same results every time, try supplying the initial centers yourself, as in:
kmeans(x, centers = c(0.1, -0.2))
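A minimal sketch of ways to get repeatable output, assuming a current R installation in which `kmeans` (now in package stats) accepts the `nstart` argument and returns `tot.withinss`; neither appears in this thread, and the seed value 123 is arbitrary:

```r
x <- c(-0.26, -0.23, -0.05, -0.20, 0.30, -0.84, -0.10, -0.12, 0.10, -0.31,
       -0.19, 0.18, -0.26, -0.23, -0.37, -0.23)

# Option 1: reset the random number generator before each call,
# so the same random starting rows are picked.
set.seed(123)
fit_a <- kmeans(x, centers = 2)
set.seed(123)
fit_b <- kmeans(x, centers = 2)
identical(fit_a$cluster, fit_b$cluster)

# Option 2: supply the starting centers explicitly, so nothing is random.
fit_c <- kmeans(x, centers = c(0.1, -0.2))
fit_d <- kmeans(x, centers = c(0.1, -0.2))
identical(fit_c$cluster, fit_d$cluster)

# Option 3 (assumes the nstart argument is available): try several random
# starts and keep the run with the smallest total within-cluster sum of squares.
set.seed(123)
fit_e <- kmeans(x, centers = 2, nstart = 25)
fit_e$tot.withinss
```

Fixing the seed makes a run repeatable but does not make it the better solution; using several starts addresses that part.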
However, choosing bad starting values could cause kmeans to crash, as in:
kmeans(x, centers = c(0, 0))
Regards,
Sundar
On Wed, 12 Feb 2003, Sundar Dorai-Raj wrote:
> Ngayee J Law wrote:
> > Hi all, I am trying to run the k-means cluster analysis using the
> > function kmeans in the package cluster.
I think it's the one in package mva. [...]
> However, choosing bad starting values could cause kmeans to crash, as in:
> kmeans(x, centers = c(0, 0))
Really? That does not crash here, but correctly reports an error message:
kmeans(x, c(0,0))
Error in switch(Z$ifault, stop("empty cluster: try a better set of initial centers"), :
  empty cluster: try a better set of initial centers
If you actually have a crash, please report to R-bugs.
There is an increasing trend for people to describe informative error
messages about their own mistakes as `crashes', but that is confusing at
best.
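The distinction matters in scripts too: an R error is a condition that code can catch and inspect, whereas a genuine crash would take the whole session down. A small sketch using base R's `tryCatch`; the wording of the message depends on the R version, so it is stored and printed rather than assumed, and the name `msg` is only illustrative:

```r
x <- c(-0.26, -0.23, -0.05, -0.20, 0.30, -0.84, -0.10, -0.12, 0.10, -0.31,
       -0.19, 0.18, -0.26, -0.23, -0.37, -0.23)

# Two identical starting centers are a bad choice; kmeans signals an error,
# which tryCatch turns into an ordinary character string.
msg <- tryCatch(
  {
    kmeans(x, centers = c(0, 0))
    NA_character_   # reached only if no error was raised
  },
  error = function(e) conditionMessage(e)
)
msg
```

The session carries on after the call, which is the sense in which this is an error report and not a crash.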
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595