
error in kmeans

Asha Jayanthi wrote:
Reading ?kmeans we have:

   centers: Either the number of clusters or a set of initial cluster
           centers. If the first, a random set of (distinct) rows in 'x'
           is chosen  as the initial centers.

So each time you run your analysis, kmeans picks 10 rows of your data
at random as the starting cluster centers. Sometimes that selection
leaves a cluster with no objects in it and sometimes it doesn't; it is
(pseudo-)random after all. You could of course provide the centers
yourself, something along the lines of the following (adapted from
Venables and Ripley (1999) Modern Applied Statistics with S-PLUS, 3rd
Edition, page 338; not sure about the 4th Ed. as my copy is at home
just now):

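# Simulated data: 200 observations on 25 variables
M <- data.frame(matrix(rnorm(5000), ncol = 25))
M.x <- as.matrix(M)
# Average-linkage hierarchical clustering, cut into 10 groups
h <- hclust(dist(M.x), method = "average")
# Mean of every variable within each group: a 10 x 25 matrix of centers
initial <- tapply(M.x,
                  list(rep(cutree(h, 10), ncol(M.x)), col(M.x)),
                  mean)
# Start kmeans from those centers instead of a random draw
M.km <- kmeans(M.x, initial)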
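
If you would rather keep the random start, a different route (not from
Venables and Ripley, just a sketch) is to fix the random seed so the
same rows are drawn every time, and to let kmeans try several random
starts and keep the best one. This assumes your version of R has the
'nstart' argument to kmeans, as current versions do; the seed value and
the choices nstart = 25 and iter.max = 50 are arbitrary.

```r
# Simulated data, as in the example above: 200 rows, 25 columns
M.x <- matrix(rnorm(5000), ncol = 25)

# Fixing the seed makes the random choice of starting rows repeatable
set.seed(1)
km.a <- kmeans(M.x, centers = 10, nstart = 25, iter.max = 50)

# Same seed, same data, same call: the same starting rows are drawn
set.seed(1)
km.b <- kmeans(M.x, centers = 10, nstart = 25, iter.max = 50)

# TRUE when both runs assign every row to the same cluster
same <- identical(km.a$cluster, km.b$cluster)
```

With several starts, an unlucky draw is less likely to decide the final
result, since kmeans keeps the solution with the smallest total
within-cluster sum of squares.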

HTH