
error in kmeans

5 messages · Asha Jayanthi, Ingmar Visser, Uwe Ligges +2 more

#
I am trying to run k-means with 10 clusters on a 165 x 165 matrix.

I do not see any mistakes in my script, but I get this error when I run
it:

Error: empty cluster: try a better set of initial centers

the commands are

M <- matrix(scan("R_mutual", n = 165 * 165), 165, 165, byrow = TRUE)

cl <- kmeans(M, centers = 10, iter.max = 20)
len <- nrow(M)
....
....

I ran the same script last night and it was working perfectly. I have not
made any changes at all, which is very strange. This evening when I ran
the same script I got this error. My matrix file is also untouched.

Can anyone let me know how to go about this? I must generate 10 k-means
clusters. Is there any other way of doing it, and how can I avoid this
error in future?

Asha


#
Hi Asha,

kmeans is a non-deterministic routine.

The help page says the following about the centers argument:

 centers: Either the number of clusters or a set of initial cluster
          centers. If the first, a random set of rows in 'x' are chosen
          as the initial centers.

Hence, different choices may lead to different results and, as you can
see, to empty clusters. See ?try for a possible workaround if you want to
keep using kmeans.
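
A minimal sketch of the ?try workaround mentioned above: wrap the call in
try() and repeat it until a random start succeeds. The helper name
kmeans_retry, its max_tries argument, and the stand-in matrix are
assumptions for illustration; they are not part of R or of the original
script.

```r
# Hypothetical helper: call kmeans() again whenever a random start
# fails with an error (e.g. "empty cluster").
kmeans_retry <- function(x, k, iter.max = 20, max_tries = 50) {
  for (i in seq_len(max_tries)) {
    fit <- try(kmeans(x, centers = k, iter.max = iter.max), silent = TRUE)
    if (!inherits(fit, "try-error")) return(fit)  # success: stop retrying
  }
  stop("kmeans failed in all ", max_tries, " attempts")
}

set.seed(1)
x <- matrix(rnorm(165 * 165), 165, 165)  # stand-in data, not the real matrix
cl <- kmeans_retry(x, 10)
```

Retrying only hides the failure; it still returns whatever solution the
first successful random start happens to reach.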

Best, Ingmar
On 3/31/05 11:08 PM, "Asha Jayanthi" <ashajayanthi at hotmail.com> wrote:

#
Asha Jayanthi wrote:

            
Please read the docs! The help page tells you:

"centers: Either the number of clusters or a set of initial cluster 
centers. If the first, a random set of rows in x are chosen as the 
initial centers."

So, the rows are *randomly* chosen. If this does not work, why don't
you specify a fixed set of, e.g., 10 rows?
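
As a rough illustration of that suggestion, the sketch below passes ten
fixed rows of the matrix as the initial centers. The stand-in data and
the choice of evenly spaced rows (idx) are assumptions; any ten distinct
rows could be used instead.

```r
set.seed(42)
M <- matrix(rnorm(165 * 165), 165, 165)  # stand-in for the real matrix

# Assumed choice: ten evenly spaced, distinct row indices.
idx <- round(seq(1, nrow(M), length.out = 10))
start <- M[idx, , drop = FALSE]          # 10 x 165 matrix of initial centers

cl1 <- kmeans(M, centers = start, iter.max = 20)
cl2 <- kmeans(M, centers = start, iter.max = 20)

# With the same starting centers, repeated runs give the same partition.
identical(cl1$cluster, cl2$cluster)
```

Fixed starting rows make the run repeatable, but they do not by
themselves guarantee a good clustering.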

Uwe Ligges
#
On Fri, 1 Apr 2005, Asha Jayanthi wrote:

            
It uses a random choice.  See the help page, which says so explicitly.
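
Because the choice is random, fixing the random seed before the call
should make a run repeatable, which would explain why the same script can
work one evening and fail the next. A small sketch, using stand-in data
and an arbitrary seed value (both assumptions):

```r
set.seed(7)
M <- matrix(rnorm(165 * 165), 165, 165)  # stand-in for the real matrix

set.seed(123)                            # arbitrary seed, chosen for the example
a <- kmeans(M, centers = 10, iter.max = 20)

set.seed(123)                            # same seed, so same random initial rows
b <- kmeans(M, centers = 10, iter.max = 20)

identical(a$cluster, b$cluster)
```

Current versions of R also accept an nstart argument to kmeans, which
tries several random starts and keeps the best one.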

#
Asha Jayanthi wrote:

Reading ?kmeans we have:

   centers: Either the number of clusters or a set of initial cluster
           centers. If the first, a random set of (distinct) rows in 'x'
           is chosen  as the initial centers.

So each time you run your analysis, kmeans selects 10 random rows as the
starting values for the cluster centers. Sometimes the selection ends up
with no objects in a cluster, sometimes it doesn't; it is (pseudo)random
after all. You could of course provide the centers yourself, something
along the lines of the following (adapted from Venables and Ripley (1999),
Modern Applied Statistics with S-PLUS, 3rd Edition, page 338; not sure
about the 4th Ed. as my copy is at home just now):

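# Example data: 200 observations on 25 variables
M <- data.frame(matrix(rnorm(5000), ncol = 25))
M.x <- as.matrix(M)
# Average-linkage hierarchical clustering, cut into 10 groups below
h <- hclust(dist(M.x), method = "average")
# Column means within each of the 10 groups (a 10 x 25 matrix)
initial <- tapply(M.x, list(rep(cutree(h, 10),
                                 ncol(M.x)),
                             col(M.x)),
                             mean)
# Use the group means as the initial cluster centers
M.km <- kmeans(M.x, initial)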

HTH