
Help in kmeans

4 messages · Raji, Christian Hennig, raji sankaran

#
Hi All,

  I was using the following command to run kmeans on the Iris dataset.

Kmeans_model<-kmeans(dataFrame[,c(1,2,3,4)],centers=3)

This gave proper results for me. However, our application generates the R
commands dynamically, and there is a requirement that column names be passed
to the R commands instead of column indices. To incorporate this, I tried
writing the R commands in the following way.

kmeans_model<-kmeans((SepalLength+SepalWidth+PetalLength+PetalWidth),centers=3)

or

kmeans_model<-kmeans(as.matrix(SepalLength,SepalWidth,PetalLength,PetalWidth),centers=3)

In both cases, the results are different from what we saw with the first
command (with column indices).

Can you please let us know what is going wrong here, and how column names
can be used in kmeans to obtain the correct results?

Many thanks,
Raji 

--
View this message in context: http://r.789695.n4.nabble.com/Help-in-kmeans-tp3430433p3430433.html
Sent from the R help mailing list archive at Nabble.com.
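
A note on the two attempts above, as a minimal sketch rather than a definitive answer. It assumes the built-in iris data with its columns renamed to the names used in the post; the helper names (cols, summed, one_col, by_index, by_name) are made up for illustration. Adding the columns with + produces one numeric vector of row sums, and as.matrix() treats only its first argument as data, so in both cases kmeans sees a single column instead of four. Selecting the columns by name from the data frame gives the same input as selecting them by index.

```r
# Assumption: built-in iris data, renamed to match the column names in the post.
dataFrame <- iris
names(dataFrame) <- c("SepalLength", "SepalWidth", "PetalLength",
                      "PetalWidth", "Species")

# Hypothetical vector of column names, e.g. as generated by the application.
cols <- c("SepalLength", "SepalWidth", "PetalLength", "PetalWidth")

# Attempt 1: "+" adds the columns element-wise, giving ONE vector of row sums.
summed <- with(dataFrame, SepalLength + SepalWidth + PetalLength + PetalWidth)
length(summed)

# Attempt 2: as.matrix() uses only its first argument as data, so the
# result is a one-column matrix holding SepalLength alone.
one_col <- with(dataFrame,
                as.matrix(SepalLength, SepalWidth, PetalLength, PetalWidth))
dim(one_col)

# Selecting by name yields the same four-column data as selecting by index.
by_index <- dataFrame[, c(1, 2, 3, 4)]
by_name  <- dataFrame[, cols]
identical(by_index, by_name)

# With the same seed, both selections lead to the same clustering.
set.seed(1)
fit_index <- kmeans(by_index, centers = 3)
set.seed(1)
fit_name <- kmeans(by_name, centers = 3)
identical(fit_index$cluster, fit_name$cluster)
```

If the variables are available as separate vectors rather than as data frame columns, cbind(SepalLength, SepalWidth, PetalLength, PetalWidth) would also build a four-column matrix.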
#
I'm not going to comment on column names, but this is just to make you 
aware that the results of k-means depend on random initialisation.

This means that it is possible that you get different results if you run 
it several times. It basically gives you a local optimum and there may be 
more than one of these.
Call set.seed with the same value before each kmeans call to see whether
this explains your problem.
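
A minimal sketch of that check, assuming the built-in iris data and an arbitrary seed value of 42 (both are illustrative choices, not taken from the original post). Resetting the seed before each call makes the random initial centres identical, so the two fits can be compared directly. The nstart argument of kmeans is one way to reduce the dependence on a single random start, since it keeps the best of several runs.

```r
# Assumption: the four numeric columns of the built-in iris data.
x <- iris[, 1:4]

# Same seed before each call, so both runs start from the same centres.
set.seed(42)
fit_a <- kmeans(x, centers = 3)
set.seed(42)
fit_b <- kmeans(x, centers = 3)
identical(fit_a$cluster, fit_b$cluster)

# Several random starts; kmeans keeps the solution with the smallest
# total within-cluster sum of squares.
set.seed(42)
fit_multi <- kmeans(x, centers = 3, nstart = 25)
fit_multi$tot.withinss
```

Without the set.seed calls, repeated runs may converge to different local optima and may also label the same clusters with different numbers.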

Best regards,
Christian
On Wed, 6 Apr 2011, Raji wrote:

            
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche