
Help with 2-D plot of k-mean clustering analysis

5 messages · Meng Wu, David Cross, Peter Langfelder +2 more

#
I wonder if it makes sense to reduce the dimensionality of the variables somehow?
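
No particular method is named here, so the following is only a sketch of one common option, principal component analysis with prcomp(). The data matrix X is simulated (an assumption: 1000 variables in rows, 33 samples in columns), and the k-means labels are used only to color the points.

```r
# Simulated stand-in for the real data (assumption): 1000 variables x 33 samples,
# with four groups of samples shifted against each other.
set.seed(1)
grp <- rep(1:4, length.out = 33)
X <- matrix(rnorm(1000 * 33), nrow = 1000, ncol = 33) + rep(grp, each = 1000)

# prcomp() expects observations in rows, so transpose to 33 x 1000.
pca <- prcomp(t(X), center = TRUE, scale. = FALSE)

# Scores of the 33 samples on the first two principal components.
scores <- pca$x[, 1:2]

# k-means on the samples; the cluster labels only determine the colors.
km <- kmeans(t(X), centers = 4, nstart = 10)
plot(scores, col = km$cluster, pch = 19, xlab = "PC1", ylab = "PC2")
```

Whether to set scale. = TRUE depends on whether the variables share a common unit and range.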

David Cross
d.cross at tcu.edu
www.davidcross.us
On May 18, 2011, at 9:41 AM, Meng Wu wrote:

            
#
On Wed, May 18, 2011 at 7:41 AM, Meng Wu <mengwu1002 at gmail.com> wrote:
You could use multidimensional scaling, function cmdscale(), to
produce a 2-dimensional representation of your data, then plot it
using colors that correspond to the clusters.

For example, suppose your data is stored in a matrix X (1000x33), with
the 1000 variables in rows and the 33 samples in columns. I assume you
clustered the samples, not the variables, so you have a vector label
of length 33 with values between 1 and 4. Since k-means uses Euclidean
distance, you would re-create the distance matrix between the samples:

dst = dist(t(X))

then feed it into cmdscale()

mds = cmdscale(dst);

then plot it:

plot(mds, col = label)
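
Put together, the steps above might look like the sketch below. The data is simulated, and the kmeans() call that produces label is an assumption about how the clustering was done; only dist(), cmdscale() and plot() come from the steps above.

```r
# Simulated stand-in for the real data (assumption): 1000 variables in rows,
# 33 samples in columns, with four groups of samples shifted against each other.
set.seed(42)
grp <- rep(1:4, length.out = 33)
X <- matrix(rnorm(1000 * 33), nrow = 1000, ncol = 33) + rep(grp, each = 1000)

# Cluster the samples (rows of t(X)) into 4 groups.
label <- kmeans(t(X), centers = 4, nstart = 10)$cluster

# Euclidean distances between the 33 samples.
dst <- dist(t(X))

# Classical multidimensional scaling down to 2 dimensions.
mds <- cmdscale(dst, k = 2)

# One point per sample, colored by cluster membership.
plot(mds, col = label, pch = 19, xlab = "MDS 1", ylab = "MDS 2")
legend("topright", legend = paste("cluster", sort(unique(label))),
       col = sort(unique(label)), pch = 19)
```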

HTH,

Peter
#
Hi Meng,
For suggestions it would be extremely helpful to tell us what kind of 
variables your 1000 variables are.

Parallel coordinate plots show the values across (many) variables. Whether 
this is useful depends very much on your variables. For example, I have 
spectral channels: they have an intrinsic order, and the values have 
physically the same meaning (and almost the same range), so the parallel 
coordinate plot comes naturally (it in fact produces the spectra).
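
As a rough sketch of such a plot, the code below draws one line per sample across all variables with base R's matplot(). The spectra-like data (one Gaussian peak per group plus noise) and the k-means labels are simulated assumptions, not real measurements. For variables on different scales, parcoord() in the MASS package rescales each variable before plotting.

```r
# Simulated spectra-like data (assumption): 1000 ordered channels x 33 samples.
set.seed(3)
channels <- 1:1000
grp <- rep(1:4, length.out = 33)
peak <- c(200, 400, 600, 800)  # hypothetical peak position for each group

# Each column is one sample: a Gaussian peak plus a little noise.
X <- sapply(grp, function(g) {
  exp(-((channels - peak[g]) / 50)^2) + rnorm(length(channels), sd = 0.02)
})

# Cluster the samples so the lines can be colored by cluster.
label <- kmeans(t(X), centers = 4, nstart = 10)$cluster

# matplot() draws one line per column of X, i.e. one line per sample.
matplot(channels, X, type = "l", lty = 1, col = label,
        xlab = "variable (channel) index", ylab = "value")
```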

Claudia

  
    
#
One idea: pick the three largest clusters; their centers determine a plane.
Project your data onto that plane.
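
A minimal sketch of this idea, assuming an orthogonal projection and simulated data (1000 variables, 33 samples, 4 clusters). The names Y, ctr, B and proj are made up for illustration.

```r
# Simulated stand-in for the real data (assumption): samples in rows (33 x 1000).
set.seed(7)
grp <- rep(1:4, length.out = 33)
Y <- t(matrix(rnorm(1000 * 33), nrow = 1000, ncol = 33) + rep(grp, each = 1000))

km <- kmeans(Y, centers = 4, nstart = 10)

# Centers of the three largest clusters (3 x 1000).
top3 <- order(km$size, decreasing = TRUE)[1:3]
ctr <- km$centers[top3, ]

# Take the first center as origin; the other two span the plane.
o <- ctr[1, ]
V <- cbind(ctr[2, ] - o, ctr[3, ] - o)

# Orthonormal basis of the plane via QR decomposition (1000 x 2).
B <- qr.Q(qr(V))

# Orthogonal projection: in-plane coordinates of every sample and of the centers.
proj <- sweep(Y, 2, o) %*% B
ctr_proj <- sweep(ctr, 2, o) %*% B

plot(proj, col = km$cluster, pch = 19, xlab = "plane axis 1", ylab = "plane axis 2")
points(ctr_proj, pch = 4, cex = 2, lwd = 2)  # the three centers that define the plane
```

The three chosen centers lie exactly in the plane, so their clusters tend to separate well. The fourth cluster may overlap the others, because its center is generally not in the plane.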

albyn
On Wed, May 18, 2011 at 06:55:39PM +0200, Claudia Beleites wrote: