Help with 2-D plot of k-means clustering analysis

I wonder if it makes sense to reduce the dimensionality of the variables somehow?
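
One way to reduce the dimensionality would be principal component analysis. The sketch below is only an illustration of that idea, not something proposed in the thread: the simulated matrix X (33 samples in rows, 1000 variables in columns), the group structure, and the choice of prcomp() are all assumptions.

```r
# Simulated stand-in for the real data (assumption): 33 samples x 1000 variables,
# drawn around 4 hypothetical group means.
set.seed(1)
grp <- rep(1:4, length.out = 33)
mu  <- matrix(rnorm(4 * 1000), nrow = 4)
X   <- mu[grp, ] + matrix(rnorm(33 * 1000, sd = 0.5), nrow = 33)

# k-means with 4 clusters, as in the original question.
km <- kmeans(X, centers = 4, nstart = 10)

# PCA on the samples; keep the scores on the first two components.
pca    <- prcomp(X, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Plot the samples in the PC1/PC2 plane, coloured by cluster.
plot(scores, col = km$cluster, pch = 19, xlab = "PC1", ylab = "PC2")
```

Whether to set scale. = TRUE depends on whether the variables share a common unit and range, which the thread does not say.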

David Cross
d.cross at tcu.edu
www.davidcross.us

Hi, all

I would like to use R to perform k-means clustering on my data, which
include 33 samples measured on ~1000 variables. I have already used
the kmeans() function for this analysis, which showed that there are 4
clusters in my data. However, it is really difficult to plot these clusters
in 2-D because of the "huge" number of variables. One possible way is to
project the multidimensional space onto a 2-D plane, but I could not find
any good way to do that. Any suggestions or comments will be really helpful!

Thanks,

Meng


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
You could use multidimensional scaling, function cmdscale(), to
produce a 2-dimensional representation of your data, then plot it
using colors that correspond to the clusters.

For example, suppose your data are stored in a matrix X (1000 x 33). I
assume you clustered the samples, not the variables, so you have a
vector label of length 33 with values between 1 and 4. Since
k-means uses Euclidean distance, you would re-create the distance
matrix between the samples:
dst = dist(t(X))

then feed it into cmdscale()

mds = cmdscale(dst)

then plot it:

plot(mds, col = label)
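
The steps above can be put together into a self-contained sketch. The simulated data and the 4 group means are assumptions made only so the example runs; X keeps the 1000 x 33 layout described above, and label comes from kmeans() on the transposed matrix.

```r
# Simulated stand-in (assumption): 1000 variables (rows) x 33 samples (columns).
set.seed(42)
grp <- rep(1:4, length.out = 33)
mu  <- matrix(rnorm(4 * 1000), nrow = 4)
X   <- t(mu[grp, ] + matrix(rnorm(33 * 1000, sd = 0.5), nrow = 33))

# Cluster the samples: kmeans() expects observations in rows, hence t(X).
label <- kmeans(t(X), centers = 4, nstart = 10)$cluster

# Euclidean distances between samples, then classical MDS into 2 dimensions.
dst <- dist(t(X))
mds <- cmdscale(dst, k = 2)

# One point per sample, coloured by cluster label.
plot(mds, col = label, pch = 19, xlab = "MDS 1", ylab = "MDS 2")
```

With Euclidean distances, classical MDS should give the same configuration as the first two principal components, up to the sign of each axis.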

HTH,

Peter
Hi Meng,
For suggestions it would be extremely helpful to tell us what kind of 
variables your 1000 variables are.

Parallel coordinate plots show the values across (many) variables. Whether 
this is useful depends very much on your variables. For example, I have 
spectral channels: they have an intrinsic order, and the values have 
physically the same meaning (and almost the same range), so the parallel 
coordinate plot comes naturally (it in fact reproduces the spectra).
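
A minimal sketch of such a plot, using base R's matplot() and drawing one line per sample across all variables. The spectrum-like simulated data (a peak whose position differs by group) are purely an assumption for illustration.

```r
# Simulated spectrum-like data (assumption): 33 samples x 1000 ordered channels.
set.seed(7)
n_vars   <- 1000
grp      <- rep(1:4, length.out = 33)
channel  <- seq_len(n_vars)
peak_pos <- c(200, 400, 600, 800)   # hypothetical peak position per group

# sapply() returns channels in rows, so transpose to get samples in rows.
X <- t(sapply(grp, function(g) {
  100 * dnorm(channel, mean = peak_pos[g], sd = 50) + rnorm(n_vars, sd = 0.05)
}))

cl <- kmeans(X, centers = 4, nstart = 10)$cluster

# matplot() draws one line per column, so pass t(X): one line per sample.
matplot(channel, t(X), type = "l", lty = 1, col = cl,
        xlab = "Variable index", ylab = "Value")
```

If the variables have no natural order or very different ranges, the lines will be much harder to read, which is the caveat raised above.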

Claudia

Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399
One idea: pick the three largest clusters; their centers determine a plane.
Project your data onto that plane.
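
A rough sketch of this projection, assuming samples are in the rows of X and that the three centers are not collinear. The simulated data and the use of a QR decomposition to get an orthonormal basis of the plane are illustrative choices, not part of the suggestion itself.

```r
# Simulated stand-in (assumption): 33 samples x 1000 variables, 4 groups.
set.seed(3)
grp <- rep(1:4, length.out = 33)
mu  <- matrix(rnorm(4 * 1000), nrow = 4)
X   <- mu[grp, ] + matrix(rnorm(33 * 1000, sd = 0.5), nrow = 33)

km <- kmeans(X, centers = 4, nstart = 10)

# Centers of the three largest clusters (3 x 1000).
top3 <- order(km$size, decreasing = TRUE)[1:3]
ctr  <- km$centers[top3, , drop = FALSE]

# Two vectors spanning the plane through the three centers (1000 x 2),
# orthonormalised with a QR decomposition.
V <- cbind(ctr[2, ] - ctr[1, ], ctr[3, ] - ctr[1, ])
Q <- qr.Q(qr(V))

# Orthogonal projection: in-plane coordinates with the first center as origin.
coords <- sweep(X, 2, ctr[1, ]) %*% Q

plot(coords, col = km$cluster, pch = 19,
     xlab = "Plane axis 1", ylab = "Plane axis 2")
```

Samples from the fourth cluster are projected onto the same plane, so their separation from the other three may be understated in the plot.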

albyn
Albyn Jones
Reed College
jones at reed.edu