
Message-ID: <499ED7FF.2010300@statistik.tu-dortmund.de>
Date: 2009-02-20T16:19:11Z
From: Uwe Ligges
Subject: cluster analysis: mean values for each variable and cluster
In-Reply-To: <22120427.post@talk.nabble.com>

jgaspard wrote:
> Hi all!
> 
> I'm new to R and don't know much about it. Because it is free, I have managed to
> learn it a little.
> 
> Here is my problem: I did a cluster analysis on 30 observations and 16
> variables (monde, figaro, liberation, etc.). Here is the .txt data file:
> 
> "monde","figaro","liberation","yespeople","nopeople","bxl","europe","ue","union_eur","other","yesmeto","nometo","yesfonc","nofonc","yestone","notone"
> 1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
> 1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
> 1,0,0,0,1,0,0,0,1,0,1,0,1,0,1,0
> 0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1
> 1,0,0,0,1,0,0,0,1,0,0,1,1,0,0,1
> 1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
> 1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
> 1,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0
> 0,1,0,0,1,0,0,0,1,0,0,1,0,1,1,0
> 0,1,0,0,1,0,0,0,0,1,0,1,0,1,1,0
> 1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,1
> 0,1,0,0,1,0,0,1,0,0,0,1,1,0,1,0
> 0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
> 1,0,0,0,1,0,0,1,0,0,0,1,0,1,1,0
> 0,1,0,0,1,0,0,0,1,0,0,1,1,0,1,0
> 0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
> 0,1,0,1,0,0,1,0,0,0,0,1,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,0,1
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 
> 
> These are the steps I took:
> 
> data=read.table("/data.csv", header=T, sep=",")
> data
> dist=dist(data,method="euclidean")
> dist
> cluster=hclust(dist,method="ward")
> cluster
> plot(cluster)
> rect.hclust(cluster, k=4, border="red")
> 
> I extracted 4 clusters from the data. My question is: is it possible to
> produce a summary of the mean value of each variable in each of the 4
> clusters?


Well, I think this is not what you want.
You probably want to use the Manhattan distance (rather than Euclidean) for 0/1 
data, and you want to know the number of 1s and the total number of 
observations in each cluster.
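A minimal sketch of that idea, on made-up toy data rather than the file above. The data frame `toy`, the choice of complete linkage, and k = 2 are assumptions made only for illustration. On 0/1 data the Manhattan distance is simply the number of variables on which two rows differ.

```r
# Toy 0/1 data standing in for the real file (values are made up).
toy <- data.frame(
  a = c(1, 1, 1, 0, 0, 0),
  b = c(0, 0, 0, 1, 1, 1),
  c = c(1, 1, 0, 0, 0, 1)
)

# Manhattan distance: for 0/1 rows this counts the differing variables.
d <- dist(toy, method = "manhattan")

# Complete linkage is used here only to keep the sketch simple.
hc <- hclust(d, method = "complete")

# cutree() returns one cluster label per observation.
groups <- cutree(hc, k = 2)

# Number of 1s per variable in each cluster, and the cluster sizes.
ones  <- rowsum(toy, group = groups)
sizes <- table(groups)

print(ones)
print(sizes)
```

Dividing each row of `ones` by the matching entry of `sizes` gives the proportion of 1s, which for 0/1 data is the same as the cluster mean.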

Anyway, to answer your question, assign the result of rect.hclust() at the 
end, like this:

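# rect.hclust() invisibly returns a list with the row indices of each cluster
x <- rect.hclust(cluster, k=4, border="red")
# one column per cluster, one row per variable
sapply(x, function(i) colMeans(data[i,]))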
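If no dendrogram has been plotted, a similar table can be built from cutree() instead of rect.hclust(). The following is only a sketch on made-up toy data; `toy`, the linkage method, and k = 2 are assumptions. Note that cutree() numbers clusters by first appearance in the data, so the column order may differ from the left-to-right order that rect.hclust() returns.

```r
# Toy 0/1 data standing in for the real file (values are made up).
toy <- data.frame(
  a = c(1, 1, 1, 0, 0, 0),
  b = c(0, 0, 0, 1, 1, 1),
  c = c(1, 1, 0, 0, 0, 1)
)

hc <- hclust(dist(toy, method = "manhattan"), method = "complete")

# One cluster label per observation; no plot is needed for this step.
groups <- cutree(hc, k = 2)

# Split the rows by cluster and take the column means of each piece.
# The result has one row per variable and one column per cluster.
means <- sapply(split(toy, groups), colMeans)

print(means)
```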

Uwe Ligges



> Thanks a lot in advance,
> 
> Jeoffrey
> 
> 
> 
>