Examining how cases are similar by cluster, in cluster analysis

Hello David,

Many thanks - this does exactly what I want, and it lets me see whether the
clusters make sense in terms of the pattern of values and where they join a
cluster.

Regards

Bob
Something like this?

split(FS1, hcli8)
$`1`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
1   1  1  0  1  0  0  1  1  0   1   1   1
3   1  0  1  0  0  1  1  0  0   1   0   1
4   1  1  0  0  0  0  1  1  1   1   1   1
7   0  1  0  1  0  0  1  1  0   1   0   1
9   1  1  1  1  0  1  1  0  1   1   1   0
12  1  0  0  0  0  1  1  1  1   1   0   1
13  0  1  1  1  1  0  0  0  1   1   0   1
15  1  0  1  1  0  0  1  0  0   1   0   1
16  1  0  1  0  0  1  1  0  1   0   1   1
19  0  1  0  0  0  0  1  0  0   1   0   1
20  0  1  1  1  0  0  0  1  1   0   0   1
24  1  1  0  1  0  0  1  0  1   1   1   0
26  1  1  1  1  1  1  0  1  0   1   0   1
28  1  0  1  0  1  0  1  1  0   1   1   1
33  1  1  0  1  0  0  0  0  1   1   0   0
38  1  1  1  0  0  0  0  0  1   1   0   0
40  1  0  1  0  0  0  1  0  0   1   1   1
41  1  1  0  0  0  0  0  0  1   1   1   1
43  0  0  1  0  0  0  1  0  1   1   0   1
52  1  1  1  1  0  0  0  1  1   1   0   1
53  1  1  0  0  1  0  0  1  1   1   0   1
56  1  0  1  0  0  1  1  0  1   0   0   0
60  1  1  1  0  1  1  0  1  1   1   0   1

$`2`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
2   0  1  1  1  1  1  1  0  0   1   1   0
5   0  1  0  1  1  1  0  0  0   1   1   1
6   0  0  0  0  1  0  1  0  0   1   1   1
10  1  1  1  1  1  0  1  1  0   1   0   0
11  0  1  0  1  1  0  1  0  1   1   1   1
14  0  0  1  1  1  1  1  1  0   1   1   1
17  0  1  0  0  1  0  0  0  0   0   1   1
18  1  0  0  1  1  1  1  1  0   0   1   1
29  1  1  0  1  0  1  1  1  0   0   1   1
37  1  0  0  1  1  0  1  1  0   1   0   0
42  1  1  0  1  1  1  1  0  0   0   0   0
46  1  1  0  1  0  1  1  0  0   1   0   1
48  0  1  0  0  1  0  1  0  0   1   1   0
50  0  1  0  1  1  1  1  1  0   0   1   0
51  0  0  0  1  1  1  1  0  0   0   1   1
54  0  0  0  1  1  1  1  0  0   1   1   0
58  0  1  0  1  1  1  1  1  1   1   1   0
61  1  0  1  0  1  1  1  1  0   1   0   0

$`3`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
8   0  1  1  0  0  1  0  1  1   1   1   0
21  0  1  0  0  1  1  0  1  0   1   1   0
22  1  1  0  0  0  1  1  1  0   0   1   0
25  0  1  0  0  0  1  0  1  0   1   1   0
27  1  1  0  0  1  1  0  1  1   0   0   0
32  1  1  1  0  1  1  0  1  0   0   1   0
36  1  1  0  0  0  1  0  1  0   0   0   0
44  1  1  1  1  1  1  0  1  0   0   0   0
63  0  1  1  0  1  1  0  0  1   1   1   0

$`4`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
23  0  0  1  1  0  0  0  0  0   1   0   0
34  0  1  1  1  0  0  0  1  0   1   0   0

$`5`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
30  0  0  0  0  1  1  0  0  1   1   0   1
31  0  1  1  0  1  0  0  0  1   0   1   1
35  0  0  1  0  1  1  0  0  1   1   0   1
47  0  0  1  0  1  0  0  0  1   0   0   1
49  1  0  0  0  1  1  0  0  1   1   1   0
55  1  0  1  0  1  0  0  0  0   1   1   0
59  0  0  1  0  1  0  0  0  1   0   1   1

$`6`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
39  0  0  0  0  1  0  1  1  0   0   0   0
62  0  0  0  0  1  0  1  1  0   0   0   1

$`7`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
45  1  1  0  0  0  0  0  0  0   0   1   0

$`8`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
57  0  0  1  0  0  1  0  1  0   0   1   1
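
For readers who want to go one step beyond printing each cluster, here is a
minimal sketch (not part of the original reply) that lists, for each cluster,
the variables on which every case has the same value. FS1 and hcli8 follow the
names used in this thread; the data are simulated stand-ins, and shared_vars is
a hypothetical helper name.

```r
# Simulated stand-in for the real data (63 cases, 12 binary variables).
set.seed(42)
FS1 <- data.frame(matrix(sample(c(0, 1), 12 * 63, replace = TRUE),
                         nrow = 63, ncol = 12))
cl.test <- hclust(dist(FS1, method = "binary"), method = "average")
hcli8 <- cutree(cl.test, k = 8)

# Hypothetical helper: for the rows of one cluster, return the variables
# on which all cases agree, together with the shared value.
shared_vars <- function(d) {
  same <- vapply(d, function(v) length(unique(v)) == 1, logical(1))
  unlist(d[1, same, drop = FALSE])
}

# One named vector per cluster; NULL means no variable is unanimous.
agree <- lapply(split(FS1, hcli8), shared_vars)
agree
```

A cluster with a single case trivially "agrees" on all variables, so such
clusters are best read alongside the cluster sizes from table(hcli8).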

-------
David

-----Original Message-----
From: Bob Green [mailto:bgreen at dyson.brisnet.org.au]
Sent: Sunday, November 18, 2012 3:22 PM
To: dcarlson at tamu.edu; r-help at r-project.org
Subject: RE: [R] Examining how cases are similar by cluster, in cluster
analysis

David,

Many thanks, I'm sure this will be helpful. What would also be
helpful is if I could extract each cluster and examine id by variable
within the respective cluster. I could index the variables for each
cluster and run such an analysis, but there must be a more efficient
way of doing this (especially as I experiment with different
clustering methods).

Thanks again,

Bob

At 06:44 AM 19/11/2012, David L Carlson wrote:
If you just want a summary of the mean for each variable in each
cluster, this will get you there:

set.seed(42)
FS1 <- data.frame(matrix(sample(c(0, 1), 12*63, replace=TRUE),
                         nrow=63, ncol=12))
dmat <- dist(FS1, method="binary")
cl.test <- hclust(dmat, method="average")
plot(cl.test, hang=-1)
hcli8 <- cutree(cl.test, k=8)
tbl <- aggregate(FS1, by=list(Group=hcli8), mean)
print(tbl, digits=4)
  Group     X1     X2     X3     X4     X5     X6     X7     X8     X9
1     1 0.5122 0.6829 0.6829 0.6341 0.5854 0.5854 0.6829 0.6341 0.5366
2     2 0.0000 0.0000 0.0000 1.0000 0.6667 0.6667 0.0000 0.6667 0.0000
3     3 0.9286 0.1429 0.1429 0.1429 0.2857 0.5714 0.7857 0.3571 0.8571
4     4 1.0000 1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
5     5 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000
6     6 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000
7     7 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
8     8 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
     X10    X11   X12
1 0.4146 0.4634 0.561
2 0.6667 0.0000 0.000
3 0.8571 0.6429 0.500
4 1.0000 0.0000 0.000
5 0.0000 1.0000 0.000
6 0.0000 0.0000 1.000
7 0.0000 0.0000 0.000
8 0.0000 0.0000 0.000
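
Because the variables are coded 0/1, each mean above is the proportion of
cases in the cluster with that variable present. As a rough sketch of how the
same summary could be repeated across linkage methods (assuming simulated data
in place of the real file; cluster_means is a hypothetical helper name, not
something from this thread):

```r
# Simulated stand-in for the real data (63 cases, 12 binary variables).
set.seed(42)
FS1 <- data.frame(matrix(sample(c(0, 1), 12 * 63, replace = TRUE),
                         nrow = 63, ncol = 12))
dmat <- dist(FS1, method = "binary")

# Hypothetical helper: cluster with the given linkage, cut into k groups,
# and return per-cluster proportions plus the cluster size n.
cluster_means <- function(d, dmat, method = "average", k = 8) {
  grp <- cutree(hclust(dmat, method = method), k = k)
  out <- aggregate(d, by = list(Group = grp), FUN = mean)
  out$n <- as.vector(table(grp))
  out
}

# Linkage methods to compare; all three are standard hclust() options.
methods <- c("average", "complete", "single")
results <- lapply(setNames(methods, methods),
                  function(m) cluster_means(FS1, dmat, method = m))
print(results$complete, digits = 2)
```

Keeping the cluster size next to the means makes it easier to spot rows of
exact 0s and 1s that come from clusters with only one or two cases.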

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Bob Green
Sent: Sunday, November 18, 2012 5:00 AM
To: r-help at r-project.org
Subject: [R] Examining how cases are similar by cluster, in
cluster analysis

Hello,

I used the following code to perform a cluster analysis on a
dataframe consisting of 12 variables (coded as 1,0) and 63
cases.

FS1 <- read.csv("D:/Arsontest2.csv", header=TRUE, row.names=1)

str(FS1)

dmat <- dist(FS1, method="binary")

cl.test <- hclust(dmat, method="average")

plot(cl.test, hang=-1)
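
As a side note on what dist(..., method="binary") measures: per the R
documentation, it is the proportion of variables in which only one of two
cases is "on", among the variables in which at least one is "on". A small
sketch with two made-up cases (the vectors a and b are illustrative only, not
taken from the data):

```r
# Two made-up cases scored on six binary variables.
a <- c(1, 1, 0, 1, 0, 0)
b <- c(1, 0, 0, 1, 1, 0)

either <- a > 0 | b > 0        # variables where at least one case is "on"
differ <- xor(a > 0, b > 0)    # variables where exactly one case is "on"
manual <- sum(differ) / sum(either)

# The same quantity computed by dist().
builtin <- as.numeric(dist(rbind(a, b), method = "binary"))
all.equal(manual, builtin)
```

Variables that are 0 in both cases do not count towards similarity under this
measure, which matters when interpreting clusters of mostly-absent traits.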

Each case has an id, and the dendrogram identifies the cases that
constitute each cluster. What I am seeking advice on is how to
examine the variables on which the cases are similar within
each cluster.

sort(hcli8 <- cutree(cl.test, k=8)) shows that cluster 2 comprises
the following cases:

1641 2295 2594 2654 2799 3213 3510 3513 2958 3294
   2    2    2    2    2    2    2    2    2    2

This code provides means for the variables by cluster. For
cluster 2, it appears the cases should have no clear motive and
be depressed:

round(sapply(x, function(i) colMeans(FS1[i,])),2)

          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
depressed 0.00 0.33 0.00  0.0    0  0.6 0.00 0.08
unclear   0.33 1.00 1.00  1.0    0  0.0 0.07 0.12

I can manually examine this variable by variable and look at how
the cases in cluster 2 are similar on each variable, but I am
looking for a quicker, more efficient way to do this.

Bob

