Cluster analysis: hclust manipulation possible?
On Mon, 16 Nov 2009, Jopi Harri wrote:
I am doing cluster analysis [hclust(Dist, method="average")] on data that potentially contains redundant objects. As expected, the inclusion of redundant objects affects the clustering result, i.e., the data a1 = a2 = a3, b, c, d, e1 = e2 is likely to cluster differently from the same data without the redundancy, i.e., a1, b, c, d, e1. This is apparent when the outcome is visualized as a dendrogram.

Now, it seems that the clustering result for which the redundancy has been eliminated is more robust for the present assignment than that of the redundant data. Naturally, there is no problem in the elimination: just exclude the redundant objects from Dist.

However, it would be very convenient to be able to include the redundant objects in the *dendrogram* by attaching them as 0-level branches to the subtrees, i.e.:

1.0........-------........
0.5....___|__...._|_......
0.0.._|_..|..|..|.._|_....
....|.|.|.|..|..|.|...|...
...a1a2a3.b..c..d.e1.e2...

instead of

1.0........-------........
0.5....___|__...._|_......
0.0...|...|..|..|...|.....
......a1..b..c..d..e1.....

The question: Can this be accomplished in the *dendrogram plot* by manipulating the resulting hclust data structure or by some other means, and if yes, how?
Yes. You need to study ?hclust, particularly the 'Value' section, from which you will see what needs modification. Here is a very simple example:
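res <- hclust(dist(1-diag(3)*rnorm(3)))
plot(res)
res2 <- res
# merge each object i with its duplicate i+3 first; old singletons -i then
# refer to step i, and old cluster references k shift to k+3
res2$merge <- rbind(-cbind(1:3,4:6),
                    matrix(ifelse(res2$merge<0, -res2$merge,
                                  res2$merge+sum(res2$merge<0)), ncol=2))
res2$height <- c(rep(0,3), res2$height)  # the new merges sit at height 0
res2$order <- as.vector(rbind(res2$order, (4:6)[res2$order]))  # duplicate next to original
plot(res2)
str(res)
str(res2)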
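The example above hard-codes three objects with one duplicate each. A minimal sketch of the same idea for an arbitrary set of duplicates follows; it is not part of the reply. The helper name attach_duplicates and its arguments dup_of and dup_labels are hypothetical, and the toy coordinates are made up for illustration. It assumes the clustering was done on the unique objects only and that you know which unique object each duplicate repeats.

```r
# Hypothetical helper, not part of base R.
# hc         : hclust fit on the n unique objects
# dup_of     : dup_of[j] = index (1..n) of the object that duplicate j repeats
# dup_labels : labels for the duplicates (used only if hc has labels)
attach_duplicates <- function(hc, dup_of, dup_labels = NULL) {
  n <- nrow(hc$merge) + 1L
  m <- length(dup_of)
  if (m == 0L) return(hc)
  stopifnot(all(dup_of %in% seq_len(n)))

  # node[i] codes the subtree currently holding object i: the singleton -i
  # at first, later the number of the zero-height merge that absorbed it
  node <- -seq_len(n)
  new_merge <- matrix(0L, nrow = m, ncol = 2L)
  for (j in seq_len(m)) {
    i <- dup_of[j]
    new_merge[j, ] <- c(-(n + j), node[i])
    node[i] <- j
  }

  # re-code the original merges: singleton -i -> node[i], cluster k -> k + m
  old <- hc$merge
  recoded <- old
  neg <- old < 0
  recoded[neg]  <- node[-old[neg]]
  recoded[!neg] <- old[!neg] + m

  hc$merge  <- rbind(new_merge, recoded)
  hc$height <- c(rep(0, m), hc$height)
  # each duplicate is drawn directly after the object it repeats
  hc$order  <- unlist(lapply(hc$order,
                             function(i) c(i, n + which(dup_of == i))))
  if (!is.null(hc$labels)) {
    if (is.null(dup_labels))
      dup_labels <- paste0(hc$labels[dup_of], "_dup", seq_len(m))
    hc$labels <- c(hc$labels, dup_labels)
  }
  hc
}

# Toy data for the unique objects a1, b, c, d, e1 (made-up coordinates)
x <- matrix(c(0, 0,  1, 0.2,  3, 1,  6, 5,  6.5, 5.5), ncol = 2, byrow = TRUE,
            dimnames = list(c("a1", "b", "c", "d", "e1"), NULL))
hc  <- hclust(dist(x), method = "average")
# a2 and a3 repeat object 1 (a1), e2 repeats object 5 (e1)
hc2 <- attach_duplicates(hc, dup_of = c(1, 1, 5),
                         dup_labels = c("a2", "a3", "e2"))
plot(hc2)
```

Because hclust merges are binary, an object with several duplicates is attached through a chain of zero-height merges, which the plot draws as a single flat bar at height 0.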
Alternatively, you could use as.dendrogram(res) as the point of departure and manipulate the resulting object.

HTH,
Chuck
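As an illustration of the as.dendrogram() route mentioned above, here is a rough sketch under stated assumptions, not a definitive implementation. The helper add_zero_branches and its dups argument are hypothetical names, and the toy coordinates are made up. The sketch relies on the node attributes described in ?dendrogram ("label", "members", "height", "midpoint", "leaf") and recomputes "members" and "midpoint" on the way back up so that plot() with its default center = FALSE positions the nodes sensibly.

```r
# Hypothetical helper: walk a dendrogram and replace every leaf that has
# duplicates by a zero-height node holding the leaf and its duplicates.
# dups is a named list: names are existing leaf labels, values are the
# labels of the redundant objects to hang next to them.
add_zero_branches <- function(node, dups) {
  if (is.leaf(node)) {
    lab  <- as.character(attr(node, "label"))
    labs <- c(lab, dups[[lab]])
    if (length(labs) == 1L) return(node)      # nothing to attach
    kids <- lapply(labs, function(l) {
      leaf <- node                            # copy the original leaf
      attr(leaf, "label") <- l
      leaf
    })
    attributes(kids) <- list(members  = length(labs),
                             height   = 0,
                             midpoint = (length(labs) - 1) / 2,
                             class    = "dendrogram")
    return(kids)
  }
  k <- length(node)
  for (i in seq_len(k)) node[[i]] <- add_zero_branches(node[[i]], dups)
  mem <- vapply(seq_len(k), function(i) attr(node[[i]], "members"), numeric(1))
  mid <- vapply(seq_len(k), function(i) {
    mp <- attr(node[[i]], "midpoint")
    if (is.null(mp)) 0 else mp                # leaves carry no midpoint
  }, numeric(1))
  attr(node, "members")  <- as.integer(sum(mem))
  # midpoint: halfway between the first and the last child, measured from
  # the left border of this branch (unit 1 between neighbouring leaves)
  attr(node, "midpoint") <- (mid[1] + sum(mem[-k]) + mid[k]) / 2
  node
}

# Toy data for the unique objects a1, b, c, d, e1 (made-up coordinates)
x <- matrix(c(0, 0,  1, 0.2,  3, 1,  6, 5,  6.5, 5.5), ncol = 2, byrow = TRUE,
            dimnames = list(c("a1", "b", "c", "d", "e1"), NULL))
dend  <- as.dendrogram(hclust(dist(x), method = "average"))
dend2 <- add_zero_branches(dend, dups = list(a1 = c("a2", "a3"), e1 = "e2"))
plot(dend2)
```

Unlike the hclust version, this puts a1, a2 and a3 under one multi-way node at height 0, which is close to the picture in the question. The copied leaves keep the integer value of the leaf they were copied from, so functions that depend on those values, such as order.dendrogram(), should not be trusted on the result.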
Charles C. Berry
Dept of Family/Preventive Medicine, UC San Diego
La Jolla, San Diego 92093-0901
(858) 534-2098
E mailto:cberry at tajo.ucsd.edu
http://famprevmed.ucsd.edu/faculty/cberry/