R Hierarchical clustering leaf node

2 messages · Qunfeng, Friedrich Leisch

#
Hello,

I am new to R. After performing hierarchical clustering in R, I am only
interested in retrieving the leaf nodes that share a last common ancestor.
As illustrated below, I'd like to retrieve (B, C) as one cluster and
(D, E) as another. Is there a way to do this in R? Thanks! BTW, I just
subscribed to this list (not sure whether the subscription succeeded), so
please copy your answer to my personal email
(qfdong at iastate.edu) -- Qunfeng

                  |
      -------------------------
      |           |           |
      A        -------     -------
               |     |     |     |
               B     C     D     E
#
Once you know the internal structure of an hclust object, this is
quite easy for groups of two (getting triplets or larger groups
would require a bit more code):

As an example we can use

R> set.seed(1)
R> x=rnorm(5)
R> h=hclust(dist(x))
R> str(as.dendrogram(h))
--[dendrogram w/ 2 branches and 5 members at h = 2.43]
  |--leaf 4
  `--[dendrogram w/ 2 branches and 4 members at h = 1.17]
     |--[dendrogram w/ 2 branches and 2 members at h = 0.146]
     |  |--leaf 2
     |  `--leaf 5
     `--[dendrogram w/ 2 branches and 2 members at h = 0.209]
        |--leaf 1
        `--leaf 3

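The key is the "merge" element of the returned object. In each row,
negative entries refer to original observations (leaves) and positive
entries to clusters formed in earlier steps, so the rows with two
negative entries are exactly the merges of two leaves. You can extract
the two pairs with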

R> -h$merge[apply(h$merge, 1, function(x) all(x < 0)), , drop = FALSE]

     [,1] [,2]
[1,]    2    5
[2,]    1    3
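
The matrix above holds observation indices, while the original question
asked for leaf names such as (B, C) and (D, E). A minimal sketch of how
the indices could be mapped back to labels follows. The data values and
the labels A..E are made up for illustration, and the default
(complete linkage) method of hclust is assumed:

```r
# Hypothetical data: five one-dimensional points with row names A..E.
# The values are invented so that B,C and D,E are the closest pairs.
x <- matrix(c(10, 1, 1.2, 5, 5.3), ncol = 1,
            dimnames = list(c("A", "B", "C", "D", "E"), NULL))
h <- hclust(dist(x))

# Rows of h$merge with two negative entries join two original observations.
is_leaf_pair <- apply(h$merge, 1, function(r) all(r < 0))

# drop = FALSE keeps a matrix even if only one such row exists.
idx <- -h$merge[is_leaf_pair, , drop = FALSE]

# h$labels holds the row names of the input; look up each index and
# restore the two-column layout (one pair per row).
pairs <- matrix(h$labels[as.vector(idx)], ncol = 2)
print(pairs)
```

For this made-up data the two rows of pairs should correspond to (B, C)
and (D, E). If the input has no row names, h$labels is NULL and the index
matrix itself is the only identifier available.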

HTH,