
Manually modifying an hclust dendrogram to remove singletons

I can't quite put my finger on it, but something about your idea rubs me
the wrong way. Maybe it's that the tree depends on the hierarchical
clustering algorithm, and the choice of how to trim it should be based
on something more defensible than "avoid singletons". In this example
Hawaii is really different from New Hampshire, so why would you want
them clustered together?

But it's your work, your field of study, whatever. If you are going to
do it anyway, one way would be to loop over cut heights, raising the cut
until no cluster is a singleton:

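 hc <- hclust(dist(USArrests), "average")
 plot(hc)                        # rect.hclust() draws on this plot
 hr <- range(hc$height)
 tol <- diff(hr)/100             # step size between candidate cut heights
 # raise the cut height until every cluster has more than one member
 for (i in seq(1e-4 + hr[1], hr[2], tol)) {
   hcc <- rect.hclust(hc, h = i) # list of cluster memberships at height i
   if (all(sapply(hcc, length) > 1)) break
 }
 str(hcc)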

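 # or, if you prefer a dendrogram:
 dend1 <- as.dendrogram(hc)
 for (i in seq(1e-4 + hr[1], hr[2], tol)) {
   dend2 <- cut(dend1, h = i)    # list with $upper and $lower parts
   # the 'members' attribute is the number of leaves in each lower branch
   if (all(sapply(dend2$lower, function(x) attr(x, "members")) > 1)) break
 }
 dend2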
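
If you would rather not draw a rectangle on the plot at every step, the
same search can be sketched with cutree(), which only returns cluster
labels. This is a minimal sketch, not a tested recipe: it assumes the
merge heights are non-decreasing (true for average linkage), and it tries
the merge heights themselves as cut points instead of a fixed grid. The
names `groups` and `h` are just illustrative.

```r
hc <- hclust(dist(USArrests), "average")

# Candidate cut points: the heights at which merges happen, lowest first
heights <- sort(hc$height)

for (h in heights) {
  groups <- cutree(hc, h = h)          # integer cluster label per state
  # stop at the first height where no cluster has only one member
  if (all(table(groups) > 1)) break
}

h               # cut height that was selected
table(groups)   # cluster sizes at that height
```

The loop always terminates with a valid answer, because cutting at the
largest merge height puts every state in a single cluster.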

Cheers
On Thu, May 24, 2012 at 10:31 AM, <r-help.20.trevva at spamgourmet.com> wrote: