merge small clusters in R

2 messages · Sheila the angel, Boris Steipe

#
In R, I have cut a dendrogram into clusters. However, some of the clusters
contain only a few samples. How can I merge the small clusters with the nearest big
cluster?

hc <- hclust(dist(USArrests))
plot(hc, cex = 0.6)
rect.hclust(hc, k = 4, border = 2:5)

It gives one cluster with only 2 samples. How can I merge it with the nearest
cluster?

Thanks
S.
#
This is not a well-defined question until your notions of "small" and "nearest" are defined. In your specific example

   rect.hclust(hc, k = 3, border = 2:5)

... will do what you are asking for. This is not likely to work in the general case: imagine that your cluster of size two only meets the others at the root. In that case you would distort the result significantly if you merged it into another cluster simply because of its membership size. That said, perhaps the package dynamicTreeCut will help you find cuts in a dendrogram that more closely match your intuition.
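
If you do want to merge by size anyway, here is a minimal sketch under stated assumptions, not a recommended method. It assumes "small" means fewer than min_size members and "nearest" means the smallest Euclidean distance between cluster centroids in the original data. The function name merge_small_clusters and the threshold of 5 are made up for illustration. Note that it ignores the tree structure entirely, so the caveat above still applies.

```r
# Hypothetical helper (not part of any package): repeatedly merge the smallest
# cluster into the cluster with the nearest centroid until every cluster has
# at least `min_size` members.
merge_small_clusters <- function(x, labels, min_size) {
  x <- as.matrix(x)
  repeat {
    sizes <- table(labels)
    # Stop when only one cluster is left or no cluster is too small
    if (length(sizes) < 2 || min(sizes) >= min_size) break

    ids <- as.integer(names(sizes))
    smallest <- ids[which.min(sizes)]
    others <- setdiff(ids, smallest)

    # One centroid (row) per cluster, row names are the cluster ids
    centroids <- do.call(rbind, lapply(ids, function(k) {
      colMeans(x[labels == k, , drop = FALSE])
    }))
    rownames(centroids) <- ids

    # Euclidean distance from the smallest cluster's centroid to each other one
    diffs <- t(centroids[as.character(others), , drop = FALSE]) -
      centroids[as.character(smallest), ]
    d <- sqrt(colSums(diffs^2))

    # Relabel the members of the smallest cluster
    labels[labels == smallest] <- others[which.min(d)]
  }
  labels
}

hc <- hclust(dist(USArrests))
cl <- cutree(hc, k = 4)
table(cl)                       # cluster sizes before merging

merged <- merge_small_clusters(USArrests, cl, min_size = 5)
table(merged)                   # cluster sizes after merging
```

The merged labels keep the original cluster numbers, so one number simply disappears. Centroid distance is only one possible definition of "nearest"; a different definition can give a different merge.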

B.
On Mar 16, 2016, at 11:38 AM, Sheila the angel <from.d.putto at gmail.com> wrote: