Manually modifying an hclust dendrogram to remove singletons

On Thu, May 24, 2012 at 9:31 AM, <r-help.20.trevva at spamgourmet.com> wrote:
Hi Mark,

I'm not sure how you want to handle the singletons if you don't want
them in a separate cluster. The package WGCNA (I'm the maintainer) and
its dependency dynamicTreeCut contain a few ways of avoiding
singletons as separate clusters.

One way is to remove them from the resulting clusters. To this end,
use the function cutreeStatic and specify the cut height and the
minimum number of elements in a cluster. For example,

library(WGCNA)  # load WGCNA (and with it dynamicTreeCut)
clusters1 = cutreeStatic(hc, cutHeight = 35, minSize = 3)

This way all branches that have size below 3 are labeled 0 (unassigned).
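
To make the idea concrete, here is a rough base-R sketch of the same
kind of cut. It is not the WGCNA implementation, and it does not
renumber the remaining clusters. It assumes hc was built as
hclust(dist(USArrests)) with the default linkage; the names raw,
tooSmall and clustersBase are made up for the illustration.

```r
# Assumption: hc comes from the USArrests data with default settings.
hc <- hclust(dist(USArrests))

# Cut the tree at a fixed height using base R's cutree.
raw <- cutree(hc, h = 35)

# Find clusters with fewer than 3 members and relabel their members as 0.
sizes <- table(raw)
tooSmall <- as.integer(names(sizes)[sizes < 3])
clustersBase <- raw
clustersBase[raw %in% tooSmall] <- 0L

# Label 0 collects everything that was in a too-small branch.
table(clustersBase)
```

Every non-zero label that survives has at least 3 members; everything
else ends up under label 0.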

To see what you get, use the function plotDendroAndColors like this:

plotDendroAndColors(hc, clusters1, rowText = clusters1)

Each color corresponds to a cluster, and the cluster label is shown by
the numbers (each number is at the start of the corresponding
cluster).

If you'd like to assign everything but want to avoid clusters that are
too small, use the dynamic tree cut approach
(http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting/).
For example:

clusters2 = cutreeDynamic(hc, distM = as.matrix(dist(USArrests)),
minClusterSize = 3, deepSplit = 2)

To show the clusters:
plotDendroAndColors(hc, clusters2, rowText = clusters2)

If you think the clusters are too big, try setting deepSplit=3 in the
cutreeDynamic call.

The dynamic tree cut assigns all singletons and branches
with size less than minClusterSize to the nearest existing cluster
(notice Hawaii and the Florida/North Carolina branch), thus in effect
combining hierarchical clustering with a PAM-like step. Whether that's a
good approach for your research goal is a question you need to answer.
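
To illustrate what a PAM-like reassignment step can look like, here
is a small base-R sketch. It is only an illustration and not what
dynamicTreeCut does internally: it cuts at a fixed height, marks
too-small clusters as 0, and then moves each unassigned object to the
cluster with the smallest average distance. The choice of average
distance as the measure of "nearest", the default linkage, and the
names core and assigned are all assumptions made for the example.

```r
# Assumption: same data and default linkage as in the examples above.
d  <- as.matrix(dist(USArrests))
hc <- hclust(dist(USArrests))

# Fixed-height cut; clusters with fewer than 3 members get label 0.
labels <- cutree(hc, h = 35)
sizes  <- table(labels)
labels[labels %in% as.integer(names(sizes)[sizes < 3])] <- 0L

# Clusters that are large enough to receive the unassigned objects.
core <- sort(unique(labels[labels != 0]))
stopifnot(length(core) > 0)

# Assign each unlabeled object to the core cluster with the smallest
# average distance to its members.
assigned <- labels
for (i in which(labels == 0)) {
  avgDist <- sapply(core, function(k) mean(d[i, labels == k]))
  assigned[i] <- core[which.min(avgDist)]
}

table(assigned)
```

After this step no object is left with label 0, and objects that
already belonged to a large enough cluster keep their label.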

HTH,

Peter