Do we have to cut a dendrogram at a specific level or not?
Jens Oldeland <oldeland at gmx.de> writes:
I am currently preparing a lecture on 'Cluster Analysis' and I found two very different ways to interpret a dendrogram. The first option is to 'cut' a dendrogram at a specific height, as is possible with the cluster package. The second option identifies the 'optimal clusters' at different heights; for an example, see McCune et al. 2002, Analysis of Ecological Communities, Figure 11.3. Now these are two very different ways of interpreting a dendrogram, and I am wondering which one is 'allowed', or perhaps which is the more practical way? Is it possible to combine both? I.e. first search for the optimal cut level and then adjust each clustering height by aggregation at higher levels?
There's really no hard and fast rule to follow with hierarchical clustering. It's essentially a descriptive technique, and a good interpretation depends as much on your understanding of the system as the actual shape of the dendrogram. Even if you take the 'objective' route and cut the tree at a specific height, you still have to choose the height. Borcard et al. (2011) provide some nice tools for picking that height, by the way. They also emphasize that even their favourite tools don't always produce the most interpretable clusters.

One, possibly reasonable, approach would be to make your first division based on a specific height on the dendrogram, and then interpret any "sub-clusters" that are nested within your main groups: "cutting the tree at the 1.1 level reveals three main groups: coniferous, mixed and deciduous forests. Within the deciduous forest there are two additional clusters (distinct at the 0.8 level): maple forests and oak forests". Once the dendrogram is drawn, it's really up to the ecologist to determine how best to interpret it.

An important point in all this is that clustering is primarily an exploratory/descriptive technique, rather than a confirmatory test. From Borcard et al.: "Clustering is not a typical statistical method in that it does not test any hypothesis. Clustering helps bring out some features hidden in the data; it is the user who decides if these structures are interesting and worth interpreting in ecological terms."

HTH,
Tyler

Borcard, Gillet and Legendre. 2011. Numerical Ecology with R. Springer.
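
As a rough illustration of the two-level reading described above (cut at one height for the main groups, then look inside one group at a lower height), here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the site names, the one-dimensional "scores", the choice of average linkage, and the helper names `average_linkage` and `cut`. The toy data were picked so that the heights 1.1 and 0.8 from the forest example happen to work. In R, the same idea would normally be expressed with `hclust()` and `cutree(tree, h = ...)`.

```python
from itertools import combinations


def average_linkage(points):
    """Agglomerate 1-D points; return merges as (height, members_a, members_b)."""
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def dist(a, b):
        # Average linkage: mean pairwise distance between the two clusters.
        total = sum(abs(points[i] - points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the closest pair of current clusters and record the height.
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((dist(a, b), a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut(merges, n, height):
    """Groups left when only merges at or below `height` are kept."""
    groups = {i: frozenset([i]) for i in range(n)}
    for h, a, b in merges:
        if h <= height:
            merged = a | b
            for i in merged:
                groups[i] = merged
    return sorted(set(groups.values()), key=min)


# Hypothetical sites and scores along a single made-up gradient.
sites = ["fir1", "fir2", "mixed1", "mixed2", "maple1", "maple2", "oak1", "oak2"]
scores = [0.0, 0.3, 2.0, 2.4, 5.0, 5.2, 6.0, 6.3]

merges = average_linkage(scores)

# First division: one cut across the whole tree.
main_groups = cut(merges, len(scores), 1.1)

# Second look: a lower cut, read only inside the last main group.
deciduous = main_groups[-1]
sub_clusters = [g for g in cut(merges, len(scores), 0.8) if g <= deciduous]

for g in main_groups:
    print("main group:", [sites[i] for i in sorted(g)])
for g in sub_clusters:
    print("  nested in last group:", [sites[i] for i in sorted(g)])
```

The code only reports which groups exist at each height. Whether the nested groups are worth naming and interpreting remains a judgement for the ecologist, as argued above.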