
hcluster with linkage median

4 messages · Kennedy, Peter Langfelder

#
Hi,

I want to perform a hierarchical clustering using the median as the linkage
metric. As I understand it, the function hcluster in the amap package has this
option, but it does not produce the results that I expect.

In the example below, M is a matrix of similarities that is transformed into
the matrix of dissimilarities D shown here:
[,1] [,2] [,3] [,4] [,5]
[1,]  1.0  0.9  0.2  0.2  0.1
[2,]  0.9  1.0  0.7  1.0  0.0
[3,]  0.2  0.7  1.0  0.8  0.8
[4,]  0.2  1.0  0.8  1.0  0.5
[5,]  0.1  0.0  0.8  0.5  1.0

Since D[2,5] = 0, objects 2 and 5 should be grouped together in the first
step, as the agnes function does, but hcluster starts by clustering
objects 3 and 4. Why is this?

Regards

Henrik

library(cluster)
library(amap)

# Create matrix M
M <- matrix(nrow = 5, ncol = 5)
M[,1] <- c(0,1,8,8,9)
M[,2] <- c(1,0,3,0,10)
M[,3] <- c(8,3,0,2,2)
M[,4] <- c(8,0,2,0,5)
M[,5] <- c(9,10,2,5,0)

# Create matrix D
n <- nrow(M)
o <- matrix(1,n,n)
mn <- (1/max(M))*M
D <- o-mn

# Clustering using hcluster
ce <- hcluster(D,link="median")
plot(ce)

# Clustering using agnes
av <- agnes(D, diss = TRUE, method = "average")
pltree(av)
#
On Mon, Sep 27, 2010 at 8:22 AM, Kennedy <henrik.aldberg at gmail.com> wrote:
[hcluster's first argument is] _not_ the distance matrix, but a numeric matrix from which the
distance is computed. I think you should simply look at hclust, since
that does implement the median method.
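A minimal base-R sketch of the difference Peter describes, assuming hcluster's default Euclidean metric: when D is passed as data, each row of D is treated as an observation, so the first merge is decided by Euclidean distances between rows rather than by the entries of D. The helper closest_pair is hypothetical, written only for this illustration.

```r
# Rebuild the dissimilarity matrix D from the original post
M <- matrix(c(0,  1, 8, 8,  9,
              1,  0, 3, 0, 10,
              8,  3, 0, 2,  2,
              8,  0, 2, 0,  5,
              9, 10, 2, 5,  0), nrow = 5)
D <- 1 - M / max(M)

# Hypothetical helper: sorted indices of the closest pair in a "dist" object
closest_pair <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf                                  # ignore self-distances
  idx <- which(m == min(m), arr.ind = TRUE)[1, ]  # first (row, col) at the minimum
  sort(unname(idx))
}

# Rows of D treated as data points, as hcluster(D, ...) does
closest_pair(dist(D, method = "euclidean"))  # 3 4
# Entries of D used directly as dissimilarities (D[2,5] = 0)
closest_pair(as.dist(D))                     # 2 5

# hclust on a "dist" object therefore merges objects 2 and 5 first
# (negative entries in $merge denote original objects)
hc <- hclust(as.dist(D), method = "median")
hc$merge[1, ]
```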

Peter
#
On Mon, Sep 27, 2010 at 8:22 AM, Kennedy <henrik.aldberg at gmail.com> wrote:
Also, if you have a large(r) data set, the package flashClust provides
a much faster (O(n^2) vs. O(n^3) time) replacement for hclust with exactly the
same results.

Peter
#
Thank you, Peter, for your help.

I had tried hclust before, but I made the mistake of passing the D matrix above
instead of a dist object. So the following code

  library(flashClust)

  # Convert the dissimilarity matrix into a "dist" object
  d <- as.dist(D)
  # Clustering using hclust
  hc <- hclust(d, method = "median", members = NULL)
  # Clustering using flashClust
  fc <- flashClust(d, method = "median", members = NULL)

solves the problem I posted. But another question arises: how is the median
linkage calculated? I would like it to work like this:

Given clusters C1=(1,2,3) and C2=(4), the distance between C1 and C2 is: 
  d(C1,C2) = median(d(1,4),d(2,4),d(3,4)) = median(0.2, 1.0, 0.8) = 0.8,
where the values d(1,4), d(2,4) and d(3,4) are taken from the D matrix
above.
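The requested linkage and the one hclust uses can be compared on the D above. This is a sketch under stated assumptions: median_pair_linkage is a hypothetical helper, not a function from cluster, amap or flashClust. hclust's "median" method is Gower's weighted centroid method (WPGMC), which does not take a median of pairwise dissimilarities; after merging clusters i and j it updates d(k, {i,j}) = d(k,i)/2 + d(k,j)/2 - d(i,j)/4 (a Lance-Williams update). The hclust heights in the comment were worked out by hand from that formula.

```r
# Rebuild the dissimilarity matrix D from the original post
M <- matrix(c(0,  1, 8, 8,  9,
              1,  0, 3, 0, 10,
              8,  3, 0, 2,  2,
              8,  0, 2, 0,  5,
              9, 10, 2, 5,  0), nrow = 5)
D <- 1 - M / max(M)

# Hypothetical helper: median of all pairwise dissimilarities between
# the members of cluster c1 and the members of cluster c2
median_pair_linkage <- function(D, c1, c2) {
  median(D[c1, c2])
}

median_pair_linkage(D, c(1, 2, 3), 4)  # median(0.2, 1.0, 0.8) = 0.8

# hclust's "median" (WPGMC) merge heights for the same D;
# by hand from the update formula: 0, 0.2, 0.45, 0.55
hclust(as.dist(D), method = "median")$height

# Its last merge joins {1,3,4} and {2,5}; the median of their six
# pairwise dissimilarities is 0.75, not 0.55
median_pair_linkage(D, c(1, 3, 4), c(2, 5))
```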

If this is not the case, is there any function that uses this linkage
metric?
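If no packaged function fits, the requested linkage is short to implement naively. The sketch below is hypothetical (median_linkage_clust is not from any package) and assumes a small, full symmetric dissimilarity matrix: at each step it recomputes the median of pairwise dissimilarities for every pair of current clusters and merges the closest pair, which costs roughly O(n^3 log n) time. The merge matrix follows hclust's sign convention (negative = original object, positive = earlier merge step), but the result is a plain list, not an hclust object.

```r
# Rebuild the dissimilarity matrix D from the original post
M <- matrix(c(0,  1, 8, 8,  9,
              1,  0, 3, 0, 10,
              8,  3, 0, 2,  2,
              8,  0, 2, 0,  5,
              9, 10, 2, 5,  0), nrow = 5)
D <- 1 - M / max(M)

# Hypothetical helper: agglomerative clustering where the distance between
# two clusters is the median of all pairwise dissimilarities between them
median_linkage_clust <- function(D) {
  D <- as.matrix(D)
  n <- nrow(D)
  clusters <- as.list(seq_len(n))  # members of each current cluster
  ids <- -seq_len(n)               # hclust-style ids: -i for original object i
  merge <- matrix(0L, nrow = n - 1, ncol = 2)
  height <- numeric(n - 1)
  for (step in seq_len(n - 1)) {
    best <- c(NA_integer_, NA_integer_)
    best_d <- Inf
    k <- length(clusters)
    # Scan all pairs of current clusters for the smallest median linkage
    for (a in seq_len(k - 1)) {
      for (b in (a + 1):k) {
        d_ab <- median(D[clusters[[a]], clusters[[b]]])
        if (d_ab < best_d) {  # strict "<": ties go to the first pair scanned
          best_d <- d_ab
          best <- c(a, b)
        }
      }
    }
    a <- best[1]
    b <- best[2]
    merge[step, ] <- c(ids[a], ids[b])
    height[step] <- best_d
    # Cluster a absorbs cluster b and takes the id of this merge step
    clusters[[a]] <- c(clusters[[a]], clusters[[b]])
    ids[a] <- step
    clusters[[b]] <- NULL
    ids <- ids[-b]
  }
  list(merge = merge, height = height)
}

res <- median_linkage_clust(D)
res$merge   # first row: -2 -5 (objects 2 and 5)
res$height  # last merge, {1,3,4} with {2,5}, at 0.75
```

On this D the pairs (1,3) and (1,4) tie at 0.2 in the second step; the sketch simply keeps the first pair it scans.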


Thanks

Henrik