Hi,
I want to perform hierarchical clustering using the median as the linkage
metric. As I understand it, the function hcluster in the package amap has this
option, but it does not produce the results that I expect.
In the example below M is a matrix of similarities that is transformed into
a matrix of dissimilarities D.
D
     [,1] [,2] [,3] [,4] [,5]
[1,]  1.0  0.9  0.2  0.2  0.1
[2,]  0.9  1.0  0.7  1.0  0.0
[3,]  0.2  0.7  1.0  0.8  0.8
[4,]  0.2  1.0  0.8  1.0  0.5
[5,]  0.1  0.0  0.8  0.5  1.0
Since D[2,5] = 0, objects 2 and 5 should be grouped together in the first
step, as is done by the agnes function, but hcluster starts by clustering
objects 3 and 4. Why is this?
Regards
Henrik
library(cluster)
library(amap)
# Create matrix M
M <- matrix(nrow = 5, ncol = 5)
M[, 1] <- c(0, 1, 8, 8, 9)
M[, 2] <- c(1, 0, 3, 0, 10)
M[, 3] <- c(8, 3, 0, 2, 2)
M[, 4] <- c(8, 0, 2, 0, 5)
M[, 5] <- c(9, 10, 2, 5, 0)
# Create matrix D: rescale M to [0, 1] and subtract it from a matrix of ones
n <- dim(M)[1]
o <- matrix(1, n, n)
mn <- (1 / max(M)) * M
D <- o - mn
# Clustering using hcluster
ce <- hcluster(D, link = "median")
plot(ce)
# Clustering using agnes
av <- agnes(D, diss = TRUE, method = "average")
pltree(av)
On Mon, Sep 27, 2010 at 8:22 AM, Kennedy <henrik.aldberg at gmail.com> wrote:
> Hi,
> I want to perform hierarchical clustering using the median as the linkage
> metric. As I understand it, the function hcluster in the package amap has this
> option, but it does not produce the results that I expect.
> In the example below M is a matrix of similarities that is transformed into
> a matrix of dissimilarities D.
> D
>      [,1] [,2] [,3] [,4] [,5]
> [1,]  1.0  0.9  0.2  0.2  0.1
> [2,]  0.9  1.0  0.7  1.0  0.0
> [3,]  0.2  0.7  1.0  0.8  0.8
> [4,]  0.2  1.0  0.8  1.0  0.5
> [5,]  0.1  0.0  0.8  0.5  1.0
> Since D[2,5] = 0, objects 2 and 5 should be grouped together in the first
> step, as is done by the agnes function, but hcluster starts by clustering
> objects 3 and 4. Why is this?
From reading the hcluster help file I get the sense that the input is
_not_ the distance matrix, but a numeric matrix from which the
distance is computed. I think you should simply look at hclust since
that does implement the median method.
Peter
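The difference between the two inputs can be checked in base R. This is only a sketch: it assumes that hcluster, like dist, computes Euclidean distances between the rows of its input by default, and closest_pair is a hypothetical helper written for this illustration, not a function from any package.

```r
# Rebuild D from the original post (base R only).
M <- matrix(c(0, 1, 8, 8, 9,
              1, 0, 3, 0, 10,
              8, 3, 0, 2, 2,
              8, 0, 2, 0, 5,
              9, 10, 2, 5, 0), nrow = 5, ncol = 5)
D <- 1 - M / max(M)

# Hypothetical helper: indices of the two objects with the smallest distance.
closest_pair <- function(d) {
  m <- as.matrix(d)
  m[upper.tri(m, diag = TRUE)] <- Inf   # look at the lower triangle only
  idx <- which(m == min(m), arr.ind = TRUE)[1, ]
  sort(unname(idx))
}

row_dist <- dist(D)      # Euclidean distances between the ROWS of D
as_is    <- as.dist(D)   # the entries of D used directly as dissimilarities

closest_pair(row_dist)   # objects 3 and 4
closest_pair(as_is)      # objects 2 and 5

# hclust on the dist object merges objects 2 and 5 first.
hc <- hclust(as_is, method = "median")
hc$merge[1, ]
```

If D is treated as a data matrix, rows 3 and 4 are the closest pair, which matches the first merge reported for hcluster. Passing as.dist(D) keeps the entries of D as the dissimilarities themselves.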
On Mon, Sep 27, 2010 at 8:22 AM, Kennedy <henrik.aldberg at gmail.com> wrote:
> Hi,
> I want to perform hierarchical clustering using the median as the linkage
> metric. As I understand it, the function hcluster in the package amap has this
> option, but it does not produce the results that I expect.
Also, if you have a large(r) data set, the package flashClust provides
a much faster (n^2 vs. n^3) replacement for hclust with exactly the
same results.
Peter
Thank you Peter for your help.
I had tried hclust before, but I made the mistake of passing the D matrix above
instead of a dist object. Hence
library(flashClust)
d <- as.dist(D)
# Clustering using hclust
hc <- hclust(d, method = "median", members = NULL)
# Clustering using flashClust
fc <- flashClust(d, method = "median", members = NULL)
solves the problem I posted. But another question arises. How is the median
linkage calculated? I want it to be like this:
Given clusters C1=(1,2,3) and C2=(4), the distance between C1 and C2 is:
d(C1,C2) = median(d(1,4),d(2,4),d(3,4)) = median(0.2, 1.0, 0.8) = 0.8,
where the values d(1,4), d(2,4) and d(3,4) are taken from the D matrix
above.
If this is not the case, is there any function that uses this linkage
metric?
Thanks
Henrik
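For reference, the "median" method in hclust is Gower's median method (WPGMC), which updates distances with the Lance-Williams recurrence d(k, i+j) = d(k,i)/2 + d(k,j)/2 - d(i,j)/4. That is not the median of the pairwise dissimilarities described above. A minimal base R sketch of the linkage as described in the question follows; median_link, lw_median and median_pair_cluster are hypothetical names made up for this illustration, and the naive search is only meant for small inputs.

```r
# D as in the original post (base R only).
M <- matrix(c(0, 1, 8, 8, 9,
              1, 0, 3, 0, 10,
              8, 3, 0, 2, 2,
              8, 0, 2, 0, 5,
              9, 10, 2, 5, 0), nrow = 5, ncol = 5)
D <- 1 - M / max(M)

# Hypothetical helper: dissimilarity between clusters a and b, defined as the
# median of all pairwise dissimilarities between their members.
median_link <- function(D, a, b) median(D[a, b])

# The example from the question: C1 = (1, 2, 3), C2 = (4).
median_link(D, c(1, 2, 3), 4)   # median(0.2, 1.0, 0.8) = 0.8

# For comparison, the Lance-Williams update of Gower's median method.
lw_median <- function(dki, dkj, dij) 0.5 * dki + 0.5 * dkj - 0.25 * dij

# Naive agglomerative clustering with the pairwise-median linkage.
# Returns the merged clusters in order together with the merge heights.
median_pair_cluster <- function(D) {
  clusters <- as.list(seq_len(nrow(D)))
  merges <- list()
  heights <- numeric(0)
  while (length(clusters) > 1) {
    best <- c(1, 2)
    best_d <- Inf
    for (i in seq_len(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        dij <- median_link(D, clusters[[i]], clusters[[j]])
        if (dij < best_d) {
          best_d <- dij
          best <- c(i, j)
        }
      }
    }
    merged <- sort(c(clusters[[best[1]]], clusters[[best[2]]]))
    merges[[length(merges) + 1]] <- merged
    heights <- c(heights, best_d)
    clusters <- c(clusters[-best], list(merged))
  }
  list(merges = merges, heights = heights)
}

res <- median_pair_cluster(D)
res$merges[[1]]   # objects 2 and 5 are merged first
```

The result is a plain list rather than an hclust object, so it cannot be passed to plot directly. Ties between equally close pairs are broken by taking the first pair found.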