gdist and gower distance
On Tue, 2004-11-09 at 12:59, Alessio Boattini wrote:
Dear All,
I would like to ask for clarification on the Gower distance matrix calculated by the function gdist in the mvpart package. Here is a dummy example:
> library(mvpart)
Loading required package: survival
Loading required package: splines
mvpart package loaded: extends rpart to include multivariate and distance-based partitioning
> x=matrix(1:6, byrow=T, ncol=2)
> x
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
> gdist(x, method="euclid")
         1        2
2 2.828427
3 5.656854 2.828427

Doing the calculations by hand according to the formula on the gdist help page, I get the same results. The formula given is:

'euclidean' d[jk] = sqrt(sum (x[ij]-x[ik])^2)

> sqrt(8)
[1] 2.828427
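The by-hand check can be written for all three pairs at once. A minimal sketch in base R (mvpart is not needed), reading the help-page notation as i = variable (column) and j, k = objects (rows); that reading of the notation is an assumption. The helper `euclid_pair` is a hypothetical name used only for illustration, and `dist()` from the stats package serves as a cross-check.

```r
# Toy data from the question: 3 objects (rows) x 2 variables (columns)
x <- matrix(1:6, byrow = TRUE, ncol = 2)

# Help-page 'euclidean': d[jk] = sqrt(sum over variables i of (x[ij] - x[ik])^2),
# read here with j, k as rows and i as columns (assumed notation).
# euclid_pair is a hypothetical helper, not part of mvpart.
euclid_pair <- function(x, j, k) sqrt(sum((x[j, ] - x[k, ])^2))

d_hand <- c(euclid_pair(x, 1, 2),  # sqrt(2^2 + 2^2) = sqrt(8)
            euclid_pair(x, 1, 3),  # sqrt(4^2 + 4^2) = sqrt(32)
            euclid_pair(x, 2, 3))  # sqrt(8) again
print(d_hand)

# Cross-check against base R; as.vector() on a dist object lists the
# lower triangle column by column: pairs (1,2), (1,3), (2,3)
print(all.equal(d_hand, as.vector(dist(x, method = "euclidean"))))
```

sqrt(32) is 5.656854, the (1,3) entry printed by gdist.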
> gdist(x, method="gower")
          1         2
2 0.7071068
3 1.4142136 0.7071068

Doing the calculations by hand according to the formula on the gdist help page, I cannot reproduce these results. The formula given is:

'gower' d[jk] = sum (abs(x[ij]-x[ik])/(max(i)-min(i)))

Could anybody please shed some light?
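Taking the help-page 'gower' formula literally, the by-hand calculation looks like this. It is a sketch in base R under the same assumed notation (i = column, j, k = rows), with max(i) - min(i) read as the range of variable i over all objects; `gower_doc_pair` is a hypothetical helper name.

```r
x <- matrix(1:6, byrow = TRUE, ncol = 2)

# max(i) - min(i) in the help page: the range of each variable (column)
rng <- apply(x, 2, max) - apply(x, 2, min)   # 5 - 1 = 4 and 6 - 2 = 4

# Help-page 'gower', taken literally: sum over variables of
# |x[ij] - x[ik]| / range of variable i (gower_doc_pair is hypothetical)
gower_doc_pair <- function(x, j, k, rng) sum(abs(x[j, ] - x[k, ]) / rng)

g_doc <- c(gower_doc_pair(x, 1, 2, rng),
           gower_doc_pair(x, 1, 3, rng),
           gower_doc_pair(x, 2, 3, rng))
print(g_doc)   # 1 2 1
```

For pairs (1,2), (1,3), (2,3) the literal formula gives 1, 2, 1, whereas gdist printed 0.7071068, 1.4142136, 0.7071068.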
There seems to be a bug in the documentation: the function uses a different calculation than the help page specifies. Look at the gdist code. To make things easier: in the function body, Gower is method 6 and Euclidean distance is method 2.

Gower's original paper is available through http://www.jstor.org/ (Biometrics, Vol. 27, No. 4, pp. 857-871; 1971).

cheers,
jari oksanen
Jari Oksanen <jarioksa at sun3.oulu.fi>
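Without reading the source, the printed 'gower' values in this example can be matched by Euclidean distance after dividing each column by its range. This is inferred from this single toy example only and has not been checked against the method-6 code in gdist, so treat it as a hypothesis. For comparison, the sketch also computes Gower's (1971) dissimilarity 1 - S for quantitative variables with equal weights, which is the mean (not the sum) of the range-scaled absolute differences. Only base R is used (`sweep()`, `dist()`).

```r
x <- matrix(1:6, byrow = TRUE, ncol = 2)
rng <- apply(x, 2, max) - apply(x, 2, min)

# Divide each column by its range; sweep() applies "/" column-wise (MARGIN = 2)
xs <- sweep(x, 2, rng, "/")

# Hypothesis inferred from the printed numbers: Euclidean distance on
# range-scaled data. Values: sqrt(0.5), sqrt(2), sqrt(0.5)
d_hyp <- dist(xs, method = "euclidean")
print(d_hyp)

# Gower (1971), quantitative variables, equal weights:
# 1 - S = mean over variables of |x[ij] - x[ik]| / range of variable i
d_gower71 <- dist(xs, method = "manhattan") / ncol(x)
print(d_gower71)   # values 0.5, 1, 0.5
```

sqrt(0.5) is 0.7071068 and sqrt(2) is 1.4142136, the values printed by `gdist(x, method="gower")` above.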