Very Slow Gower Similarity Function
Quoting Martin Maechler <maechler at stat.math.ethz.ch>:
> I don't know what exactly you want.
The Gower coefficient I am referring to comes from Gower's 1971 article in Biometrics (27(4):857-871). It differs from most commonly used measures (but not, apparently, from daisy!) in that it can combine quantitative and qualitative (binary or unordered multistate) variables, and it provides a mechanism for dropping missing values from similarity calculations. This is also covered in Legendre and Legendre.
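For readers unfamiliar with the coefficient, the idea described above can be sketched as follows. This is a minimal illustration in Python, not Gower's full definition and not the code of daisy(): the function name gower_similarity, the "num"/"cat" labels, and the use of None for missing values are all assumptions made for the example, and binary variables are treated like any other unordered category (Gower's special handling of joint absences is omitted).

```python
def gower_similarity(a, b, kinds, ranges):
    """Simplified Gower-style similarity between two records.

    kinds[k]  : "num" (quantitative) or "cat" (unordered category); hypothetical labels
    ranges[k] : range (max - min) of variable k over the data set; unused for "cat"
    None marks a missing value.
    """
    total = 0.0   # sum of per-variable similarities
    used = 0      # number of variables that could be compared
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # missing value: drop this variable from the comparison
        if kind == "num":
            # "ranging": absolute difference scaled by the variable's range
            s = 1.0 - abs(x - y) / r if r > 0 else 1.0
        else:
            # qualitative variable: simple match / mismatch
            s = 1.0 if x == y else 0.0
        total += s
        used += 1
    return total / used if used > 0 else float("nan")


# Two records with one quantitative, one qualitative, and one missing value.
rec1 = (1.0, "red", None)
rec2 = (3.0, "red", 5.0)
s = gower_similarity(rec1, rec2, ("num", "cat", "num"), (4.0, None, 10.0))
```

In this example the third variable is missing in the first record, so only two variables contribute: (0.5 + 1.0) / 2 = 0.75.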
> The function daisy() in the recommended package "cluster"
> has always worked with missing values and IIRC, the book
> "Kaufman & Rousseeuw" {which I have not at hand here at home},
> clearly mentions Gower's origin of their distance measure
> definition.
I was unaware of the daisy function. Looking over it now, it differs from the
Gower coefficient primarily in the method of standardization. Gower
standardized each variable by dividing it by its range ("ranging"), whereas
daisy does a more conventional standardization (-mean and /SD). As I understand
it, there isn't much to recommend standardizing over ranging (or vice versa), so
daisy may provide a useful alternative for my project. I'll have to look into
it!
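
To make the distinction between "ranging" and standardizing concrete, here is a small sketch of the two rescalings in general terms. It is written in Python for illustration only; the helper names ranged and standardized are hypothetical, and the code does not reproduce what daisy() does internally, so the daisy documentation remains the place to check which scaling it applies in a given mode.

```python
import statistics


def ranged(xs):
    """Rescale to [0, 1] by subtracting the minimum and dividing by the range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standardized(xs):
    """Center on the mean and divide by the sample standard deviation."""
    m = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - m) / sd for x in xs]


values = [2, 4, 6]
r = ranged(values)         # [0.0, 0.5, 1.0]
z = standardized(values)   # [-1.0, 0.0, 1.0]
```

After ranging, the absolute difference between two values is at most 1, which is what lets a per-variable similarity of 1 - |x - y| / range stay between 0 and 1. Standardized values have no such bound.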
Thanks,
Tyler
Martin Maechler, maintainer of cluster package, ETH Zurich