Gundala Viswanath <gundalav at gmail.com>
on Sun, 8 Dec 2013 16:11:12 +0900 writes:
> Hi, According to daisy function from cluster
> documentation, it can compute dissimilarity when NA
> (missing) value(s) is present.
> http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html
> But why when I tried this code
> library(cluster)
> x <- c(1.115,NA,NA,0.971,NA)
> y <- c(NA,1.006,NA,NA,0.645)
> df <- as.data.frame(rbind(x,y))
> daisy(df,metric="gower")
> It gave this message:
> Dissimilarities :
>    x
> y NA
> Metric : mixed ; Types = I, I, I, I, I
> Number of objects : 2
> Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
> I welcome other alternative than gower.
> I expect the dissimilarity output gives a non-NA value e.g. 0. What's
> the right way to do it?
Thank you, Gundala, for using a simple reproducible example.
Reading the documentation about Gower's distance a bit more,
you'd have found that it works by basically giving weight zero
to *pairs* of variable values where at least one of the two
values is missing.
In situations like yours, *all* pairs have at least one missing
value, so every weight is zero and there's no way to get a
non-NA distance.
*AND* the documentation already contains this, at the very end
of the section 'Details' :
If all weights w_k delta(ij;k) are zero, the dissimilarity is set to 'NA'.
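To make the zero-weight rule concrete, here is a minimal base-R sketch using the data from the question. The helper gower_numeric() is hypothetical (it is not part of 'cluster' and is not how daisy() is implemented); it assumes unit weights and takes the per-variable ranges as an argument. The third row z is made up purely for illustration.

```r
# Data from the question: two observations, five numeric variables.
x <- c(1.115, NA, NA, 0.971, NA)
y <- c(NA, 1.006, NA, NA, 0.645)

# delta[k] is TRUE only when variable k is observed in BOTH x and y;
# pairs with a missing value get weight zero.
delta <- !is.na(x) & !is.na(y)
n_usable <- sum(delta)   # number of variables observed in both rows

# Hypothetical helper (an assumption, not the 'cluster' implementation):
# Gower-type dissimilarity for numeric vectors with unit weights.
gower_numeric <- function(a, b, ranges) {
  ok <- !is.na(a) & !is.na(b)
  if (!any(ok)) return(NA_real_)   # all weights zero => NA
  mean(abs(a[ok] - b[ok]) / ranges[ok])
}

# x and y share no observed variable, so the ranges do not matter here.
d_xy <- gower_numeric(x, y, ranges = rep(1, 5))

# A made-up row that shares variables 1 and 4 with x (unit ranges assumed).
z <- c(1.000, NA, NA, 0.900, NA)
d_xz <- gower_numeric(x, z, ranges = rep(1, 5))
```

Here n_usable is 0 and d_xy is NA, whereas d_xz is an ordinary number because two variables are observed in both rows. A non-NA dissimilarity needs at least one variable observed in both objects.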
I.e., we have
install.packages("fortunes")
library(fortunes)   # attach the package so that fortune() is found
fortune("WTFM")
This is all documented in TFM. Those who WTFM don't want to have to WTFM again
on the mailing list. RTFM.
-- Barry Rowlingson
R-help (October 2003)
... which I now did in spite of Barry's excellent point
... let's say it's because of approaching Christmas !
Martin Maechler,
ETH Zurich