Hi,

According to daisy function from cluster documentation, it can compute dissimilarity when NA (missing) value(s) is present.
http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html

But why when I tried this code

    library(cluster)
    x <- c(1.115,NA,NA,0.971,NA)
    y <- c(NA,1.006,NA,NA,0.645)
    df <- as.data.frame(rbind(x,y))
    daisy(df,metric="gower")

It gave this message:

    Dissimilarities :
       x
    y NA

    Metric : mixed ; Types = I, I, I, I, I
    Number of objects : 2
    Warning messages:
    1: In min(x) : no non-missing arguments to min; returning Inf
    2: In max(x) : no non-missing arguments to max; returning -Inf

I welcome other alternative than gower. I expect the dissimilarity output gives a non-NA value e.g. 0. What's the right way to do it?

G.V.
Why daisy() in cluster library failed to exclude NA when computing dissimilarity
4 messages · Gundala Viswanath, Sarah Goslee, Martin Maechler +1 more
Gundala Viswanath <gundalav at gmail.com>
on Sun, 8 Dec 2013 16:11:12 +0900 writes:
> Hi, According to daisy function from cluster
> documentation, it can compute dissimilarity when NA
> (missing) value(s) is present.
> http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html
> But why when I tried this code
> library(cluster)
> x <- c(1.115,NA,NA,0.971,NA)
> y <- c(NA,1.006,NA,NA,0.645)
> df <- as.data.frame(rbind(x,y))
> daisy(df,metric="gower")
> It gave this message:
> Dissimilarities :
> x
> y NA
> Metric : mixed ; Types = I, I, I, I, I
> Number of objects : 2
> Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
> I welcome other alternative than gower.
> I expect the dissimilarity output gives a non-NA value e.g. 0. What's
> the right way to do it?
Thank you, Gundala, for using a simple reproducible example.
Had you read the documentation about Gower's distance a bit more
closely, you'd have found that it basically works by giving weight
zero to *pairs* of variable values where one of the two values is
missing.
In situations like yours, *all* pairs have at least one missing value,
so there's no way to get a non-NA distance.
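To make the zero-weight mechanism concrete, here is a minimal base-R sketch of a Gower-style dissimilarity for two numeric rows. It is an illustration only, not the code daisy() actually runs: the helper name gower_pair, the unit ranges, and the second row y2 are all assumptions made up for this example.

```r
# Illustrative helper (hypothetical, not part of the cluster package):
# Gower-style dissimilarity between two numeric rows xi and xj.
# 'ranges' holds the range of each variable; it is passed in explicitly
# here, and nonzero ranges are assumed.
gower_pair <- function(xi, xj, ranges) {
  # delta_k is 1 only when variable k is observed in BOTH rows, else 0
  delta <- as.numeric(!is.na(xi) & !is.na(xj))
  # all weights zero: nothing to average, so the result is undefined
  if (sum(delta) == 0) return(NA_real_)
  # range-scaled absolute difference per variable
  d <- abs(xi - xj) / ranges
  # weighted average over the jointly observed variables only
  sum(d[delta == 1]) / sum(delta)
}

# The two rows from the original post: no column is observed in both
x <- c(1.115, NA, NA, 0.971, NA)
y <- c(NA, 1.006, NA, NA, 0.645)
ranges <- rep(1, 5)          # assumed unit ranges, for illustration only

gower_pair(x, y, ranges)     # NA: every pair has at least one missing value

# Hypothetical second row that shares column 1 with x
y2 <- c(0.800, 1.006, NA, NA, 0.645)
gower_pair(x, y2, ranges)    # 0.315, computed from column 1 alone
```

As soon as a single variable is observed in both rows, the average is well defined. With no jointly observed variable there is nothing to average, which is why a value such as 0 would not be a meaningful answer for the original data.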
*AND* the documentation already contains this, at the very end
of the section 'Details' :
If all weights w_k delta(ij;k) are zero, the dissimilarity is set to 'NA'.
I.e., we have
install.packages("fortunes")
library(fortunes)   # attach the package so that fortune() is found
fortune("WTFM")
This is all documented in TFM. Those who WTFM don't want to have to WTFM again
on the mailing list. RTFM.
-- Barry Rowlingson
R-help (October 2003)
... which I now did in spite of Barry's excellent point
... let's say it's because of approaching Christmas !
Martin Maechler,
ETH Zurich
Hi Gundala, This question isn't about a Bioconductor package, so should be asked on R-help instead. Best, Jim
-- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099