[Moved to R-devel, as more appropriate.]
On Mon, 5 Mar 2001, Jari Oksanen wrote:
Canberra distance is defined in function `dist' (standard library `mva') as sum(|x_i - y_i| / |x_i + y_i|). Obviously this is undefined for cases where both x_i and y_i are zero. Since double zeros are common in many data sets, this is a nuisance. In our field (from which the distance comes), it is customary to remove double zeros: the contribution to the distance is zero when both x_i and y_i are zero. Would it be possible to have this kind of feature in R as well? It seems that this would do the trick without breaking applications where double zeros do not occur:
I am sure we should do something, but is this exactly right? In dist() in the R-devel version (1.3.x, eventually) I have enabled the handling of missing values.

With this solution, identically zero elements contribute zero to the distance, and are not regarded as missing. Canberra is similar to binary, where x_i = y_i = 0 is treated as equivalent to missing. The issue is whether count should be incremented when sum == 0.0.

A related issue is the test (sum > 0.0). I guess there are potential problems with optimization on machines that use extended-precision arithmetic, where sum might be non-zero in a register but zero once stored. I am not sure that can actually happen, but a tolerance (e.g. machar's xmax) is usually safer.
--- R-1.2.2/src/appl/distance.c Sun Oct 15 18:13:25 2000
+++ R-work/src/appl/distance.c Mon Mar 5 10:16:53 2001
@@ -93,5 +93,5 @@
double R_canberra(double *x, int nr, int nc, int i1, int i2)
{
- double dist;
+ double dist, sum;
int count, j;
@@ -100,5 +100,7 @@
for(j=0 ; j<nc ; j++) {
if(R_FINITE(x[i1]) && R_FINITE(x[i2])) {
- dist += fabs(x[i1] - x[i2])/fabs(x[i1] + x[i2]);
+ sum = fabs(x[i1] + x[i2]);
+ if (sum > 0.0)
+ dist += fabs(x[i1] - x[i2])/sum;
count++;
}
Best wishes, Jari Oksanen
--
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526 (job), mobile +358 40 5136529
email jari.oksanen@oulu.fi, homepage http://cc.oulu.fi/~jarioksa/
--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._