max.col: bug or just oddity?

I've noticed that the max.col function with the default "random"
option often gives unexpected results. For instance, in this test, it
seems clear what the answer should be:
[1] 2 2 2 2 2 2 2 2 2 2
Ouch! max.col is randomizing across all values.
Even without infinite values, something similar can happen:
[1] 3 3 3 3 3 3 3 3 3 3
[1] 2 3 2 3 3 2 2 2 3 2
The max.col docs say "there is a relative tolerance of 1e-5, relative
to the largest entry in the row", but the code actually scales the
tolerance by the entry with the largest absolute value in the row
(appl/maxcol.c, line 35 in R 2.4.0).
Is this necessary for some sort of S-plus compatibility? If so, I
think it would be good to make this absolute value property very clear
in the docs, since it can cause subtle bugs (and did for me).

Personally, I think the behavior is much nicer with the following patch:

--- rplain/R-2.4.0/src/appl/maxcol.c    2006-04-09 18:19:58.000000000 -0400
+++ R-2.4.0/src/appl/maxcol.c   2006-12-14 15:30:56.000000000 -0500
@@ -26,13 +26,14 @@
        do_rand = *ties_meth == 1;

     for (r = 0; r < n_r; r++) {
-       /* first check row for any NAs and find the largest abs(entry) */
+       /* first check row for any NAs and find the largest entry */
        large = 0.0;
        isna = FALSE;
        for (c = 0; c < *nc; c++) {
            a = matrix[r + c * n_r];
            if (ISNAN(a)) { isna = TRUE; break; }
-           if (do_rand) large = fmax2(large, fabs(a));
+           if (!R_FINITE(a)) continue;
+           if (do_rand) large = fmax2(large, a);
        }
        if (isna) { maxes[r] = NA_INTEGER; continue; }

--------------------------------------- END ---------------------------------------

This gives the expected behavior in the two examples above.