Minor bug with stats::isoreg

4 messages · Travers Ching, Ben Bolker, Ivan Krylov +1 more

Original

1

4

Wed, Sep 27, 2023 1:49 PM #

Hello, I'd like to file a small bug report. I searched and didn't find a
duplicate report.

Calling isoreg with an Inf value causes a segmentation fault, tested on R
4.3.1 and R 4.2. A reproducible example is: `isoreg(c(0,Inf))`

Ben Bolker

Wed, Sep 27, 2023 2:53 PM #

Thanks! Submitted as https://bugs.r-project.org/show_bug.cgi?id=18603

On 2023-09-27 4:49 p.m., Travers Ching wrote:

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Ivan Krylov

Wed, Sep 27, 2023 2:59 PM #

? Wed, 27 Sep 2023 13:49:58 -0700
Travers Ching <traversc at gmail.com> ?????:

Indeed, the code in src/library/stats/src/isoreg.c contains the
following loop:

    do {
	slope = R_PosInf;
	for (i = known + 1; i <= n; i++) {
	    tmp = (REAL(yc)[i] - REAL(yc)[known]) / (i - known);
            // if `tmp` becomes +Inf or NaN...
            // or both `tmp` and `slope` become -Inf...
	    if (tmp < slope) { // <-- then this is false
		slope = tmp;
		ip = i; // <-- so this assignment never happens
	    }
	}/* tmp := max{i= kn+1,.., n} slope(p[kn] -> p[i])  and
	  *  ip = argmax{...}... */
	INTEGER(iKnots)[n_ip++] = ip; // <-- heap overflow and crash
// ...
    } while ((known = ip) < n); // <-- this loop never terminates

I'm not quite sure how to fix this. Checking for tmp <= slope would
have been a one-character patch, but it changes the reference outputs
and doesn't handle isnan(tmp), so it's probably not correct. The
INTEGER(iKnots)[n_ip++] = ip; assignment should only be reached in case
of knots, but since the `ip` index never progresses past the
+/-infinity, the knot condition is triggered repeatedly.

Least squares methods don't handle infinities well anyway, so maybe
it's best to put the check in the R function instead:

--- src/library/stats/R/isoreg.R	(revision 85226)
+++ src/library/stats/R/isoreg.R	(working copy)
@@ -22,8 +22,8 @@
 {
     xy <- xy.coords(x,y)
     x <- xy$x
-    if(anyNA(x) || anyNA(xy$y))
-	stop("missing values not allowed")
+    if(!all(is.finite(x)) || !all(is.finite(xy$y)))
+	stop("missing and infinite values not allowed")
     isOrd <- ((!is.null(xy$xlab) && xy$xlab == "Index")
               || !is.unsorted(x, strictly = TRUE))
     if(!isOrd) {

Best regards,
Ivan

Martin Maechler

Thu, Sep 28, 2023 1:53 AM #

> ? Wed, 27 Sep 2023 13:49:58 -0700 Travers Ching
    > <traversc at gmail.com> ?????:

    >> Calling isoreg with an Inf value causes a segmentation
    >> fault, tested on R 4.3.1 and R 4.2. A reproducible
    >> example is: `isoreg(c(0,Inf))`

    > Indeed, the code in src/library/stats/src/isoreg.c
    > contains the following loop:

The above would not even be sufficient: 
It's the sum(y) really, because internally
  yc <- cumsum(c(0,y)) and actually  diff(yc)  is used
where you get to  Inf - Inf ==> NaN

*** caught segfault ***
address 0x7e48000, cause 'memory not mapped'
/u/maechler/bin/R_arg: Zeile 160: 873336 Speicherzugriffsfehler  (Speicherabzug geschrieben) $exe $@

Also, the C code still does not work for long vectors,
so I want to change the C code anyway.


In any case:
  Thank you, Travers, Ben, and Ivan, for reporting and addressing
  the issue!


------

There is an interesting point here though:

For dealing with +/- Inf, we used to follow the following idea
in R quite keenly (and sometimes extremely): 

If 'Inf' leads a computation to "fail" (NB:  1/Inf  |-->  0 does *not* fail)
try to see what the mathematical *or* computational limit
 x --> Inf would be.
If that is easily defined, we use that.

So, often as a first step, look at what happens if you replace
Inf by 1e100  (and then also what happens if you are finite but
*close* to Inf, i.e. the 7e308 above).

Now here, at least in some cases, such a limit cases are clearly
detectable, e.g., when you let  y[2] --->  -Inf here

[,1]         [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   4.5000e+00   4.5000e+00 4.50 4.50    5    6    6    6    6     8
 [2,]   4.2500e+00   4.2500e+00 4.25 4.25    5    6    6    6    6     8
 [3,]   4.0000e+00   4.0000e+00 4.00 4.00    5    6    6    6    6     8
 [4,]   3.7500e+00   3.7500e+00 3.75 3.75    5    6    6    6    6     8
 [5,]   3.5000e+00   3.5000e+00 3.50 3.50    5    6    6    6    6     8
 [6,]   3.2500e+00   3.2500e+00 3.25 3.25    5    6    6    6    6     8
 [7,]   3.0000e+00   3.0000e+00 3.00 3.00    5    6    6    6    6     8
 [8,]   2.7500e+00   2.7500e+00 2.75 2.75    5    6    6    6    6     8
 [9,]   2.5000e+00   2.5000e+00 2.50 2.50    5    6    6    6    6     8
[10,]   2.2500e+00   2.2500e+00 2.25 2.25    5    6    6    6    6     8
[11,]   2.0000e+00   2.0000e+00 2.00 2.00    5    6    6    6    6     8
[12,]  -2.5000e+00  -2.5000e+00 1.00 2.00    5    6    6    6    6     8
[13,]  -7.5000e+00  -7.5000e+00 1.00 2.00    5    6    6    6    6     8
[14,]  -4.9750e+02  -4.9750e+02 1.00 2.00    5    6    6    6    6     8
[15,]  -4.9975e+03  -4.9975e+03 1.00 2.00    5    6    6    6    6     8
[16,]  -5.0000e+09  -5.0000e+09 1.00 2.00    5    6    6    6    6     8
[17,]  -5.0000e+99  -5.0000e+99 0.00 0.00    0    0    0    0    0     0
[18,] -5.0000e+199 -5.0000e+199 0.00 0.00    0    0    0    0    0     0
[19,] -5.0000e+299 -5.0000e+299 0.00 0.00    0    0    0    0    0     0

so one could say that ideally,

      isoreg(c(5, -Inf, 1:2, 5:8, 3, 8))

should produce fitted values

             c(-Inf, -Inf, 0, 0, ..., 0)

and if someone has a +/-  elegant implementation
we could again allow  +/-Inf entries in  isoreg(), at least when
the Inf's have all the same sign.

Martin