Hello, I'd like to file a small bug report. I searched and didn't find a duplicate report. Calling isoreg with an Inf value causes a segmentation fault, tested on R 4.3.1 and R 4.2. A reproducible example is: `isoreg(c(0,Inf))`
Minor bug with stats::isoreg
4 messages · Travers Ching, Ben Bolker, Ivan Krylov +1 more
Thanks! Submitted as https://bugs.r-project.org/show_bug.cgi?id=18603
On 2023-09-27 4:49 p.m., Travers Ching wrote:
Hello, I'd like to file a small bug report. I searched and didn't find a duplicate report. Calling isoreg with an Inf value causes a segmentation fault, tested on R 4.3.1 and R 4.2. A reproducible example is: `isoreg(c(0,Inf))` [[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
? Wed, 27 Sep 2023 13:49:58 -0700 Travers Ching <traversc at gmail.com> ?????:
Calling isoreg with an Inf value causes a segmentation fault, tested on R 4.3.1 and R 4.2. A reproducible example is: `isoreg(c(0,Inf))`
Indeed, the code in src/library/stats/src/isoreg.c contains the
following loop:
do {
slope = R_PosInf;
for (i = known + 1; i <= n; i++) {
tmp = (REAL(yc)[i] - REAL(yc)[known]) / (i - known);
// if `tmp` becomes +Inf or NaN...
// or both `tmp` and `slope` become -Inf...
if (tmp < slope) { // <-- then this is false
slope = tmp;
ip = i; // <-- so this assignment never happens
}
}/* tmp := max{i= kn+1,.., n} slope(p[kn] -> p[i]) and
* ip = argmax{...}... */
INTEGER(iKnots)[n_ip++] = ip; // <-- heap overflow and crash
// ...
} while ((known = ip) < n); // <-- this loop never terminates
I'm not quite sure how to fix this. Checking for tmp <= slope would
have been a one-character patch, but it changes the reference outputs
and doesn't handle isnan(tmp), so it's probably not correct. The
INTEGER(iKnots)[n_ip++] = ip; assignment should only be reached in case
of knots, but since the `ip` index never progresses past the
+/-infinity, the knot condition is triggered repeatedly.
Least squares methods don't handle infinities well anyway, so maybe
it's best to put the check in the R function instead:
--- src/library/stats/R/isoreg.R (revision 85226)
+++ src/library/stats/R/isoreg.R (working copy)
@@ -22,8 +22,8 @@
{
xy <- xy.coords(x,y)
x <- xy$x
- if(anyNA(x) || anyNA(xy$y))
- stop("missing values not allowed")
+ if(!all(is.finite(x)) || !all(is.finite(xy$y)))
+ stop("missing and infinite values not allowed")
isOrd <- ((!is.null(xy$xlab) && xy$xlab == "Index")
|| !is.unsorted(x, strictly = TRUE))
if(!isOrd) {
Best regards, Ivan
Ivan Krylov
on Thu, 28 Sep 2023 00:59:57 +0300 writes:
> ? Wed, 27 Sep 2023 13:49:58 -0700 Travers Ching
> <traversc at gmail.com> ?????:
>> Calling isoreg with an Inf value causes a segmentation
>> fault, tested on R 4.3.1 and R 4.2. A reproducible
>> example is: `isoreg(c(0,Inf))`
> Indeed, the code in src/library/stats/src/isoreg.c
> contains the following loop:
do {
slope = R_PosInf;
for (i = known + 1; i <= n; i++) {
tmp = (REAL(yc)[i] - REAL(yc)[known]) / (i - known);
// if `tmp` becomes +Inf or NaN...
// or both `tmp` and `slope` become -Inf...
if (tmp < slope) { // <-- then this is false
slope = tmp;
ip = i; // <-- so this assignment never happens
}
}/* tmp := max{i= kn+1,.., n} slope(p[kn] -> p[i]) and
* ip = argmax{...}... */
INTEGER(iKnots)[n_ip++] = ip; // <-- heap overflow and crash // ...
} while ((known = ip) < n); // <-- this loop never terminates
I'm not quite sure how to fix this. Checking for tmp <= slope would have been a one-character patch, but it changes the reference outputs and doesn't handle isnan(tmp), so it's probably not correct. The INTEGER(iKnots)[n_ip++] = ip; assignment should only be reached in case of knots, but since the `ip` index never progresses past the +/-infinity, the knot condition is triggered repeatedly.
Least squares methods don't handle infinities well anyway, so maybe it's best to put the check in the R function instead:
--- src/library/stats/R/isoreg.R (revision 85226)
+++ src/library/stats/R/isoreg.R (working copy)
@@ -22,8 +22,8 @@
{
xy <- xy.coords(x,y)
x <- xy$x
- if(anyNA(x) || anyNA(xy$y))
- stop("missing values not allowed")
+ if(!all(is.finite(x)) || !all(is.finite(xy$y)))
+ stop("missing and infinite values not allowed")
isOrd <- ((!is.null(xy$xlab) && xy$xlab == "Index")
|| !is.unsorted(x, strictly = TRUE))
if(!isOrd) {
-- Best regards, Ivan
The above would not even be sufficient: It's the sum(y) really, because internally yc <- cumsum(c(0,y)) and actually diff(yc) is used where you get to Inf - Inf ==> NaN
isoreg(c(5, 9, 1:2, 7e308, 5:8, 3, 8)))
*** caught segfault *** address 0x7e48000, cause 'memory not mapped' /u/maechler/bin/R_arg: Zeile 160: 873336 Speicherzugriffsfehler (Speicherabzug geschrieben) $exe $@ Also, the C code still does not work for long vectors, so I want to change the C code anyway. In any case: Thank you, Travers, Ben, and Ivan, for reporting and addressing the issue! ------ There is an interesting point here though: For dealing with +/- Inf, we used to follow the following idea in R quite keenly (and sometimes extremely): If 'Inf' leads a computation to "fail" (NB: 1/Inf |--> 0 does *not* fail) try to see what the mathematical *or* computational limit x --> Inf would be. If that is easily defined, we use that. So, often as a first step, look at what happens if you replace Inf by 1e100 (and then also what happens if you are finite but *close* to Inf, i.e. the 7e308 above). Now here, at least in some cases, such a limit cases are clearly detectable, e.g., when you let y[2] ---> -Inf here
n <- length(y0 <- c(5, 9, 1:2, 5:8, 3, 8))
y2s <- c(10:0, -10, -20, -1000, -1e4, -1e10, -1e100, -1e200, -1e300)
iSet <- vapply(y2s, function(y2) isoreg({y <- y0; y[2] <- y2; y})$yf, numeric(n))
t(iSet) # *does* change as function of y2 *but* predictably
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 4.5000e+00 4.5000e+00 4.50 4.50 5 6 6 6 6 8 [2,] 4.2500e+00 4.2500e+00 4.25 4.25 5 6 6 6 6 8 [3,] 4.0000e+00 4.0000e+00 4.00 4.00 5 6 6 6 6 8 [4,] 3.7500e+00 3.7500e+00 3.75 3.75 5 6 6 6 6 8 [5,] 3.5000e+00 3.5000e+00 3.50 3.50 5 6 6 6 6 8 [6,] 3.2500e+00 3.2500e+00 3.25 3.25 5 6 6 6 6 8 [7,] 3.0000e+00 3.0000e+00 3.00 3.00 5 6 6 6 6 8 [8,] 2.7500e+00 2.7500e+00 2.75 2.75 5 6 6 6 6 8 [9,] 2.5000e+00 2.5000e+00 2.50 2.50 5 6 6 6 6 8 [10,] 2.2500e+00 2.2500e+00 2.25 2.25 5 6 6 6 6 8 [11,] 2.0000e+00 2.0000e+00 2.00 2.00 5 6 6 6 6 8 [12,] -2.5000e+00 -2.5000e+00 1.00 2.00 5 6 6 6 6 8 [13,] -7.5000e+00 -7.5000e+00 1.00 2.00 5 6 6 6 6 8 [14,] -4.9750e+02 -4.9750e+02 1.00 2.00 5 6 6 6 6 8 [15,] -4.9975e+03 -4.9975e+03 1.00 2.00 5 6 6 6 6 8 [16,] -5.0000e+09 -5.0000e+09 1.00 2.00 5 6 6 6 6 8 [17,] -5.0000e+99 -5.0000e+99 0.00 0.00 0 0 0 0 0 0 [18,] -5.0000e+199 -5.0000e+199 0.00 0.00 0 0 0 0 0 0 [19,] -5.0000e+299 -5.0000e+299 0.00 0.00 0 0 0 0 0 0
so one could say that ideally,
isoreg(c(5, -Inf, 1:2, 5:8, 3, 8))
should produce fitted values
c(-Inf, -Inf, 0, 0, ..., 0)
and if someone has a +/- elegant implementation
we could again allow +/-Inf entries in isoreg(), at least when
the Inf's have all the same sign.
Martin