Ivan Krylov
on Thu, 28 Sep 2023 00:59:57 +0300 writes:
> On Wed, 27 Sep 2023 13:49:58 -0700 Travers Ching
> <traversc at gmail.com> wrote:
>> Calling isoreg with an Inf value causes a segmentation
>> fault, tested on R 4.3.1 and R 4.2. A reproducible
>> example is: `isoreg(c(0,Inf))`
> Indeed, the code in src/library/stats/src/isoreg.c
> contains the following loop:
>     do {
>         slope = R_PosInf;
>         for (i = known + 1; i <= n; i++) {
>             tmp = (REAL(yc)[i] - REAL(yc)[known]) / (i - known);
>             // if `tmp` becomes +Inf or NaN...
>             // or both `tmp` and `slope` become -Inf...
>             if (tmp < slope) { // <-- then this is false
>                 slope = tmp;
>                 ip = i; // <-- so this assignment never happens
>             }
>         }/* tmp := max{i= kn+1,.., n} slope(p[kn] -> p[i]) and
>           *  ip = argmax{...}... */
>         INTEGER(iKnots)[n_ip++] = ip; // <-- heap overflow and crash
>         // ...
>     } while ((known = ip) < n); // <-- this loop never terminates
> I'm not quite sure how to fix this. Checking for tmp <= slope would
> have been a one-character patch, but it changes the reference outputs
> and doesn't handle isnan(tmp), so it's probably not correct. The
> INTEGER(iKnots)[n_ip++] = ip; assignment should only be reached in case
> of knots, but since the `ip` index never progresses past the
> +/-infinity, the knot condition is triggered repeatedly.
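
To make the failure mode concrete, here is a minimal C sketch of the same knot search on a plain double array, with an early exit as soon as a slope is not finite. The name find_knots and its return convention are made up for illustration; this is neither the code in isoreg.c nor a proposed patch, only one way the "ip never advances" case could be detected.

```c
#include <math.h>

/* Hypothetical helper, not part of R.
 * yc[0..n] holds cumulative sums with yc[0] == 0.
 * Writes the knot indices to knots[] (capacity n) and returns their count,
 * or -1 if some slope is +/-Inf or NaN, so that no knot can be chosen. */
int find_knots(const double *yc, int n, int *knots)
{
    int known = 0, n_ip = 0;

    while (known < n) {
        double slope = INFINITY;
        int ip = -1;                     /* -1: no candidate found yet */

        for (int i = known + 1; i <= n; i++) {
            double tmp = (yc[i] - yc[known]) / (i - known);
            if (!isfinite(tmp))
                return -1;               /* bail out instead of looping forever */
            if (tmp < slope) {           /* always true for the first finite tmp */
                slope = tmp;
                ip = i;
            }
        }
        knots[n_ip++] = ip;              /* ip > known here, so n_ip <= n */
        known = ip;
    }
    return n_ip;
}
```

Because every finite tmp compares less than the initial +Inf, ip is guaranteed to move past known on each pass, which bounds both the loop and the writes into knots[].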
> Least squares methods don't handle infinities well anyway, so maybe
> it's best to put the check in the R function instead:
The above would not even be sufficient:
What really matters is sum(y), because internally
yc <- cumsum(c(0,y)) is computed and actually diff(yc) is used,
which is where you get to Inf - Inf ==> NaN, e.g.,
isoreg(c(5, 9, 1:2, 7e308, 5:8, 3, 8))
*** caught segfault ***
address 0x7e48000, cause 'memory not mapped'
/u/maechler/bin/R_arg: line 160: 873336 Segmentation fault (core dumped) $exe $@
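
The overflow in the cumulative sums can be reproduced in a few lines of C. The helper below (cumsum_is_finite, a hypothetical name, not part of R) builds yc the same way as cumsum(c(0, y)) and reports whether all partial sums stayed finite. It is only a sketch of the kind of check that would be needed, assuming IEEE 754 doubles.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not part of R.
 * Fills yc[0..n] with the cumulative sums of y[0..n-1] (yc[0] = 0).
 * Returns 1 if every partial sum is finite, 0 otherwise. */
int cumsum_is_finite(const double *y, size_t n, double *yc)
{
    int ok = 1;

    yc[0] = 0.0;
    for (size_t i = 0; i < n; i++) {
        yc[i + 1] = yc[i] + y[i];
        if (!isfinite(yc[i + 1]))
            ok = 0;                /* overflowed to +/-Inf, or became NaN */
    }
    return ok;
}
```

For y = {1e308, 1e308, 1} every input is finite, yet yc[2] and yc[3] are both +Inf, so the difference yc[3] - yc[2] is NaN. Checking only the elements of y for finiteness would not catch this.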
Also, the C code still does not work for long vectors,
so I want to change the C code anyway.
In any case:
Thank you, Travers, Ben, and Ivan, for reporting and addressing
the issue!
------
There is an interesting point here though:
For dealing with +/- Inf, we used to follow this idea
in R quite keenly (and sometimes extremely):
If 'Inf' leads a computation to "fail" (NB: 1/Inf |--> 0 does *not* fail)
try to see what the mathematical *or* computational limit
x --> Inf would be.
If that is easily defined, we use that.
So, often as a first step, look at what happens if you replace
Inf by 1e100 (and then also what happens if you are finite but
*close* to Inf, i.e. the 7e308 above).
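
One way to experiment with this "replace Inf by a large finite value" heuristic outside of R is a textbook pool-adjacent-violators routine. The sketch below is such a routine written from scratch; it is not the cumulative-sum algorithm in isoreg.c, and the name pava and its scratch-array interface are assumptions made for illustration.

```c
#include <stddef.h>

/* Textbook pool-adjacent-violators sketch, NOT the code in R's isoreg.c.
 * Writes the nondecreasing least-squares fit of y[0..n-1] into fit[].
 * sum[] and cnt[] are caller-provided scratch arrays of length n. */
void pava(const double *y, size_t n, double *fit, double *sum, size_t *cnt)
{
    size_t nb = 0;                       /* number of blocks so far */

    for (size_t i = 0; i < n; i++) {
        /* start a new block holding just y[i] */
        sum[nb] = y[i];
        cnt[nb] = 1;
        nb++;
        /* merge backwards while the previous block mean exceeds the last */
        while (nb > 1 &&
               sum[nb - 2] / cnt[nb - 2] > sum[nb - 1] / cnt[nb - 1]) {
            sum[nb - 2] += sum[nb - 1];
            cnt[nb - 2] += cnt[nb - 1];
            nb--;
        }
    }

    /* expand each block mean back to one fitted value per observation */
    size_t k = 0;
    for (size_t b = 0; b < nb; b++)
        for (size_t j = 0; j < cnt[b]; j++)
            fit[k++] = sum[b] / cnt[b];
}
```

Calling it on {5, -1e100, 1, 2, 5, 6, 7, 8, 3, 8} pools the first two observations into one block with a mean of about -5e99, which is the finite stand-in for what happens as the second value tends to -Inf.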
Now here, at least in some cases, such limit cases are clearly
detectable, e.g., when you let y[2] ---> -Inf here
so one could say that ideally,
isoreg(c(5, -Inf, 1:2, 5:8, 3, 8))
should produce fitted values
c(-Inf, -Inf, 0, 0, ..., 0)
and if someone has a more or less elegant implementation,
we could again allow +/-Inf entries in isoreg(), at least when
the Infs all have the same sign.
Martin