strtoi output of empty string inconsistent across platforms
5 messages · Martin Maechler, Michael Chirico

Identified as root cause of a bug in data.table:
https://github.com/Rdatatable/data.table/issues/3267

On my machine, strtoi("", base = 2L) produces NA_integer_ (which seems consistent with ?strtoi: "Values which cannot be interpreted as integers or would overflow are returned as NA_integer_").

But on all the other machines I've seen, 0L is returned. This seems to be consistent with the output of a simple C program using the underlying strtol function (see data.table link for this program, and for full sessionInfo() of some environments with differing output).

So, what is the correct output of strtoi("", base = 2L)?

Is the cross-platform inconsistency to be expected/documentable?

Michael Chirico
Michael Chirico
on Fri, 11 Jan 2019 14:36:17 +0800 writes:
> Identified as root cause of a bug in data.table:
> https://github.com/Rdatatable/data.table/issues/3267
> On my machine, strtoi("", base = 2L) produces NA_integer_
> (which seems consistent with ?strtoi: "Values which cannot
> be interpreted as integers or would overflow are returned
> as NA_integer_").
indeed consistent with R's documentation on strtoi().
What machine would that be?
> But on all the other machines I've seen, 0L is
> returned. This seems to be consistent with the output of a
> simple C program using the underlying strtol function (see
> data.table link for this program, and for full
> sessionInfo() of some environments with differing output).
> So, what is the correct output of strtoi("", base = 2L)?
> Is the cross-platform inconsistency to be
> expected/documentable?
The inconsistency is certainly undesirable.
The relevant utility function in R's source (<R>/src/main/character.c)
is
static int strtoi(SEXP s, int base)
{
long int res;
char *endp;
/* strtol might return extreme values on error */
errno = 0;
if(s == NA_STRING) return(NA_INTEGER);
res = strtol(CHAR(s), &endp, base); /* ASCII */
if(errno || *endp != '\0') res = NA_INTEGER;
if(res > INT_MAX || res < INT_MIN) res = NA_INTEGER;
return (int) res;
}
and so it clearly is a platform-inconsistency in the underlying C
library's strtol().
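
The C sketch below isolates the check from R's internal strtoi() quoted above so it can be compiled on different platforms. It is not the program from the data.table issue, and the names `r_style_strtoi`, `report_empty_strtol` and `NA_INT_SKETCH` are hypothetical. ISO C only guarantees that strtol("") performs no conversion, returns 0 and leaves `endp` pointing at the start of the input; whether `errno` is set is left open. POSIX permits, but does not require, EINVAL when no conversion is performed. As far as I can tell, BSD-derived C libraries such as the one on macOS set it and glibc does not. That would make R's `errno || *endp != '\0'` test yield NA on macOS and 0 on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for R's NA_INTEGER, which R represents as INT_MIN. */
#define NA_INT_SKETCH INT_MIN

/* Same checks as R's internal strtoi() quoted above, minus the NA_STRING
   test. For "" the outcome depends only on whether strtol() sets errno. */
static int r_style_strtoi(const char *s, int base)
{
    long int res;
    char *endp;

    errno = 0;
    res = strtol(s, &endp, base);
    if (errno || *endp != '\0') res = NA_INT_SKETCH;
    if (res > INT_MAX || res < INT_MIN) res = NA_INT_SKETCH;
    return (int) res;
}

/* Reports what the local C library does with an empty string in base 2. */
static void report_empty_strtol(void)
{
    const char *s = "";
    char *endp;
    long res;
    int err;

    errno = 0;
    res = strtol(s, &endp, 2);
    err = errno;  /* save before any other library call can touch errno */

    /* Guaranteed by ISO C: no conversion yields 0 and endp == s. */
    assert(res == 0 && endp == s);

    /* Not guaranteed: POSIX allows, but does not require, EINVAL here. */
    printf("strtol(\"\", 2): errno = %d%s\n", err,
           err == EINVAL ? " (EINVAL)" : "");
    printf("R-style strtoi(\"\", 2): %s\n",
           r_style_strtoi(s, 2) == NA_INT_SKETCH ? "NA" : "0");
}
```

If the libc explanation is right, calling `report_empty_strtol()` from a `main()` on Linux and on macOS should reproduce the same split as the R results reported in this thread.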
I think we should make this cross-platform consistent ... and
indeed it makes much sense to ensure the result of
strtoi("", base=2L) to become NA_integer_
but chances are that would break code that has relied on the
current behavior {on "all but your computer" ;-)} ?
> Michael Chirico
Thank you for the report,
Martin Maechler
ETH Zurich and R Core Team
Martin Maechler
on Fri, 11 Jan 2019 09:44:14 +0100 writes:
Michael Chirico
on Fri, 11 Jan 2019 14:36:17 +0800 writes:
>> Identified as root cause of a bug in data.table:
>> https://github.com/Rdatatable/data.table/issues/3267
>> On my machine, strtoi("", base = 2L) produces NA_integer_
>> (which seems consistent with ?strtoi: "Values which
>> cannot be interpreted as integers or would overflow are
>> returned as NA_integer_").
> indeed consistent with R's documentation on strtoi().
> What machine would that be?
>> But on all the other machines I've seen, 0L is
>> returned. This seems to be consistent with the output of
>> a simple C program using the underlying strtol function
>> (see data.table link for this program, and for full
>> sessionInfo() of some environments with differing
>> output).
>> So, what is the correct output of strtoi("", base = 2L)?
>> Is the cross-platform inconsistency to be
>> expected/documentable?
> The inconsistency is certainly undesirable. The relevant
> utility function in R's source (<R>/src/main/character.c)
> is
> static int strtoi(SEXP s, int base)
> {
>     long int res;
>     char *endp;
>     /* strtol might return extreme values on error */
>     errno = 0;
>     if(s == NA_STRING) return(NA_INTEGER);
>     res = strtol(CHAR(s), &endp, base); /* ASCII */
>     if(errno || *endp != '\0') res = NA_INTEGER;
>     if(res > INT_MAX || res < INT_MIN) res = NA_INTEGER;
>     return (int) res;
> }
> and so it clearly is a platform-inconsistency in the
> underlying C library's strtol().
(corrected typos here: )
> I think we should make this cross-platform consistent ...
> and indeed it makes much sense to ensure the result of
> strtoi("", base=2L) to become NA_integer_
> but chances are that would break code that has relied on
> the current behavior {on "all but your computer" ;-)} ?
I still think that such a change should be done.
'make check all' on the R source (+ Recommended packages) seems
not to signal any error or warning with such a change, so I plan
to commit that change to "the trunk" / "R-devel" soon, unless
serious concerns are raised (and quickly enough).
Martin
Thanks Martin. For what it's worth, this extremely representative, highly scientific Twitter poll suggests the Mac/Linux split is pretty stark (NA on Mac, 0 on Linux):
https://twitter.com/michael_chirico/status/1083649190117306369?s=17

On Sat, Jan 12, 2019, 2:00 AM Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Michael Chirico
on Sat, 12 Jan 2019 17:34:03 +0800 writes:
> Thanks Martin. For what it's worth, this extremely
> representative, highly scientific Twitter poll suggests
> the Mac/Linux split is pretty stark (NA on Mac, 0 on
> Linux)
> https://twitter.com/michael_chirico/status/1083649190117306369?s=17
> On Sat, Jan 12, 2019, 2:00 AM Martin Maechler
> <maechler at stat.math.ethz.ch> wrote:
I've now committed svn rev 75977 to R-devel, which changes
strtoi("", <base>) to return NA_integer_ (for all <base>)
platform-independently, without even calling libc's strtol() in
that case.
Martin
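
The C sketch below shows the kind of guard described in the last message. It is a standalone function and not the code of svn rev 75977. The names `strtoi_consistent`, `check_strtoi_consistent` and `NA_INT_SKETCH` are hypothetical, and a NULL pointer stands in for R's NA_STRING. The empty string is mapped to NA for every base before strtol() is called, so the result no longer depends on how the C library reports a conversion with no digits. All other inputs go through the same checks as the original function.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for R's NA_INTEGER (INT_MIN in R). */
#define NA_INT_SKETCH INT_MIN

/* Platform-independent variant: the empty string is rejected for every
   base before strtol() is called, so libc's errno behaviour no longer
   matters. NULL plays the role of NA_STRING in this sketch. */
static int strtoi_consistent(const char *s, int base)
{
    long int res;
    char *endp;

    if (s == NULL || *s == '\0')
        return NA_INT_SKETCH;

    errno = 0;
    res = strtol(s, &endp, base);
    if (errno || *endp != '\0') return NA_INT_SKETCH;  /* junk or error */
    if (res > INT_MAX || res < INT_MIN) return NA_INT_SKETCH;  /* overflow */
    return (int) res;
}

/* A few expected results, usable as a quick self-check. */
static void check_strtoi_consistent(void)
{
    assert(strtoi_consistent("", 2) == NA_INT_SKETCH);
    assert(strtoi_consistent("", 16) == NA_INT_SKETCH);
    assert(strtoi_consistent("101", 2) == 5);
    assert(strtoi_consistent("ff", 16) == 255);
    assert(strtoi_consistent("12a", 10) == NA_INT_SKETCH);
}
```

Checking for the empty string up front is equivalent to rejecting any call where strtol() consumed nothing and `*endp` is `'\0'`. Inputs such as " " or "+" were already mapped to NA by the `*endp != '\0'` test, because in those cases `endp` points at a non-NUL character.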