Skip to content

Floating point issue

25 messages · GILLIBERT, Andre, Iñaki Ucar, Rui Barradas +10 more

#
Dear r-devel,

For some numbers, the printed value is not equivalent to the input :

options(scipen = 999)
## GOOD
1e24
#> [1]  999999999999999983222784
1e24 == 999999999999999983222784
#> [1] TRUE

## BAD
1e25
#> [1] 10000000000000000905969664
1e25 == 10000000000000000905969664
#> [1] FALSE

## STILL BAD
10000000000000000905969664
#> [1] 10000000000000003053453312

## GOOD AGAIN
10000000000000003053453312
#> [1] 10000000000000003053453312

# Additionally
10000000000000000000000000 == 1e25
#> [1] FALSE

Are these bugs ?
#
On 10 July 2022 at 16:00, Antoine Fabri wrote:
| Dear r-devel,
| 
| For some numbers, the printed value is not equivalent to the input :
| 
| options(scipen = 999)
| ## GOOD
| 1e24
| #> [1]  999999999999999983222784
| 1e24 == 999999999999999983222784
| #> [1] TRUE
| 
| ## BAD
| 1e25
| #> [1] 10000000000000000905969664
| 1e25 == 10000000000000000905969664
| #> [1] FALSE
| 
| ## STILL BAD
| 10000000000000000905969664
| #> [1] 10000000000000003053453312
| 
| ## GOOD AGAIN
| 10000000000000003053453312
| #> [1] 10000000000000003053453312
| 
| # Additionally
| 10000000000000000000000000 == 1e25
| #> [1] FALSE
| 
| Are these bugs ?

No, that is how computers work (with floating point numbers).

Please R FAQ 7.31 "Why doesn?t R think these numbers are equal?" at
  https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
and the references therein for more.

Dirk
#
The fact that not all values are representable by floating point does not mean that outputing a number with maximum accuracy, then reading it back, should yield a different number.


I would like to point that I cannot reproduce this "bug" on the official R 4.2.0 Windows x86_64 build on an AMD Ryzen 1700 on Windows 10.
[1] 10000000000000000906640224
[1] TRUE
I could also test:
[1] TRUE

For ECMAScript, there is a rule that guarantee that converting a floating point value to a string, and converting back to a floating point value, yields to an equal number.
https://262.ecma-international.org/5.1/#sec-9.8.1

"If x is any Number value other than ?0, then ToNumber<https://262.ecma-international.org/5.1/#sec-9.3>(ToString<https://262.ecma-international.org/5.1/#sec-9.8>(x)) is exactly the same Number value as x."


This avoids some data loss when repeatedly writing/reading floating point values to a text file.


Of course, R is a different language, and I do not think that it has such rule. However, this does not mean that this rule is useless or impossible to implement.


--

Sincerely

Andr? GILLIBERT

________________________________
De : R-devel <r-devel-bounces at r-project.org> de la part de Dirk Eddelbuettel <edd at debian.org>
Envoy? : dimanche 10 juillet 2022 16:09:45
? : Antoine Fabri
Cc : R-devel
Objet : Re: [Rd] Floating point issue

ATTENTION: Cet e-mail provient d?une adresse mail ext?rieure au CHU de Rouen. Ne cliquez pas sur les liens ou n'ouvrez pas les pi?ces jointes ? moins de conna?tre l'exp?diteur et de savoir que le contenu est s?r. En cas de doute, transf?rer le mail ? ? DSI, S?curit? ? pour analyse. Merci de votre vigilance
On 10 July 2022 at 16:00, Antoine Fabri wrote:
| Dear r-devel,
|
| For some numbers, the printed value is not equivalent to the input :
|
| options(scipen = 999)
| ## GOOD
| 1e24
| #> [1]  999999999999999983222784
| 1e24 == 999999999999999983222784
| #> [1] TRUE
|
| ## BAD
| 1e25
| #> [1] 10000000000000000905969664
| 1e25 == 10000000000000000905969664
| #> [1] FALSE
|
| ## STILL BAD
| 10000000000000000905969664
| #> [1] 10000000000000003053453312
|
| ## GOOD AGAIN
| 10000000000000003053453312
| #> [1] 10000000000000003053453312
|
| # Additionally
| 10000000000000000000000000 == 1e25
| #> [1] FALSE
|
| Are these bugs ?

No, that is how computers work (with floating point numbers).

Please R FAQ 7.31 "Why doesn?t R think these numbers are equal?" at
  https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
and the references therein for more.

Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
#
On Sun, 10 Jul 2022 at 16:28, GILLIBERT, Andre
<Andre.Gillibert at chu-rouen.fr> wrote:
I cannot reproduce this on a 64-bit Linux build of R 4.1.3 either:

options(scipen = 999)
1e24
#> [1]  999999999999999983222784
1e24 == 999999999999999983222784
#> [1] TRUE

1e25
#> [1] 10000000000000000905969664
1e25 == 10000000000000000905969664
#> [1] TRUE

10000000000000000905969664
#> [1] 10000000000000000905969664

10000000000000003053453312
#> [1] 10000000000000003053453312

10000000000000000000000000 == 1e25
#> [1] TRUE
#
He is my session info :

R version 4.1.3 (2022-03-10)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.0.1

Le dim. 10 juil. 2022 ? 16:44, I?aki Ucar <iucar at fedoraproject.org> a
?crit :

  
  
#
Hello,

R 4.2.1 on Windows 11, sessionInfo() at end.

I cannot reproduce this either.


op <- options(scipen = 999)

1e24
# [1]  999999999999999983202404
1e24 == 999999999999999983222784
# [1] TRUE

1e25
# [1] 10000000000000000906640224
1e25 == 10000000000000000905969664
# [1] TRUE

10000000000000000905969664
# [1] 10000000000000000906640224

10000000000000003053453312
# [1] 10000000000000003053648442

10000000000000000000000000 == 1e25
# [1] TRUE

options(op)

sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Portugal.utf8 
LC_CTYPE=Portuguese_Portugal.utf8
[3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C 

[5] LC_TIME=Portuguese_Portugal.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.2.1


Hope this helps,

Rui Barradas

?s 15:44 de 10/07/2022, I?aki Ucar escreveu:
#
The following function, 'bitC' from ?numToBits, displays the bits in a
double precision number, separated into the sign bit, the 11 exponent bits,
and the 52 bits in the mantissa.  I've shown the results with your numbers
from R-2.4.0 on my Windows 11 Lenovo laptop: what do you get?
one double
+     b <- substr(as.character(rev(numToBits(x))), 2L, 2L)
+     paste0(c(b[1L], " ", b[2:12], " | ", b[13:64]), collapse = "")
+   }, ""))
# [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
# [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
# [1] 0 00000000000 | 0000000000000000000000000000000000000000000000000000
# [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001

-Bill

On Sun, Jul 10, 2022 at 7:00 AM Antoine Fabri <antoine.fabri at gmail.com>
wrote:

  
  
#
Here's a similar question that may give you some ideas for dealing with 
this:

https://stackoverflow.com/q/72899973/2554330

Duncan Murdoch
On 10/07/2022 10:00 a.m., Antoine Fabri wrote:
#
On 10 July 2022 at 16:34, Duncan Murdoch wrote:
| Here's a similar question that may give you some ideas for dealing with 
| this:
| 
| https://stackoverflow.com/q/72899973/2554330

Good find, but that's his own question (see the profile) and he choose to
cross-post. Which is generally discouraged. 

Dirk
#
Thanks, I get the exact same results as yours

``` r
bitC <- function(x) noquote(vapply(as.double(x), function(x) { # split one
double
  b <- substr(as.character(rev(numToBits(x))), 2L, 2L)
      paste0(c(b[1L], " ", b[2:12], " | ", b[13:64]), collapse = "")
    }, ""))
bitC(10^25)
#> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
bitC(10000000000000000905969664)
#> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010010
bitC(10000000000000000905969664 - 10^25)
#> [1] 0 10000011110 | 0000000000000000000000000000000000000000000000000000
bitC(1e25)
#> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
```

<sup>Created on 2022-07-10 by the [reprex package](
https://reprex.tidyverse.org) (v2.0.1)</sup>

Le dim. 10 juil. 2022 ? 22:23, Bill Dunlap <williamwdunlap at gmail.com> a
?crit :

  
  
#
On 10 July 2022 at 17:43, Antoine Fabri wrote:
| He is my session info :
| 
| R version 4.1.3 (2022-03-10)
| Platform: aarch64-apple-darwin20 (64-bit)
| Running under: macOS Monterey 12.0.1

Along with the different session infos posted by numerous people who are
unable to replicate your issue (myself included) and a bit more off-line
thinking about I started to suspect you may have an M1 chip. Which this
confirms.

Tomas and Simon had a deep dive into possible floating point concerns in

  https://blog.r-project.org/2020/11/02/will-r-work-on-apple-silicon/index.html

I also seem to recall that it was also said somewhere (though apparently not
in that post) that the M1 does not have long double. Post such as this one

  https://developer.apple.com/forums/thread/673482

seem to suggest that 80 bit accuracy is Intel specific -- so this may in fact
be an issue of sailing too close to the hardware boundary.

Dirk
#
The results are not exactly the same.  Notice that on Bill's system the 
bit pattern of 10^25 and 10000000000000000905969664 are the same, but 
not so on yours.  So there is a mismatch happening on parsing between 
your M1 mac and other's systems.

This is the main thing I wanted to point out.  But since I'm here I'm 
going to add some additional lazily researched speculation.

As Dirk points out, M1 does not have long double, and if you look at 
what I think is responsible for parsing of numbers like the ones we're 
discussing here, we see[1]:

     double R_strtod5(const char *str, char **endptr, char dec,
     		 Rboolean NA, int exact)
     {
         LDOUBLE ans = 0.0;

IIRC long double on systems that implement it as 80 bits (most that have 
x87 coprocessors) has 63-4 bits of precision, vs 53 for 64 bit long 
double.  Roughly speaking, that's 19-20 digits of base 10 precision for 
long double, vs 15-16 for 64 bit double.  Then:

     > substr(rep("10000000000000000905969664", 2),  c(1, 1), c(16, 20))
     [1] "1000000000000000"     "10000000000000000905"

Again, I have not carefully researched this, but it seems likely that 
parsing is producing different a different outcome in this case because 
the intermediate values generated can be kept at higher precision on 
systems with 80 bit long doubles prior to coercing to double for the 
final result.

IIRC, if you need invertible deparsing/parsing I think you can use:

     deparse(1e25, control=c('hexNumeric'))

Although I don't have an 80-bit-less system to test on (and I am too 
lazy to recompile R without long double to test).

Best,

B.

[1]: https://github.com/r-devel/r-svn/blob/master/src/main/util.c#L1993
On 7/10/22 5:38 PM, Antoine Fabri wrote:
#
You said you got the same results as I did.  Not so, the parsing of the
long numeric differs in the last bit of the mantissa.

(A=Antoine, B=Bill):

bitC(10^25)
#A [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
#B [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001

bitC(10000000000000000905969664)
#A [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010010
#B [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001

bitC(10000000000000000905969664 - 10^25)
#A [1] 0 10000011110 | 0000000000000000000000000000000000000000000000000000
#B [1] 0 00000000000 | 0000000000000000000000000000000000000000000000000000

On Sun, Jul 10, 2022 at 2:39 PM Antoine Fabri <antoine.fabri at gmail.com>
wrote:

  
  
#
Indeed, apologies for the oversight
On Mon, 11 Jul 2022, 03:14 Bill Dunlap, <williamwdunlap at gmail.com> wrote:

            

  
  
#
> The results are not exactly the same.  Notice that on Bill's system the 
    > bit pattern of 10^25 and 10000000000000000905969664 are the same, but 
    > not so on yours.  So there is a mismatch happening on parsing between 
    > your M1 mac and other's systems.

    > This is the main thing I wanted to point out.  But since I'm here I'm 
    > going to add some additional lazily researched speculation.

    > As Dirk points out, M1 does not have long double, and if you look at 
    > what I think is responsible for parsing of numbers like the ones we're 
    > discussing here, we see[1]:

    > double R_strtod5(const char *str, char **endptr, char dec,
    >                  Rboolean NA, int exact)
    > {
    >      LDOUBLE ans = 0.0;

    > IIRC long double on systems that implement it as 80 bits (most that have 
    > x87 coprocessors) has 63-4 bits of precision, vs 53 for 64 bit long 
    > double.  Roughly speaking, that's 19-20 digits of base 10 precision for 
    > long double, vs 15-16 for 64 bit double.  Then:

    >> substr(rep("10000000000000000905969664", 2),  c(1, 1), c(16, 20))
    > [1] "1000000000000000"     "10000000000000000905"

    > Again, I have not carefully researched this, but it seems likely that 
    > parsing is producing different a different outcome in this case because 
    > the intermediate values generated can be kept at higher precision on 
    > systems with 80 bit long doubles prior to coercing to double for the 
    > final result.

    > IIRC, if you need invertible deparsing/parsing I think you can use:

    > deparse(1e25, control=c('hexNumeric'))

    > Although I don't have an 80-bit-less system to test on (and I am too 
    > lazy to recompile R without long double to test).

    > Best,
    > B.

    > [1]: https://github.com/r-devel/r-svn/blob/master/src/main/util.c#L1993

An excellent analysis, thank you, Brodie and Bill Dunlap.

10 days ago I also had a colleague at ETH, involved in teaching stats,
wanting to submit a bug report about a model fit in nlme which
gave different results (well, a confidence interval of a
correlation which was not at all significant (and estimated
close to zero) changed from +/-0.45 to +/-0.85 ...
... and when I asked for sessionInfo() etc,
the result was that on the  M1  the result differed from all
other platforms.
I did not go into further analysis, but I know that nlme does
*not* use any LDOUBLE | long double  but still, the M1 was less
accurate than any other platform... and I also could not see this
(the wider conf.ints) when I used a version of R configured with

   configure --disable-long-double  

[If anyone is interested I can provide the repr.ex. R code.]
its speed is just a tad less reliable numerically than the
Intel/AMD floating point implementations..

Martin
> On 7/10/22 5:38 PM, Antoine Fabri wrote:
>> Thanks, I get the exact same results as yours
    >> 
    >> ``` r
    >> bitC <- function(x) noquote(vapply(as.double(x), function(x) { # split one
    >> double
    >> b <- substr(as.character(rev(numToBits(x))), 2L, 2L)
    >> paste0(c(b[1L], " ", b[2:12], " | ", b[13:64]), collapse = "")
    >> }, ""))
    >> bitC(10^25)
    >> #> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
    >> bitC(10000000000000000905969664)
    >> #> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010010
    >> bitC(10000000000000000905969664 - 10^25)
    >> #> [1] 0 10000011110 | 0000000000000000000000000000000000000000000000000000
    >> bitC(1e25)
    >> #> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
    >> ```
    >> 
    >> <sup>Created on 2022-07-10 by the [reprex package](
    >> https://reprex.tidyverse.org) (v2.0.1)</sup>
    >> 
    >> Le dim. 10 juil. 2022 ? 22:23, Bill Dunlap <williamwdunlap at gmail.com> a
    >> ?crit :
    >> 
    >>> The following function, 'bitC' from ?numToBits, displays the bits in a
    >>> double precision number, separated into the sign bit, the 11 exponent bits,
    >>> and the 52 bits in the mantissa.  I've shown the results with your numbers
    >>> from R-2.4.0 on my Windows 11 Lenovo laptop: what do you get?
    >>> 
    >>>> bitC <- function(x) noquote(vapply(as.double(x), function(x) { # split
    >>> one double
    >>> +     b <- substr(as.character(rev(numToBits(x))), 2L, 2L)
    >>> +     paste0(c(b[1L], " ", b[2:12], " | ", b[13:64]), collapse = "")
    >>> +   }, ""))
    >>>> bitC(10^25)
    >>> # [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
    >>>> bitC(10000000000000000905969664)
    >>> # [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
    >>>> bitC(10000000000000000905969664 - 10^25)
    >>> # [1] 0 00000000000 | 0000000000000000000000000000000000000000000000000000
    >>>> bitC(1e25)
    >>> # [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
    >>> 
    >>> -Bill
    >>> 
    >>> On Sun, Jul 10, 2022 at 7:00 AM Antoine Fabri <antoine.fabri at gmail.com>
>>> wrote:
>>> 
    >>>> Dear r-devel,
    >>>> 
    >>>> For some numbers, the printed value is not equivalent to the input :
    >>>> 
    >>>> options(scipen = 999)
    >>>> ## GOOD
    >>>> 1e24
    >>>> #> [1]  999999999999999983222784
    >>>> 1e24 == 999999999999999983222784
    >>>> #> [1] TRUE
    >>>> 
    >>>> ## BAD
    >>>> 1e25
    >>>> #> [1] 10000000000000000905969664
    >>>> 1e25 == 10000000000000000905969664
    >>>> #> [1] FALSE
    >>>> 
    >>>> ## STILL BAD
    >>>> 10000000000000000905969664
    >>>> #> [1] 10000000000000003053453312
    >>>> 
    >>>> ## GOOD AGAIN
    >>>> 10000000000000003053453312
    >>>> #> [1] 10000000000000003053453312
    >>>> 
    >>>> # Additionally
    >>>> 10000000000000000000000000 == 1e25
    >>>> #> [1] FALSE
    >>>> 
    >>>> Are these bugs ?
    >>>> 
    >>>> [[alternative HTML version deleted]]
    >>>> 
    >>>> ______________________________________________
    >>>> R-devel at r-project.org mailing list
    >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>> 
    >>> 
    >> 
    >> [[alternative HTML version deleted]]
    >> 
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel
#
80 bits floating point (FP) numbers are great, but I think we cannot rely on it for the future.
I expect, the marketshare of ARM CPUs to grow. It's hard to predict, but ARM may spread in desktop computers in a timeframe of 10 years, and I would not expect it to gain extended precision FP.
Moreover, performance of FP80 is not a priority of Intel. FP80 in recent Intel microprocessors are very slow when using special representations (NaN, NA, Inf, -Inf) or denormal numbers.

Therefore, it may be wise to update R algorithms to make them work quite well with 64 bits FP.

--
Sincerely
Andr? GILLIBERT
#
To be pedantic, the C standard does not guarantee that long double offers more precision than double. If R?s internal FP/decimal conversion routines produce a different result on platforms that support Intel's 80-bit precision vs. platforms that don?t, I would classify this as a bug in R. Available precision can affect numerical properties of algorithms but should not affect things like decimal to binary or via versa conversion ? it either produces the accurate enough number or it doesn?t. 

As a side note, I agree with Andre that relying on Intel?s extended precision in this day an age is not a good idea. This is a legacy feature from over forty years ago, x86 CPUs have been using SSE instructions for floating point computation for over a decade. The x87 instructions are slow and prevent compiler optimisations. Overall, I believe that R would benefit from dropping this legacy cruft. Not that there are too many places where it is used from what I see? 

Best, 

? Taras Zakharko
#
I don?t think there is any guarantee that unrepresentable numbers are parsed into defined patterns, because printing is done by the OS while parsing is done by R. The way R parses decimal numbers[1] is simply by using the obvious res = res * 10 + digit and it can be easily checked that for doubles the representation such obtained from 10000000000000000905969664 is 0x1.08b2a2c280292p+83 (see below if you want to see it yourself) which is not the same as 10^25 which is 0x1.08b2a2c280291p+83. This is true on all platforms, it is not specific to M1. The only difference is if your were to use a different type you can obtain a different result - and that is not well-defined (e.g. long doubles have no guarantees at all as of the precision).  Note that the decimal string above would require 83-bits of precision which is not representable.

(BTW: to make it even more fun, if you were to use double res = 1; repeat(25) res = res * 10; in C, so the naive computation of the original 10^25 you?d get 9999999999999998758486016 and 0x1.08b2a2c28029p+83)

Given that printing is done by the OS and parsing by R, I don?t think R guarantees anything. If you want representable number you?d use the binary representation (sprintf(?%a?) or hex-mode deparse as noted). One could argue that it could make sense to change it one way or another - either having R do it all or having the OS do it all. In the latter case one may obtain more consistent results (e.g. system stdtod() yields the original value even on M1), but it would be OS-specific. In the former R could impose its own guarantees - but currently it does not.

Cheers,
Simon

[1] - https://github.com/r-devel/r-svn/blob/97c0a73f1758d09088c200f924d27b362d55ccdc/src/main/util.c#L2094


#include <stdio.h>
#include <math.h>
#include <stdlib.h>

int main() {
  const char *str = "10000000000000000905969664", *c = str;
  double ans = 0;
  while (*c) {
    ans = 10 * ans + (*(c++) - '0');
    printf("%a\n", ans);
  }
  printf("atof: %a\n", atof(str));
  double pow1025 = pow(10.0, 25);
  printf("--\n10^25:\n%25.f\n%a\n", pow1025, pow1025);
  return 0;
}

0x1p+0
0x1.4p+3
0x1.9p+6
0x1.f4p+9
0x1.388p+13
0x1.86ap+16
0x1.e848p+19
0x1.312dp+23
0x1.7d784p+26
0x1.dcd65p+29
0x1.2a05f2p+33
0x1.74876e8p+36
0x1.d1a94a2p+39
0x1.2309ce54p+43
0x1.6bcc41e9p+46
0x1.c6bf52634p+49
0x1.1c37937e08p+53
0x1.6345785d8a001p+56
0x1.bc16d674ec801p+59
0x1.158e460913d01p+63
0x1.5af1d78b58c41p+66
0x1.b1ae4d6e2ef51p+69
0x1.0f0cf064dd593p+73
0x1.52d02c7e14af8p+76
0x1.a784379d99db6p+79
0x1.08b2a2c280292p+83
atof: 0x1.08b2a2c280291p+83
--
10^25:
10000000000000000905969664
0x1.08b2a2c280291p+83
#
Hi,

Taras makes several good points, especially this one:

On Mon, Jul 11, 2022 at 12:31 PM Taras Zakharko <taras.zakharko at uzh.ch>
wrote:
This is a key point: there is no need to rely on platform-specific
properties of floating-point representations or operations to get correct
(*) decimal-to-binary or binary-to-decimal conversions.  These tasks can be
done correctly in a way that uses only the guarantees provided by IEEE
floats or doubles, and some additional work using big integers (something
like GNU MP) in some cases.  There are freely-available libraries to do
the conversions in a platform-independent, correct, efficient way.  An even
easier solution in one direction is strtod(): decades ago it was not 100%
correct but I haven't seen any flaws in recent versions of GLIBC or on
Windows.  Certainly strtod can be relied on to do a better job than
"multiply-by-10 and add the next digit".

(*) What is correct?  The easy direction is decimal to binary, staying in
the range of positive normalized numbers.  There are a finite number of
rational numbers that are exactly representable as IEEE doubles.  The
correct double representation of a decimal number (also a rational) is that
IEEE double that is closest.  In the event of a tie, use the round-to-even
rule.


As a side note, I agree with Andre that relying on Intel?s extended

  
    
#
Simon, 

I think the issue is rather that we have a representable number that is not correctly parsed by R. Step by step: 

The mentioned number 1e25 is, as you say, not representable in IEEE 754 double precision, with two closest representable numbers being

 10000000000000000905969664 (bit pattern 0 10001010010 0000100010110010101000101100001010000000001010010001)

and

 9999999999999998758486016 (bit patterns 0 10001010010 0000100010110010101000101100001010000000001010010000)

So the fact that 1e25 gets printed out as 10000000000000000905969664 makes perfect sense ? it?s the closest number that can be represented. Which is exactly what one expects and also the usual guarantee ? unrepresentable number literals are supposed to be parsed to a closest representable number. This is a problem that can be (and has been) solved correctly and reliably, so there is no reason why R wouldn?t offer the same guarantee here. 

But let?s get to the actual problem. If you type in 10000000000000000905969664 into R on an Apple Silicon machine you will get 

  10000000000000003053453312 (bit pattern 0 10001010010 0000100010110010101000101100001010000000001010010010)

Which is the next double in the sequence and not a correct way to parse the number.  

So unless I made a mistake somewhere it indeed looks like a bug in R?s number parsing code. It produces the expected, correctly rounded result for a non-representable 1e25, but is one bit off for the precisely representable 10000000000000000905969664

And BTW, I have tried the same with C and Swift, and it works as expected, i.e. in C 

  assert(10000000000000000000000000.0 == 10000000000000000905969664.0 );


Best, 

  Taras

  
  
#
[............]

    > 10 days ago I also had a colleague at ETH, involved in teaching stats,
    > wanting to submit a bug report about a model fit in nlme which
    > gave different results (well, a confidence interval of a
    > correlation which was not at all significant (and estimated
    > close to zero) changed from +/-0.45 to +/-0.85 ...

(that was all the case, indeed)

    > ... and when I asked for sessionInfo() etc,
    > the result was that on the  M1  the result differed from all
    > other platforms.

when now rechecking and looking at the example, I've found to my
embarrassment, that the above paragraph (which *was* our conclusion)
has really not been the case.
Rather, as I've now reconfirmed, it seems that *only* on Windows
(from the platforms we looked at), the result differed from the others.

I.e., my nlme - example has been completely irrelevant to the
discussion here.
I do apologize!

Martin
7 days later
#
The difference between macOS 12.4 and Debian 11 (Docker, virtualization framework) running on a MacBook Pro (M1 Max).
-> `.Machine$sizeof.longdouble` on macOS returns 8 whereas on Debian 11 it returns 16.

macOS 12.4 on MacBook Pro (M1 Max):
``` r
.Machine
#> $double.eps
#> [1] 2.220446e-16
#> 
#> $double.neg.eps
#> [1] 1.110223e-16
#> 
#> $double.xmin
#> [1] 2.225074e-308
#> 
#> $double.xmax
#> [1] 1.797693e+308
#> 
#> $double.base
#> [1] 2
#> 
#> $double.digits
#> [1] 53
#> 
#> $double.rounding
#> [1] 5
#> 
#> $double.guard
#> [1] 0
#> 
#> $double.ulp.digits
#> [1] -52
#> 
#> $double.neg.ulp.digits
#> [1] -53
#> 
#> $double.exponent
#> [1] 11
#> 
#> $double.min.exp
#> [1] -1022
#> 
#> $double.max.exp
#> [1] 1024
#> 
#> $integer.max
#> [1] 2147483647
#> 
#> $sizeof.long
#> [1] 8
#> 
#> $sizeof.longlong
#> [1] 8
#> 
#> $sizeof.longdouble
#> [1] 8
#> 
#> $sizeof.pointer
#> [1] 8

bitC <- function(x) noquote(vapply(as.double(x), function(x) { # split one
double
  b <- substr(as.character(rev(numToBits(x))), 2L, 2L)
      paste0(c(b[1L], " ", b[2:12], " | ", b[13:64]), collapse = "")
    }, ""))
bitC(10^25)
#> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
bitC(10000000000000000905969664)
#> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010010
bitC(10000000000000000905969664 - 10^25)
#> [1] 0 10000011110 | 0000000000000000000000000000000000000000000000000000
bitC(1e25)
#> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
```

<sup>Created on 2022-07-19 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

<details style="margin-bottom:10px;">
<summary>
Session info
</summary>

``` r
sessioninfo::session_info()
#> ? Session info ???????????????????????????????????????????????????????????????
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Monterey 12.4
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    UTF-8
#>  tz       Europe/Copenhagen
#>  date     2022-07-19
#>  pandoc   2.18 @ /opt/local/bin/ (via rmarkdown)
#> 
#> ? Packages ???????????????????????????????????????????????????????????????????
#>  package     * version date (UTC) lib source
#>  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.3   2022-07-18 [1] CRAN (R 4.2.1)
#>  knitr         1.39    2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  pillar        1.8.0   2022-07-18 [1] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.0  2022-06-28 [1] CRAN (R 4.2.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang         1.0.4   2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.8   2022-07-11 [1] CRAN (R 4.2.1)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
#>  styler        1.7.0   2022-03-13 [1] CRAN (R 4.2.0)
#>  tibble        3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/benz0li/Library/R/arm64/4.2/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#> 
#> ??????????????????????????????????????????????????????????????????????????????
```

</details>

Debian 11 (Docker, virtualization framework) on MacBook Pro (M1 Max)
``` r
.Machine
#> $double.eps
#> [1] 2.220446e-16
#> 
#> $double.neg.eps
#> [1] 1.110223e-16
#> 
#> $double.xmin
#> [1] 2.225074e-308
#> 
#> $double.xmax
#> [1] 1.797693e+308
#> 
#> $double.base
#> [1] 2
#> 
#> $double.digits
#> [1] 53
#> 
#> $double.rounding
#> [1] 5
#> 
#> $double.guard
#> [1] 0
#> 
#> $double.ulp.digits
#> [1] -52
#> 
#> $double.neg.ulp.digits
#> [1] -53
#> 
#> $double.exponent
#> [1] 11
#> 
#> $double.min.exp
#> [1] -1022
#> 
#> $double.max.exp
#> [1] 1024
#> 
#> $integer.max
#> [1] 2147483647
#> 
#> $sizeof.long
#> [1] 8
#> 
#> $sizeof.longlong
#> [1] 8
#> 
#> $sizeof.longdouble
#> [1] 16
#> 
#> $sizeof.pointer
#> [1] 8
#> 
#> $longdouble.eps
#> [1] 1.92593e-34
#> 
#> $longdouble.neg.eps
#> [1] 9.62965e-35
#> 
#> $longdouble.digits
#> [1] 113
#> 
#> $longdouble.rounding
#> [1] 5
#> 
#> $longdouble.guard
#> [1] 0
#> 
#> $longdouble.ulp.digits
#> [1] -112
#> 
#> $longdouble.neg.ulp.digits
#> [1] -113
#> 
#> $longdouble.exponent
#> [1] 15
#> 
#> $longdouble.min.exp
#> [1] -16382
#> 
#> $longdouble.max.exp
#> [1] 16384

bitC <- function(x) noquote(vapply(as.double(x), function(x) { # split one
double
  b <- substr(as.character(rev(numToBits(x))), 2L, 2L)
      paste0(c(b[1L], " ", b[2:12], " | ", b[13:64]), collapse = "")
    }, ""))
bitC(10^25)
#> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
bitC(10000000000000000905969664)
#> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
bitC(10000000000000000905969664 - 10^25)
#> [1] 0 00000000000 | 0000000000000000000000000000000000000000000000000000
bitC(1e25)
#> [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001
```

<sup>Created on 2022-07-19 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

<details style="margin-bottom:10px;">
<summary>
Session info
</summary>

``` r
sessioninfo::session_info()
#> ? Session info ???????????????????????????????????????????????????????????????
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       Debian GNU/Linux 11 (bullseye)
#>  system   aarch64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2022-07-19
#>  pandoc   2.18 @ /usr/bin/ (via rmarkdown)
#> 
#> ? Packages ???????????????????????????????????????????????????????????????????
#>  package     * version date (UTC) lib source
#>  cli           3.3.0   2022-04-25 [2] CRAN (R 4.2.1)
#>  crayon        1.5.1   2022-03-26 [2] CRAN (R 4.2.1)
#>  digest        0.6.29  2021-12-01 [2] CRAN (R 4.2.1)
#>  ellipsis      0.3.2   2021-04-29 [2] CRAN (R 4.2.1)
#>  evaluate      0.15    2022-02-18 [2] CRAN (R 4.2.1)
#>  fansi         1.0.3   2022-03-24 [2] CRAN (R 4.2.1)
#>  fastmap       1.1.0   2021-01-25 [2] CRAN (R 4.2.1)
#>  fs            1.5.2   2021-12-08 [2] CRAN (R 4.2.1)
#>  glue          1.6.2   2022-02-24 [2] CRAN (R 4.2.1)
#>  highr         0.9     2021-04-16 [2] CRAN (R 4.2.1)
#>  htmltools     0.5.2   2021-08-25 [2] CRAN (R 4.2.1)
#>  knitr         1.39    2022-04-26 [2] CRAN (R 4.2.1)
#>  lifecycle     1.0.1   2021-09-24 [2] CRAN (R 4.2.1)
#>  magrittr      2.0.3   2022-03-30 [2] CRAN (R 4.2.1)
#>  pillar        1.7.0   2022-02-01 [2] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.2.1)
#>  purrr         0.3.4   2020-04-17 [2] CRAN (R 4.2.1)
#>  R.cache       0.15.0  2021-04-30 [2] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2   2022-06-13 [2] CRAN (R 4.2.1)
#>  R.oo          1.25.0  2022-06-12 [2] CRAN (R 4.2.1)
#>  R.utils       2.12.0  2022-06-28 [2] CRAN (R 4.2.1)
#>  reprex        2.0.1   2021-08-05 [2] CRAN (R 4.2.1)
#>  rlang         1.0.4   2022-07-12 [2] CRAN (R 4.2.1)
#>  rmarkdown     2.14    2022-04-25 [2] CRAN (R 4.2.1)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.2.1)
#>  stringi       1.7.8   2022-07-11 [2] CRAN (R 4.2.1)
#>  stringr       1.4.0   2019-02-10 [2] CRAN (R 4.2.1)
#>  styler        1.7.0   2022-03-13 [2] CRAN (R 4.2.1)
#>  tibble        3.1.7   2022-05-03 [2] CRAN (R 4.2.1)
#>  utf8          1.2.2   2021-07-24 [2] CRAN (R 4.2.1)
#>  vctrs         0.4.1   2022-04-13 [2] CRAN (R 4.2.1)
#>  withr         2.5.0   2022-03-03 [2] CRAN (R 4.2.1)
#>  xfun          0.31    2022-05-10 [2] CRAN (R 4.2.1)
#>  yaml          2.3.5   2022-02-21 [2] CRAN (R 4.2.1)
#> 
#>  [1] /home/benz0li/R/aarch64-unknown-linux-gnu-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/local/lib/R/library
#> 
#> ??????????????????????????????????????????????????????????????????????????????
```

</details>
--
benz0li.b-data.io | @benz0li | olivier.benz at b-data.ch
#
Just a quick note of this in case someone finds Olivier's results puzzling. Standard ARM64 ABI (which Linux follows) defines long double as a 16-byte extended precision type. LLVM?s compiler-rt has a software implementation for doing operations on such types. Apple instead defines long double as identical to double in their ABIs. 

References

https://developer.arm.com/documentation/ka004751/1-0

https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms

  
  
#
I found that quite puzzling indeed. Thanks for the references.

So in this case, the different results are not due to the architecture (M1 aka ARM64) but to the system/implementation.
--
benz0li.b-data.io | @benz0li | olivier.benz at b-data.ch
#
I would say that your results demonstrate that R number parsing code relies on higher-than double precision to perform correct number parsing. Extended precision is not guaranteed by the C standard, so I would classify this as a bug in R. Unless of course R doesn?t want to make any  number parsing guarantees, but that would be an odd choice to make :)

Cheers, 

Taras