R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?
4 messages · Graham Williams, Duncan Murdoch, Petr Savicky, Brian D. Ripley
On 11-02-10 6:37 AM, Graham Williams wrote:
Should one expect minor numerical differences between 64-bit and 32-bit R on Windows? Hunting around the lists, I've not been able to find a definitive answer yet. It seems plausible given different precision arithmetic, but I wanted to confirm with those who might know for sure.
I think our goal is that those results should be as close as possible. R uses the same precision in both 32-bit and 64-bit; the differences are all in pointers, not floating point values. However, the two versions use different run-time libraries, and it is possible that there are precision differences coming from there. I think we'd be interested in knowing what they are even if they are beyond our control, so I would appreciate it if you could track down where the difference arises.
Duncan Murdoch
BACKGROUND
A colleague was trying to replicate some modelling results (from a soon-to-be-published book) using rpart, ada, and randomForest, for example. My 64-bit Linux and 64-bit Windows 7 builds always agree (so far), but their 32-bit Windows build does not. I've distilled it to a few simple lines of code that replicate the differences (I had to stay with the weather dataset from rattle, since I could not yet replicate them on standard datasets).
library(rpart)
library(rattle)
set.seed(41)
model <- rpart(RainTomorrow ~ ., data = weather[-c(1, 2, 23)],
               control = rpart.control(minbucket = 0))
print(model$cptable)
Final row of the cptable (columns: CP, nsplit, rel error, xerror, xstd; the differing columns are the cross-validated error and its standard error):
On 32-bit: 9 0.01000000 23 0.1515152 1.1060606 0.1158273
On 64-bit: 9 0.01000000 23 0.1515152 1.0909091 0.1152273
Pretty minor, but different. I've not found any seed other than 41 (I've only tried a few) that produces a difference.
library(ada)  # uses rpart underneath
set.seed(41)
model <- ada(RainTomorrow ~ ., data = weather[-c(1, 2, 23)])
print(model)
On 32bit: Train Error: 0.057
On 64bit: Train Error: 0.055
Changing the seed to 42, for example, brings them into sync.
library(randomForest)
set.seed(41)
model <- randomForest(RainTomorrow ~ ., data = weather[-c(1, 2, 23)],
                      importance = TRUE, na.action = na.roughfix)
print(model)
On 32bit: OOB estimate of error rate: 12.84%
On 64bit: OOB estimate of error rate: 11.75%
sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] randomForest_4.5-36 pmml_1.2.27         XML_3.2-0.2
[4] colorspace_1.0-1    RGtk2_2.20.3        ada_2.0-2
[7] rattle_2.6.2        rpart_3.1-47

loaded via a namespace (and not attached):
[1] tools_2.12.1
sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-mingw32/x64 (64-bit)
...

Thanks, Graham
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On Thu, Feb 10, 2011 at 10:37:09PM +1100, Graham Williams wrote:
Should one expect minor numerical differences between 64-bit and 32-bit R on Windows? Hunting around the lists, I've not been able to find a definitive answer yet. It seems plausible given different precision arithmetic, but I wanted to confirm with those who might know for sure.
One source of the difference between platforms is the compiler settings. On Intel processors, the options can influence whether the registers use an 80-bit or a 64-bit representation of floating point numbers; in memory, it is always 64-bit. Whether there is a difference between registers and memory can be tested, for example, using the code
#include <stdio.h>

#define n 3

int main(int argc, char *argv[])
{
    double x[n];
    int i;

    /* Store each quotient into memory, rounding it to 64 bits. */
    for (i = 0; i < n; i++) {
        x[i] = 1.0/(i + 5);
    }

    /* Recompute the quotient (possibly held in an 80-bit register)
       and compare it with the 64-bit value stored in memory. */
    for (i = 0; i < n; i++) {
        if (x[i] != 1.0/(i + 5)) {
            printf("difference for %d\n", i);
        }
    }
    return 0;
}
If the compiler uses SSE arithmetic (-mfpmath=sse), then the output is empty.
If Intel's extended arithmetic is used, then we get
difference for 0
difference for 1
difference for 2
On 32-bit Linux systems, the default was Intel's extended arithmetic, while on 64-bit Linux systems the default is sometimes SSE. I do not know the situation on Windows.
Another source of difference is differing optimization of expressions. It is sometimes possible to obtain identical results on different platforms; however, this cannot be guaranteed in general. For tree construction, even minor differences in rounding may influence comparisons, and this may result in a different form of the tree.
Petr Savicky.
A more important difference is the number of registers available on the CPU, which differs between i386 and x86_64; hence computations get done in different orders by optimizing compilers. And yes, all x86_64 CPUs have SSE, so the optimizer uses them at the compiler settings we use. As Duncan mentioned, the runtime (libc/libm, on Windows mainly MSVCRT.dll) differs between OSes.

We know rather a lot about differences between platforms, as recent versions of R contain reference results for almost all the examples, and we from time to time compare output from CRAN check runs across platforms (this was part of the test suite run before releasing the 64-bit Windows port). Almost all the 64-bit platforms are very close (and agree exactly on the R examples); 32-bit Solaris and Mac OS X are pretty close, 32-bit Linux has quite a lot of differences, and 32-bit Windows somewhat more.
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford              Tel: +44 1865 272861 (self)
1 South Parks Road                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595