
can this happen?

4 messages · Thomas Lumley, Peter Dalgaard, Paul Gilbert +1 more

#
This is basically a question about where to start looking for a problem.

I have a program that gives slightly different results on two Windows
computers.   It is a reasonably complicated numerical optimisation, with
iterative calls to optim().

The two computers both run Windows 2000. On each computer I get the same
results in two different versions of R (1.5.1 and 1.6.0 on one, 1.5.1 and
1.6.1 on the other, the standard binaries), and the results are stable
from run to run on each machine. There's nothing lurking in the workspace.

One computer has a 2GHz Pentium 4 cpu, the other has a 0.75GHz Pentium 3.
I think the problem is with the Pentium 4 machine, since it's giving
occasional errors due to NaNs in internal parts of optim that I don't
understand, but the fault could quite possibly be in my understanding. A
good-quality dual Pentium 4 Linux system doesn't give these internal
errors in optim and seems to give the same results as the Pentium 3
machine (I haven't checked that they are all identical).
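The kind of divergence described here is what one expects when tiny rounding differences feed back through an iterative computation: a one-ulp difference in an intermediate can grow into a difference in the leading digits. A minimal sketch (in Python, not anyone's actual code; the logistic map stands in for an iterative optimisation):

```python
def iterate(x, steps=100):
    """Logistic map with r = 3.9: a standard example of an iteration
    that is sensitive to tiny differences in its inputs."""
    for _ in range(steps):
        x = 3.9 * x * (1.0 - x)
    return x

a = iterate(0.4)            # baseline starting value
b = iterate(0.4 + 1e-15)    # the same value perturbed by a few ulps
print(a, b, abs(a - b))     # the results disagree in their leading digits
```

A well-conditioned optimisation damps such perturbations rather than amplifying them, which is why two machines computing with slightly different floating-point behaviour can agree on a stable problem yet disagree visibly on a sensitive one.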


	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle
#
Thomas Lumley <tlumley@u.washington.edu> writes:
I believe that there's a lot of FP activity inside msvcrt.dll (if I
remember the name correctly) so if that isn't the same between the
machines, it might explain things.
#
Thomas Lumley wrote:
...

I don't test much on Windows, but I've had a fair amount of trouble like this
on Linux. Not so much with optim(), but with some numerically ill-conditioned
problems I get results that differ in the fourth or fifth significant
digit, whereas I typically expect my tests to agree to better than nine significant
digits, and they are often good to fourteen. On Solaris my test values have much
tighter tolerances and were very stable for years, but they changed a bit recently
when I switched from svd and eigen to La.svd and La.eigen. The obvious potential
culprit is non-BLAS/BLAS/ATLAS, but the Linux problem does not seem to be related
to that. It is a bit like the problems that used to occur when the low-order bits
of doubles did not get zeroed, but the values from run to run on the same
machine are too consistent for a random problem like that.

If you figure out how to track this down, I would like to know. I was going to
try to keep track of the values I get more automatically, but I'm not sure what
information needs to be recorded. OS and R version are obvious, but I suspect
the issue has more to do with math library versions.
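The record Paul describes amounts to a reproducibility fingerprint: the computed test values stored alongside whatever identifies the environment that produced them. A hypothetical sketch of the idea in Python (in R one would record things like R.version and the BLAS in use instead; the field names here are made up for illustration):

```python
import json
import platform
import sys

def fingerprint(test_values):
    """Bundle computed test values with enough environment detail to
    attribute a later change in the values to a change in the setup."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "interpreter": sys.version.split()[0],
        "float_epsilon": sys.float_info.epsilon,
        # repr() keeps every significant digit of each stored value
        "test_values": {name: repr(v) for name, v in test_values.items()},
    }

record = fingerprint({"svd_check": 1.0 + 2**-52})
print(json.dumps(record, indent=2))
```

The hard part, as the message notes, is that the math library version usually is not exposed to the interpreted language at all, so it may have to be recorded by hand.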

Paul Gilbert
#
On Wed, 4 Dec 2002 08:59:17 -0800 (PST), you wrote in message
<Pine.A41.4.44.0212040841570.81760-100000@homer37.u.washington.edu>:
I've recently been trying to track down problems with a couple of
DLLs, and have turned up Windows bugs where common dialogs (file open,
etc) reduce the floating point precision.  The current development
version has code to fix these (everywhere I could think to put it),
but that's not in 1.6.1. 

I'll be putting these changes into 1.6.2 as well, but it's not in
r-patched yet (since I didn't know there was going to be a 1.6.2).

So if you're set up to do a Windows build, you could try compiling
r-devel, and should get consistent results (hopefully matching at
least one of the results you've seen!)
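The effect of such a precision drop is easy to demonstrate: if the FPU is silently left in a reduced-precision mode (say a 24-bit instead of a 53-bit significand), the same iteration converges to a visibly different answer. A Python sketch, simulating the reduced mode by rounding every intermediate to single precision (this is an illustration, not the actual Windows mechanism):

```python
import struct

def chop(x):
    """Round a double to 24-bit (single) precision via a float32
    round-trip, standing in for a reduced-precision FPU mode."""
    return struct.unpack('f', struct.pack('f', x))[0]

def newton_sqrt(a, reduce_precision=False):
    """Newton's method for sqrt(a), optionally chopping intermediates."""
    x = a
    for _ in range(30):
        x = (x + a / x) / 2.0
        if reduce_precision:
            x = chop(x)
    return x

full = newton_sqrt(2.0)                          # full double precision
reduced = newton_sqrt(2.0, reduce_precision=True)  # simulated reduced mode
print(full, reduced, abs(full - reduced))
```

The two results agree only to about seven significant digits, which is the same order of discrepancy reported earlier in the thread.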

Duncan