An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mac/attachments/20131202/fbcf1c5e/attachment.pl>
Inaccurate inaccuracies on Mac only?
3 messages · Kasper Daniel Hansen, Simon Urbanek, Peter Dalgaard
On Dec 2, 2013, at 8:35 PM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
I don't think your assumption that different systems will return the same (accurate or inaccurate) answers is correct. I don't even think you can assume that running the same code twice on the same system will give you the same answers, although in practice it often will.
That is true, but it does indeed depend on the code. For example 2L - 1L will always yield the same result, not matter how often you run it. But more to the point, IEEE just guarantees results *assuming* certain precision, but R will use more precision if available for some operations. Now, how much more precision there is available depends on the CPU and architecture, and I suspect that Hans was not sharing enough details. For example, I get the same answer on Ubuntu 12.04 LTS that I get on a Mac contrary to his claims - both with x86_64 architecture (which I suspect is the real difference). And this is not just about R - compilers will often use SIMD instructions instead of FP instructions where it is faster and doesn?t reduce the accuracy - but it may increase it. Cheers, Simon
On Sat, Nov 30, 2013 at 8:54 AM, Hans W Borchers <hwborchers at gmail.com>wrote:
Dear R Mac colleagues, when I am running the following piece of code on Ubuntu Linux 12.04 LTS or on Windows 7 -- with R 3.0.2 --, I get the results as indicated: set.seed(8237) x <- runif(32, -1, 1) y <- runif(32, -1, 1) u <- cbind(x[1], y[1]) u1 <- u %*% t(u) u2 <- apply(u * u, 1, sum) sqrt(u2 + u2 - 2*u1) ## [,1] ## [1,] NaN ## Warning message: ## In sqrt(u2 + u2 - 2 * u1) : NaNs produced u2 + u2 - 2*u1 ## [,1] ## [1,] -2.220446e-16 Theoretically, the last value should be zero. Of course, I am *not* surprised to see this is not the case under finite precision arithmetics. But what did surprise me was that under Mac OS X 10.6 and R 3.0.2 I get the following, exact results: set.seed(8237) x <- runif(32, -1, 1) y <- runif(32, -1, 1) u <- cbind(x[1], y[1]) u1 <- u %*% t(u) u2 <- apply(u * u, 1, sum) sqrt(u2 + u2 - 2*u1) ## [,1] ## [1,] 0 u2 + u2 - 2*u1 ## [,1] ## [1,] 0 I always thought that all these systems and versions comply to the IEEE floating point standard and return the same (accurate or inaccurate) answers. The problem here is: When I test a package on Mac, I cannot be sure it will run without errors on other systems. Actully, I found this when running some examples from help pages, written on the Mac. Was I wrong, or is there something special about the Mac version of R ? Many thanks Hans Werner
_______________________________________________ R-SIG-Mac mailing list R-SIG-Mac at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-mac
[[alternative HTML version deleted]]
_______________________________________________ R-SIG-Mac mailing list R-SIG-Mac at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-mac
On 03 Dec 2013, at 02:48 , Simon Urbanek <simon.urbanek at r-project.org> wrote:
On Dec 2, 2013, at 8:35 PM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
I don't think your assumption that different systems will return the same (accurate or inaccurate) answers is correct. I don't even think you can assume that running the same code twice on the same system will give you the same answers, although in practice it often will.
That is true, but it does indeed depend on the code. For example 2L - 1L will always yield the same result, not matter how often you run it. But more to the point, IEEE just guarantees results *assuming* certain precision, but R will use more precision if available for some operations. Now, how much more precision there is available depends on the CPU and architecture, and I suspect that Hans was not sharing enough details. For example, I get the same answer on Ubuntu 12.04 LTS that I get on a Mac contrary to his claims - both with x86_64 architecture (which I suspect is the real difference). And this is not just about R - compilers will often use SIMD instructions instead of FP instructions where it is faster and doesn?t reduce the accuracy - but it may increase it.
And, although possibly not the case here, optimizing compilers may reorder FP operations in order to keep caches full and pipelines busy. Things like rewriting x1*y1 + x2*y2 + x3*y3 + x4*y4 as (x1*y1 + x2*y2) + (x3*y3 + x4*y4) which is mathematically, but not computationally equivalent. The BLASen are full of that sort of stuff -- if you thought that a matrix multply is just a triple loop, have a look at what goes on in the ATLAS library. Parallel algorithms are even trickier, because you may not have control over the order in which partial results arrive. As a result, programmers have effectively given up the fine control needed to have exactly identical results on different platforms. (In legacy code, we have found at least one case where the original programmer had believed that an algorithm would converge to exact FP equality, but the optimizer had made it not always so, resulting in an infinite loop.)
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com