On 11-12-13 6:41 PM, Hervé Pagès wrote:
Hi Duncan,
On 11-12-10 05:27 AM, Duncan Murdoch wrote:
On 11-12-09 4:41 PM, Hervé Pagès wrote:
Hi Duncan,
On 11-12-09 11:39 AM, Duncan Murdoch wrote:
On 09/12/2011 1:40 PM, Hervé Pagès wrote:
Hi,
x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
sum(as.double(x))
[1] 0
This is correct, but sum(x) is not:
[1] 4996000
Returning NA (with a warning) would also be acceptable for the
latter.
That would make it consistent with cumsum(x):
[1] NA
Warning message:
Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
This is a 64 bit problem; in 32 bits things work out properly. I'd guess
in 64 bit arithmetic we or the run-time are doing something to simulate
32 bit arithmetic (since integers are 32 bits), but it looks as though
we're not quite getting it right.
It doesn't work properly for me on Leopard (32-bit mode):
x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
sum(as.double(x))
R version 2.14.0 RC (2011-10-27 r57452)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
It looks like the problem is that isum() (in src/main/summary.c)
uses a 'double' internally to do the sum, whereas rsum() and csum()
use a 'long double'.
A double has a 53-bit mantissa, so any 32-bit integer can be stored
exactly.
Note that isum() seems to be assuming that NA_INTEGER and NA_LOGICAL
will always be the same (probably fine) and that TRUE values in the
input vector are always represented as a 1 (not so sure about this
one).
A more fundamental question: is switching back and forth between
'int' and 'double' (or 'long double') the right thing to do for doing
"safe" arithmetic on integers?
If you have enough terms in the sum that an intermediate value exceeds
53 bits in length, then you'll get the wrong answer, because the
intermediate sum can't be stored exactly. That happens in your example.
On the 32 bit platform I tested (Windows 32 bit), intermediate values
are stored in registers with 64 bit precision, which is probably why
Windows 32 bit gets it right, but various other platforms don't.
On your fundamental question: I think the answer is that R is doing the
right thing. R doesn't think of an integer as a particular
representation, it thinks of it as a number. So if you ask for the sum
of those numbers, R should return its best approximation to that sum,
and it does.
It does, really? Seems like returning 0 would be a better approximation
;-) And with the argument that "R doesn't think of an integer as a
particular representation" then there is no reason why sum(x)
would get it wrong and sum(as.double(x)) would get it right. Also why
bother having an integer type in R?
Seriously, I completely disagree with your view (hopefully it's only
yours, and not an R "feature") that it's ok for integer arithmetic to
return an approximation. It should always return the correct value or
fail.
I think you need to be more specific in your design, because the function
`+` <- function(x,y) stop("fail")