bug in sum() on integer vector

Duncan Murdoch · 2011-12-10T13:27:42Z

On 11-12-09 4:41 PM, Hervé Pagès wrote:
> Hi Duncan,
>
> On 11-12-09 11:39 AM, Duncan Murdoch wrote:
>> On 09/12/2011 1:40 PM, Hervé Pagès wrote:
>>> Hi,
>>>
>>> x
>>>
>>> This is correct:
>>>
>>>> sum(as.double(x))
>>> [1] 0
>>>
>>> This is not:
>>>
>>>> sum(x)
>>> [1] 4996000
>>>
>>> Returning NA (with a warning) would also be acceptable for the latter.
>>> That would make it consistent with cumsum(x):
>>>
>>>> cumsum(x)[length(x)]

A double has 53 bits to store the mantissa, so any 32-bit integer can be 
stored exactly.
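
A quick way to check this in R is sketched below. It assumes the usual setup on common platforms (IEEE 754 doubles, 32-bit integers), and the variable names are only illustrative.

```r
# Number of significand (mantissa) bits in a double on this platform.
mantissa_bits <- .Machine$double.digits

# Largest value of R's integer type, normally 2^31 - 1.
int_max <- .Machine$integer.max

# Converting to double and back loses nothing at either extreme.
round_trip_max <- as.integer(as.double(int_max))
round_trip_min <- as.integer(as.double(-int_max))

print(mantissa_bits)
print(identical(round_trip_max, int_max))
print(identical(round_trip_min, -int_max))
```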

If the sum has enough terms that an intermediate value needs more than 
53 bits, then you'll get the wrong answer, because the intermediate sum 
can't be stored exactly.  That happens in your example.  On the 32-bit 
platform I tested (Windows 32-bit), intermediate values are stored in 
registers with 64-bit precision, which is probably why Windows 32-bit 
gets it right while various other platforms don't.
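
The loss of exactness past 53 bits can be seen in a few lines of R. This is only a sketch: it uses plain double arithmetic rather than the internals of sum(), and the variable names are made up for illustration.

```r
# 2^53 is the point past which doubles can no longer hold every integer.
limit <- 2^53

# Just below the limit, adding 1 is still exact.
below <- (limit - 2) + 1
exact_below <- (below == limit - 1)

# At the limit, adding 1 is rounded away and the value stays at 2^53.
above <- limit + 1
lost_above <- (above == limit)

print(exact_below)
print(lost_above)
```

Once a running total passes this point, each further term may be rounded, so the final result can be off even when the true sum is small.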

On your fundamental question:  I think the answer is that R is doing the 
right thing.  R doesn't think of an integer as a particular 
representation; it thinks of it as a number.  So if you ask for the sum 
of those numbers, R should return its best approximation to that sum, 
and it does.

A different approach would be to do the sum in 32-bit registers and 
detect 32-bit overflow in intermediate results.  But that's a very 
hardware-oriented approach, rather than a mathematical approach.
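
For comparison, the hardware-oriented alternative might look roughly like the sketch below. `checked_int_sum` is a hypothetical helper, not anything that exists in R; it relies on the fact that R's integer `+` yields NA (with a warning) when a result does not fit in 32 bits.

```r
# Hypothetical helper: sum an integer vector strictly in 32-bit integer
# arithmetic and give up with NA as soon as a partial sum overflows.
checked_int_sum <- function(x) {
  stopifnot(is.integer(x))
  total <- 0L
  for (v in x) {
    # Integer addition returns NA_integer_ on overflow; the warning is
    # suppressed here because the NA is handled explicitly.
    total <- suppressWarnings(total + v)
    # An NA already present in x also ends up here.
    if (is.na(total)) return(NA_integer_)
  }
  total
}

x_small <- c(1L, 2L, 3L)
x_overflow <- c(.Machine$integer.max, 1L, -1L)

print(checked_int_sum(x_small))
print(checked_int_sum(x_overflow))
```

For `x_overflow` the true total fits in an integer, yet the helper returns NA because the second partial sum does not. That is the trade-off: the answer depends on the order of the terms and the width of the register, not just on the numbers being added.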

Duncan Murdoch
