Message-ID: <4EE2640D.3030402@gmail.com>
Date: 2011-12-09T19:39:57Z
From: Duncan Murdoch
Subject: bug in sum() on integer vector
In-Reply-To: <4EE25637.1020404@fhcrc.org>

On 09/12/2011 1:40 PM, Hervé Pagès wrote:
> Hi,
>
>     x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
>
> This is correct:
>
>     >  sum(as.double(x))
>     [1] 0
>
> This is not:
>
>     >  sum(x)
>     [1] 4996000
>
> Returning NA (with a warning) would also be acceptable for the latter.
> That would make it consistent with cumsum(x):
>
>     >  cumsum(x)[length(x)]
>     [1] NA
>     Warning message:
>     Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

This is a 64-bit problem; in a 32-bit build things work out properly. My
guess is that in 64-bit arithmetic either we or the run-time are doing
something to simulate 32-bit arithmetic (since R integers are 32 bits),
but it looks as though we're not quite getting it right.
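
One way to see why the width of the accumulator matters for this input is to look at the size of the partial sums. The sketch below is only illustrative arithmetic done in doubles; the variable names are hypothetical, and it makes no claim about how sum() is actually implemented on either platform.

```r
# Illustrative arithmetic only; all variable names are hypothetical.
# Work in doubles so nothing overflows R's 32-bit integer type.
n_pos <- 1e7;   v_pos <- 1800000003
n_neg <- 1.5e7; v_neg <- 1200000002

# Partial sum once all the positive elements have been added.
peak <- n_pos * v_pos

# Exact total: the positive and negative halves cancel.
total <- peak - n_neg * v_neg

# The peak is far outside the 32-bit integer range ...
exceeds_int <- peak > .Machine$integer.max

# ... and also above 2^53, beyond which a double can no longer
# represent every integer (2^53 + 1 rounds back to 2^53).
exceeds_double <- peak > 2^53
odd_lost <- (2^53 + 1) == 2^53

print(total)
print(c(exceeds_int, exceeds_double, odd_lost))
```

Under these assumptions, an accumulator can only be sure of returning the exact total for this vector if it carries more than 53 bits of integer precision through the intermediate sums; otherwise it has to detect the overflow and return NA, as cumsum() does.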

Duncan Murdoch

> Thanks!
> H.
>
>   >  sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>    [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>    [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>    [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>    [7] LC_PAPER=C                 LC_NAME=C
>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>   [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>