cumsum vs. sum

"SM" == Stavros Macrakis <macrakis at alum.mit.edu>
    on Wed, 18 Feb 2009 10:00:40 -0500 writes:
SM> Nice!  Glad to hear it. It sounds as though it is still possible for
    SM> cumsum(x)[length(x)] to not be exactly equal to sum, though?

Well, possible, probably yes, platform-dependently; 
However I vaguely remember that I didn't see one such case in the few
experiments I did.

Martin

    SM> On Wed, Feb 18, 2009 at 8:03 AM, Martin Maechler
SM> ...
    >> o   cumsum(x) and cumprod(x) for double precision x now use a long
    >> double accumulator where available and so more closely match
    >> sum() and prod() in potentially being more accurate.
Hmm.  Why not use the same method to guarantee the same result?  Or at
least document the possibility that cumsum(x)[length(x)] != sum(x)...
that seems like an easy trap to fall into.

          -s

On Wed, Feb 18, 2009 at 11:39 AM, Martin Maechler
"SM" == Stavros Macrakis <macrakis at alum.mit.edu>
    on Wed, 18 Feb 2009 10:00:40 -0500 writes:
   SM> Nice!  Glad to hear it. It sounds as though it is still possible for
   SM> cumsum(x)[length(x)] to not be exactly equal to sum, though?

Well, possible, probably yes, platform-dependently;
However I vaguely remember that I didn't see one such case in the few
experiments I did.

Martin

   SM> On Wed, Feb 18, 2009 at 8:03 AM, Martin Maechler
   SM> <maechler at stat.math.ethz.ch> wrote:
   SM> ...
   >> o   cumsum(x) and cumprod(x) for double precision x now use a long
   >> double accumulator where available and so more closely match
   >> sum() and prod() in potentially being more accurate.

Hmm.  Why not use the same method to guarantee the same result?  Or at
least document the possibility that cumsum(x)[length(x)] != sum(x)...
that seems like an easy trap to fall into.
Assuming equality of floating point numbers computed by two different 
paths is always a trap.

R doesn't try to obtain results that are equal to the last bit in other 
circumstances; why should it do so here?  For example, one somewhat 
controversial choice in R is to use 64 bit precision in intermediate 
computations when available, rather than rounding everything to 52 bits 
as it does when stored to memory in doubles.  This means that the value 
you get is likely to be closer to the truth than if you did the rounding 
earlier, but it is also subject to change according to optimization 
level, compiler version, etc.

Duncan Murdoch
          -s

On Wed, Feb 18, 2009 at 11:39 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
"SM" == Stavros Macrakis <macrakis at alum.mit.edu>
    on Wed, 18 Feb 2009 10:00:40 -0500 writes:
   SM> Nice!  Glad to hear it. It sounds as though it is still possible for
   SM> cumsum(x)[length(x)] to not be exactly equal to sum, though?

Well, possible, probably yes, platform-dependently;
However I vaguely remember that I didn't see one such case in the few
experiments I did.

Martin

   SM> On Wed, Feb 18, 2009 at 8:03 AM, Martin Maechler
   SM> <maechler at stat.math.ethz.ch> wrote:
   SM> ...
   >> o   cumsum(x) and cumprod(x) for double precision x now use a long
   >> double accumulator where available and so more closely match
   >> sum() and prod() in potentially being more accurate.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
G'day all,

On Wed, 18 Feb 2009 12:41:27 -0500

Hmm.  Why not use the same method to guarantee the same result?  
Hmm, I did not look at the source code but, potentially, sum() could
use some tricks to reduce rounding errors further that would not be
available to cumsum(); e.g. sorting the data before adding summing
them; or grouping them into groups of roughly similar magnitude and
then sum group-by-group.  So it does may be counter-productive to use
the same method.....
Or at least document the possibility that cumsum(x)[length(x)] !=
sum(x)... that seems like an easy trap to fall into.
But this is already documented, isn't it?  FAQ 7.31. ;-))

Cheers,

	Berwin
Duncan, Berwin, Martin,

Thanks for your thoughtful explanations, which make perfect sense.

May I simply suggest that the non-identity between last(cumsum) and
sum might be worth mentioning in the cumsum doc page?

            -s