Bug or feature? sum(c(a, b, c)) != (a + b + c)
On Wed, Aug 24, 2011 at 11:06 AM, Ted Harding <ted.harding at wlandres.net> wrote:
On 23-Aug-11 22:20:48, Barry Rowlingson wrote:
On Tue, Aug 23, 2011 at 8:17 PM, Daniel Lai <danlai at bccrc.ca> wrote:
Greetings all, I'm porting an algorithm from MATLAB to R, and noticed some minor discrepancies in small decimal values using rowSums and colSums which are exacerbated after heavy iteration and log space transformation. This was rather perplexing as both programs claimed and appeared to use the IEEE 754 standard for floating point arithmetic (confirmed with manual basic operations). _After some tracing and testing, I've managed to isolated a minimal working example as follows: a = 0.812672 b = 0.916541 c = 0.797810 sum(c(a, b, c)) == (a + b + c) [1] FALSE
?Its probably to do with the order of summations. With your a,b,c you get: ?> (a+b+c) == (c+b+a) ?[1] TRUE ?> (a+b+c) == (c+a+b) ?[1] FALSE shock horror, addition is not associative[1]. Lets investigate: ?> sum(c(a,b,c)) == c+a+b ?[1] TRUE ?> sum(c(a,b,c)) == a+c+b ?[1] TRUE ?'sum' seems to get the same answer as adding the first and the third, then adding the second - explicitly: ?> sum(c(a,b,c)) == (a+c)+b ?[1] TRUE I'm not sure what it would do for four values in the sum. Have fun finding out. Does matlab similarly have a+b+c != c+b+a? Barry [1] or commutative or distributive or one of those -ives you learn one day in school. Too lazy to wikipedia it right now...
? 'sum' seems to get the same answer as adding the first and the third, ? then adding the second It seems to be a bit more subtle than that! ?sum(c(a,b,c))==sum(c(a,b,c)) ?# [1] TRUE ?sum(c(a,b,c))==sum(c(a,c,b)) ?# [1] TRUE ?sum(c(a,b,c))==sum(c(b,a,c)) ?# [1] TRUE ?sum(c(a,b,c))==sum(c(b,c,a)) ?# [1] TRUE ?sum(c(a,b,c))==sum(c(c,a,b)) ?# [1] TRUE ?sum(c(a,b,c))==sum(c(c,b,a)) ?# [1] TRUE so that's TRUE for all 6 permutations. However (as before, but all 6 this time): ?a+b+c == a+b+c ?# [1] TRUE ?a+b+c == a+c+b ?# [1] FALSE ?a+b+c == b+c+a ?# [1] TRUE ?a+b+c == b+a+c ?# [1] TRUE ?a+b+c == c+a+b ?# [1] FALSE ?a+b+c == c+b+a ?# [1] TRUE So it looks as though sum(c(...)) is somwhow independent of the order of its arguments, which implies that the summation it does is not quite the addition corresponding to "+". I now feel out of my depth, so hope someone else knows/can find the truth about this!
sum() uses a long double accumulator.
From src/main/summary.c
static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
long double s = 0.0;
int i;
Rboolean updated = FALSE;
for (i = 0; i < n; i++) {
if (!narm || !ISNAN(x[i])) {
if(!updated) updated = TRUE;
s += x[i];
}
}
*value = s;
return(updated);
}
Thomas Lumley Professor of Biostatistics University of Auckland