Skip to content

Speeding up sum and prod

2 messages · Radford Neal, Simon Urbanek

#
Looking for more ways to speed up R, I've found that large
improvements are possible in the speed of "sum" and "prod" for long
real vectors.  

Here is a little test with R version 2.11.1 on an Intel Linux system
user  system elapsed
  4.800   0.010   4.817
user  system elapsed
  8.240   0.030   8.269

and here is the same with "sum" and "prod" modified as described below:
user  system elapsed
   1.81    0.00    1.81
user  system elapsed
  7.250   0.010   7.259

That's an improvement by a factor of 2.65 for real vectors of length
1000 with na.rm=FALSE (the default), and an improvement of 12% when
na.rm=TRUE.  Of course, the improvement is smaller for very short
vectors.

The biggest reason for the improvement is that the current code (in
2.11.1 and in the development release of 2010-08-19) makes a costly
call of ISNAN even when the option is na.rm=FALSE.  The inner loop
can also be sped up a bit in other respects.

Here is the old procedure, in src/main/summary.c:

static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;

    for (i = 0; i < n; i++) {
        if (!ISNAN(x[i]) || !narm) {
            if(!updated) updated = TRUE;
            s += x[i];
        }
    }
    *value = s;

    return(updated);
}

and here is my modified version:

static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;

    if (narm) {
        for (i = 0; i < n; i++) {
            if (!ISNAN(x[i])) {
                s += x[i];
                updated = TRUE;
                break;
            }
        }
        for (i = i+1; i < n; i++) {
            if (!ISNAN(x[i]))
                s += x[i];
        }
    } else {
        for (i = 0; i < n; i++)
            s += x[i];
        if (n>0) updated = TRUE;
    }

    *value = s;

    return(updated);
}

An entirely analogous improvement can be made to the "prod" function.

   Radford Neal
#
On Aug 23, 2010, at 1:19 PM, Radford Neal wrote:

            
The results are likely very compiler- and architecture specific. On my machine [x86_64, OS X 10.6] your version is actually slower for narm (the more you optimize the more assumptions you are making which may turn out to be false):

baseline:
user  system elapsed 
  2.412   0.018   2.431
user  system elapsed 
  2.510   0.001   2.509 

RN version:
user  system elapsed 
  1.691   0.004   1.694
user  system elapsed 
  3.527   0.001   3.528 

If you simply take out the narm check and don't mess with updated you get to a more reasonable:
user  system elapsed 
  1.688   0.003   1.691
user  system elapsed 
  2.522   0.000   2.522 


But just to bring things into perspective -- simply changing the compiler [here just for that one file with still the same optimization level] will give you:
user  system elapsed 
  5.098   0.003   5.102
user  system elapsed 
  5.213   0.000   5.214 

... and using the original (unmodified) R code with a more optimized flags will give you
user  system elapsed 
  1.835   0.003   1.838
user  system elapsed 
  2.473   0.001   2.474 

... and the "optimal" version above (3rd) with the same optimization settings:
user  system elapsed 
  1.670   0.003   1.673
user  system elapsed 
  5.555   0.001   5.556 

... so you just can't win ... 

Cheers,
Simon