Speeding up sum and prod

2 messages · Radford Neal, Simon Urbanek

Original

Radford Neal

Mon, Aug 23, 2010 10:19 AM #

Looking for more ways to speed up R, I've found that large
improvements are possible in the speed of "sum" and "prod" for long
real vectors.  

Here is a little test with R version 2.11.1 on an Intel Linux system

user  system elapsed
  4.800   0.010   4.817

user  system elapsed
  8.240   0.030   8.269

and here is the same with "sum" and "prod" modified as described below:

user  system elapsed
   1.81    0.00    1.81

user  system elapsed
  7.250   0.010   7.259

That's an improvement by a factor of 2.65 for real vectors of length
1000 with na.rm=FALSE (the default), and an improvement of 12% when
na.rm=TRUE.  Of course, the improvement is smaller for very short
vectors.

The biggest reason for the improvement is that the current code (in
2.11.1 and in the development release of 2010-08-19) makes a costly
call of ISNAN even when the option is na.rm=FALSE.  The inner loop
can also be sped up a bit in other respects.

Here is the old procedure, in src/main/summary.c:

static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;

    for (i = 0; i < n; i++) {
        if (!ISNAN(x[i]) || !narm) {
            if(!updated) updated = TRUE;
            s += x[i];
        }
    }
    *value = s;

    return(updated);
}

and here is my modified version:

static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    int i;
    Rboolean updated = FALSE;

    if (narm) {
        for (i = 0; i < n; i++) {
            if (!ISNAN(x[i])) {
                s += x[i];
                updated = TRUE;
                break;
            }
        }
        for (i = i+1; i < n; i++) {
            if (!ISNAN(x[i]))
                s += x[i];
        }
    } else {
        for (i = 0; i < n; i++)
            s += x[i];
        if (n>0) updated = TRUE;
    }

    *value = s;

    return(updated);
}

An entirely analogous improvement can be made to the "prod" function.

   Radford Neal

Mon, Aug 23, 2010 11:54 AM #

On Aug 23, 2010, at 1:19 PM, Radford Neal wrote:

The results are likely very compiler- and architecture specific. On my machine [x86_64, OS X 10.6] your version is actually slower for narm (the more you optimize the more assumptions you are making which may turn out to be false):

baseline:

user  system elapsed 
  2.412   0.018   2.431

user  system elapsed 
  2.510   0.001   2.509 

RN version:

user  system elapsed 
  1.691   0.004   1.694

user  system elapsed 
  3.527   0.001   3.528 

If you simply take out the narm check and don't mess with updated you get to a more reasonable:

user  system elapsed 
  1.688   0.003   1.691

user  system elapsed 
  2.522   0.000   2.522 


But just to bring things into perspective -- simply changing the compiler [here just for that one file with still the same optimization level] will give you:

user  system elapsed 
  5.098   0.003   5.102

user  system elapsed 
  5.213   0.000   5.214 

... and using the original (unmodified) R code with a more optimized flags will give you

user  system elapsed 
  1.835   0.003   1.838

user  system elapsed 
  2.473   0.001   2.474 

... and the "optimal" version above (3rd) with the same optimization settings:

user  system elapsed 
  1.670   0.003   1.673

user  system elapsed 
  5.555   0.001   5.556 

... so you just can't win ... 

Cheers,
Simon

Here is a little test with R version 2.11.1 on an Intel Linux system

a <- seq(0,1,length=1000)
system.time({for (i in 1:1000000) b <- sum(a)})

  user  system elapsed
 4.800   0.010   4.817

system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})

  user  system elapsed
 8.240   0.030   8.269

and here is the same with "sum" and "prod" modified as described below:

a <- seq(0,1,length=1000)
system.time({for (i in 1:1000000) b <- sum(a)})

  user  system elapsed
  1.81    0.00    1.81

system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})

  user  system elapsed
 7.250   0.010   7.259

That's an improvement by a factor of 2.65 for real vectors of length
1000 with na.rm=FALSE (the default), and an improvement of 12% when
na.rm=TRUE.  Of course, the improvement is smaller for very short
vectors.

The biggest reason for the improvement is that the current code (in
2.11.1 and in the development release of 2010-08-19) makes a costly
call of ISNAN even when the option is na.rm=FALSE.  The inner loop
can also be sped up a bit in other respects.

Here is the old procedure, in src/main/summary.c:

static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
   LDOUBLE s = 0.0;
   int i;
   Rboolean updated = FALSE;

   for (i = 0; i < n; i++) {
       if (!ISNAN(x[i]) || !narm) {
           if(!updated) updated = TRUE;
           s += x[i];
       }
   }
   *value = s;

   return(updated);
}

and here is my modified version:

static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
   LDOUBLE s = 0.0;
   int i;
   Rboolean updated = FALSE;

   if (narm) {
       for (i = 0; i < n; i++) {
           if (!ISNAN(x[i])) {
               s += x[i];
               updated = TRUE;
               break;
           }
       }
       for (i = i+1; i < n; i++) {
           if (!ISNAN(x[i]))
               s += x[i];
       }
   } else {
       for (i = 0; i < n; i++)
           s += x[i];
       if (n>0) updated = TRUE;
   }

   *value = s;

   return(updated);
}

An entirely analogous improvement can be made to the "prod" function.

  Radford Neal

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel