Skip to content

[Rcpp-devel] Significant difference in performance when computing Ax

2 messages · Søren Højsgaard, Avraham Adler

#
Dear all,
When forming the matrix-vector-product one can form the inner products between As rows and x (mvprod1 below) or form a linear combination of As columns (mvprod2 below). The difference in computing time is striking (at least for me) for large matrices and it really illustrates the "caching issue" (I assume):
Unit: milliseconds
          expr               min            lq                  median         uq          max              neval
mvprod1(A, x) 35.783812 36.310956 36.600186 36.991113 38.922889   100
mvprod2(A, x)   4.408892    4.436882   4.475368   4.568668   5.477176   100
       A %*% x         8.514091    8.592230   8.734978   8.811951   9.738653   100

Just wanted to share this!
Cheers
S?ren



#include <Rcpp.h>
using namespace Rcpp;

//[[Rcpp::export]]
NumericVector mvprod1 (NumericMatrix A, NumericVector x){
  int nrA=A.nrow(), ncA=A.ncol(), i, j;
  NumericVector y(nrA);
  for (i=0; i<nrA; ++i){
    for (j=0; j<ncA; ++j){
      y[i] += A(i,j)*x[j];
    }
  }
  return y;
}

//[[Rcpp::export]]
NumericVector mvprod2 (NumericMatrix A, NumericVector x){
  int nrA=A.nrow(), ncA=A.ncol(), i, j;
  NumericVector y(nrA);
  for (j=0; j<ncA; ++j){
    for (i=0; i<nrA; ++i){
      y[i] += A(i,j)*x[j];
    }
  }
  return y;
}



/*** R
d <- 5
A <- matrix(1.0*(1:d^2),nrow=d)
x <- 1.0*(1:d)

A%*%x
mvprod1(A,x)
mvprod2(A,x)

d <- 2000
A <- matrix(1.0*(1:d^2),nrow=d)
x <- 1.0*(1:d)

library(microbenchmark)
microbenchmark(mvprod1(A,x), mvprod2(A,x), A%*%x)
*/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20140425/932ec89d/attachment.html>
#
There is actually one other command to test, `crossprod`, or in this case
`tcrossprod(x, A)`.

Following your code, first run and ensure TRUE is returned.

d <- 2000
A <- matrix(1.0*(1:d^2),nrow=d)
x <- 1.0*(1:d)
all.equal(A%*%x, tcrossprod(x, A), mvprod1(A,x), mvprod2(A,x))

Now, using an i7-3740QM 2.7Ghz CPU with 8GB of RAM as a testbed on Windows
7 (with R & Rcpp compiled from source throwing "-march=core-avx-i -O3
-std=gnu++0x --param l1-cache-line-size=64 --param l1-cache-size=32 --param
l2-cache-size=256" FWIW) I get the following timings:

microbenchmark(A%*%x, tcrossprod(x, A), mvprod1(A, x), mvprod2(A, x), times
= 1000L, control = list(order = 'block'))


Unit: milliseconds
             expr       min        lq    median        uq       max neval
          A %*% x 13.183992 13.242965 13.269027 13.317917 16.680117  1000
 tcrossprod(x, A)  6.967139  6.991869  7.004424  7.020214  9.795921  1000
    mvprod1(A, x) 43.089952 43.134848 43.163953 43.489824 45.129644  1000
    mvprod2(A, x)  4.564480  4.883693  4.911848  4.928208  7.435495  1000


So the crossprod function is decently optimized as it is, but the prod2 is
still faster.

Thanks for idea and post!

Avi
On Fri, Apr 25, 2014 at 12:47 PM, S?ren H?jsgaard <sorenh at math.aau.dk>wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20140425/dc044337/attachment.html>