[Rcpp-devel] How to increase the coding efficiency

Can you post a minimal full example?
-Christian

On Tue, Dec 4, 2012 at 8:39 PM, Honglang Wang
Yes, the main issue for my coding is the allocation of memory.  And I have
fixed one of the biggest memory allocation issue: 4000 by 4000 diagonal
matrix. And since I am not familiar with Rcpp and RcppArmadillo, I have no
idea how to reuse the memory. I hope I can have some materials to learn
this. Thanks.

What exactly do these timings show?  A single call you your function?
How many calls?

Here I called my function for 100 times.

Building on Romain's point: -- a portion of your function's runtime is
in memory allocation
(and you have a lot of allocations here).
If you're calling your function thousands or millions of times, then
it might pay to closely
examine your memory allocation strategies and figure out what's
temporary, for example.
It looks like you're already using  copy_aux_mem = false in a number
of places, but you're
allocating a lot of objects -- of approx what size?

For example, wouldn't this work just as well with one less allocation?
arma::vec kk = t;
arma::uvec q1 = arma::find(arma::abs(tp)<h);
kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75;
// done with q1.  let's reuse it.
q1 = arma::find(arma::abs(tp)>=h);
// was q2
kk.elem(q1).zeros();

You could potentially allocate memory for temporary working space in
R, grab it with copy_aux_mem = false, write your temp results there,
and reuse these objects in subsequent function calls.  It doesn't make
sense to go to this trouble, though, if your core algorithm consumes
the bulk of runtime.

Have you looked on the armadillo notes r.e. inv?  Matrix inversion has
O(>n^2).  You may be aided by pencil-and-paper math here.
http://arma.sourceforge.net/docs.html#inv

Here my matrix for inverse is only 4 by 4, so I think it's ok.

best,
Christian

Dear All,
I have tried out the first example by using RcppArmadillo, but I am not
sure whether the code is efficient or not. And I did the comparison of
the
computation time.

1) R code using for loop in R: 87.22s
2) R code using apply: 77.86s
3) RcppArmadillo by using for loop in C++: 53.102s
4) RcppArmadillo together with apply in R: 47.310s

It is kind of not so big increase. I am wondering whether I used an
inefficient way for the C++ coding:

--
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!

A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!
The following is a full example although I don't know whether it's minimal
or not:

library(Rcpp)
library(RcppArmadillo)
sourceCpp("betahat_mod.cpp")

#The following is data generation.
  n=200
  m=20
  p=2
  t=runif(m*n,min=0, max=1)
  X1=rnorm(m*n,0,1)
  X1=as.matrix(1+2*exp(t)+X1)
  X2=rnorm(m*n,0,1)
  X2=as.matrix(3-4*t^2+X2)
  X=cbind(X1,X2)
  beta1=0.5*sin(t)
  beta2=beta1
  rho=0.2
  sig2=1

eps=unlist(lapply(1:n,function(x){as.matrix(arima.sim(list(ar=rho),m,sd=sqrt(sig2*(1-rho^2))))}))
  y=as.matrix(X1*beta1+X2*beta2+eps)

#A simple kernel function.
ker=function(x,h)
  {
    ans=x
    lo=(abs(x)<h)
    ans[lo]=(3/4)*(1-(x[lo]/h)^2)/h
    ans[!lo]=0
    return(ans)
  }

h=0.3

#assess the time for evaluating bethat of 100 t's:
system.time((betahatt=t(apply(as.matrix(t[1:100]),1,function(x)
betahat(ker,x,X,y,t,h,m)$betahat))))

And the .cpp file is the following:

// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>

using namespace Rcpp;

// [[Rcpp::export]]
List betahat(Function ker, double t0, NumericMatrix Xr, NumericMatrix yr,
NumericVector tr, double h, int m) {
  int n = Xr.nrow(), p = Xr.ncol();
  arma::mat X(Xr.begin(), n, p, false);
  arma::mat y(yr.begin(), n, 1, false);
  arma::colvec t(tr.begin(), tr.size(), false);
  arma::mat T = X;
  T.each_col() %= (t-t0)/h;
  arma::mat D = arma::join_rows(X,T);
  arma::vec kk =as<arma::vec>(ker(tr-t0,h));
  arma::mat W = (arma::diagmat(kk))/m;
  arma::mat Inv = arma::trans(D)*W*D;
  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
  arma::colvec betahat0(betahat.begin(),betahat.size()/2,false);
  return List::create(Named("betahat") = betahat0);
}

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

Can you post a minimal full example?
-Christian

On Tue, Dec 4, 2012 at 8:39 PM, Honglang Wang
<wanghonglang2008 at gmail.com> wrote:
Yes, the main issue for my coding is the allocation of memory.  And I
have
fixed one of the biggest memory allocation issue: 4000 by 4000 diagonal
matrix. And since I am not familiar with Rcpp and RcppArmadillo, I have
no
idea how to reuse the memory. I hope I can have some materials to learn
this. Thanks.

What exactly do these timings show?  A single call you your function?
How many calls?

Here I called my function for 100 times.

Building on Romain's point: -- a portion of your function's runtime is
in memory allocation
(and you have a lot of allocations here).
If you're calling your function thousands or millions of times, then
it might pay to closely
examine your memory allocation strategies and figure out what's
temporary, for example.
It looks like you're already using  copy_aux_mem = false in a number
of places, but you're
allocating a lot of objects -- of approx what size?

For example, wouldn't this work just as well with one less allocation?
arma::vec kk = t;
arma::uvec q1 = arma::find(arma::abs(tp)<h);
kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75;
// done with q1.  let's reuse it.
q1 = arma::find(arma::abs(tp)>=h);
// was q2
kk.elem(q1).zeros();

You could potentially allocate memory for temporary working space in
R, grab it with copy_aux_mem = false, write your temp results there,
and reuse these objects in subsequent function calls.  It doesn't make
sense to go to this trouble, though, if your core algorithm consumes
the bulk of runtime.

Have you looked on the armadillo notes r.e. inv?  Matrix inversion has
O(>n^2).  You may be aided by pencil-and-paper math here.
http://arma.sourceforge.net/docs.html#inv

Here my matrix for inverse is only 4 by 4, so I think it's ok.

best,
Christian

Dear All,
I have tried out the first example by using RcppArmadillo, but I am
not
sure whether the code is efficient or not. And I did the comparison of
the
computation time.

1) R code using for loop in R: 87.22s
2) R code using apply: 77.86s
3) RcppArmadillo by using for loop in C++: 53.102s
4) RcppArmadillo together with apply in R: 47.310s

It is kind of not so big increase. I am wondering whether I used an
inefficient way for the C++ coding:

--
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!

--
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121205/c36a832e/attachment.html>
this part will always make your code crawl along:

On Wed, Dec 5, 2012 at 11:09 AM, Honglang Wang

  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;

first time I wrote a GLM engine, I wrote it the way statistics books
illustrate it (i.e. actually invert things to do IWLS) and it crawled.  I
took it to a physicist friend who went through alternate stages of disgust
and laughter then told me never to invert anything I didn't have to.

You should take Doug Bates' advice, it could save you a great deal of time.
 Never bit fiddle when you can use a better algorithm.
*A model is a lie that helps you see the truth.*
*
*
Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121205/680ac712/attachment.html>
this part will always make your code crawl along:
  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
I'd be very interested to see the before/after version of your code
code, Honglang. With timings. It'll make a good blog post or academic
paper (depending on just how many learnings you get from the process) :-)

Darren
Darren Cook, Software Researcher/Developer

http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
I don't know as much about Armadillo as I do about Eigen so I will cheat
and write using RcppEigen instead of RcppArmadillo.  Page 6 of the Eigen
tutorial at http://eigen.tuxfamily.org/dox/ discusses decompositions and
solving linear systems of equations.  One of the simplest ways of solving a
weighted least squares system, if you only want the solution itself, is
with a QR decomposition of which there are 3 in Eigen.  The simplest is
HouseholderQR (named after A.S. Householder who first described an
elementary reflector).  For a least squares solution (assuming the model
matrix has full column rank) you simply use the solve method. For a
weighted least squares solution you premultiply both the model matrix and
the response by the square roots of the weights.

It can be collapsed to a one-liner, as in the enclosed.  A test that it
works for unweighted least squares (i.e. using unit weights) is
sourceCpp("/tmp/wtls.cpp")
coef(lm(optden ~ carb, Formaldehyde))(Intercept)        carb
0.005085714 0.876285714
with(Formaldehyde, wtls(model.matrix(~carb), optden, rep.int(1,
length(optden))))
$betahat
[1] 0.005085714 0.876285714

The use of the other decompositions (Cholesky, ColPivHouseholderQR,
Symmetric Eigendecomposition, SVD) are described in the RcppEigen vignette.

By the way, the sourceCpp, evalCpp and cppFunction capabilities developed
by the Rcpp authors is very close to magic.  They are to be congratulated
on a remarkable accomplishment.

this part will always make your code crawl along:

On Wed, Dec 5, 2012 at 11:09 AM, Honglang Wang <wanghonglang2008 at gmail.com
wrote:

  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;

first time I wrote a GLM engine, I wrote it the way statistics books
illustrate it (i.e. actually invert things to do IWLS) and it crawled.  I
took it to a physicist friend who went through alternate stages of disgust
and laughter then told me never to invert anything I didn't have to.

You should take Doug Bates' advice, it could save you a great deal of
time.  Never bit fiddle when you can use a better algorithm.

--
*A model is a lie that helps you see the truth.*
*
*
Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121206/e6b3c442/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wtls.cpp
Type: text/x-c++src
Size: 494 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121206/e6b3c442/attachment.cpp>
| // [[Rcpp::depends(RcppEigen)]]
| #include <RcppEigen.h>
| 
| typedef Eigen::MatrixXd                 Mat;
| typedef Eigen::Map<Mat>                MMat;
| typedef Eigen::HouseholderQR<Mat>        QR;
| typedef Eigen::VectorXd                 Vec;
| typedef Eigen::Map<Vec>                MVec;
| 
| // [[Rcpp::export]]
| 
| Rcpp::List wtls(const MMat X, const MVec y, const MVec sqrtwts) {
|     return Rcpp::List::create(Rcpp::Named("betahat") =
| 			      QR(sqrtwts.asDiagonal()*X).solve(sqrtwts.asDiagonal()*y));
| }

That is a thing of beauty.  Nicely done, Doug!

Dirk
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
Thanks Douglas, I will study your code and Eigen carefully.

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

I don't know as much about Armadillo as I do about Eigen so I will cheat
and write using RcppEigen instead of RcppArmadillo.  Page 6 of the Eigen
tutorial at http://eigen.tuxfamily.org/dox/ discusses decompositions and
solving linear systems of equations.  One of the simplest ways of solving a
weighted least squares system, if you only want the solution itself, is
with a QR decomposition of which there are 3 in Eigen.  The simplest is
HouseholderQR (named after A.S. Householder who first described an
elementary reflector).  For a least squares solution (assuming the model
matrix has full column rank) you simply use the solve method. For a
weighted least squares solution you premultiply both the model matrix and
the response by the square roots of the weights.

It can be collapsed to a one-liner, as in the enclosed.  A test that it
works for unweighted least squares (i.e. using unit weights) is

sourceCpp("/tmp/wtls.cpp")
coef(lm(optden ~ carb, Formaldehyde))(Intercept)        carb
0.005085714 0.876285714
with(Formaldehyde, wtls(model.matrix(~carb), optden, rep.int(1,
length(optden))))
$betahat
[1] 0.005085714 0.876285714

The use of the other decompositions (Cholesky, ColPivHouseholderQR,
Symmetric Eigendecomposition, SVD) are described in the RcppEigen vignette.

By the way, the sourceCpp, evalCpp and cppFunction capabilities developed
by the Rcpp authors is very close to magic.  They are to be congratulated
on a remarkable accomplishment.

On Wed, Dec 5, 2012 at 1:16 PM, Tim Triche, Jr. <tim.triche at gmail.com>wrote:

this part will always make your code crawl along:

On Wed, Dec 5, 2012 at 11:09 AM, Honglang Wang <
wanghonglang2008 at gmail.com> wrote:

  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;

first time I wrote a GLM engine, I wrote it the way statistics books
illustrate it (i.e. actually invert things to do IWLS) and it crawled.  I
took it to a physicist friend who went through alternate stages of disgust
and laughter then told me never to invert anything I didn't have to.

You should take Doug Bates' advice, it could save you a great deal of
time.  Never bit fiddle when you can use a better algorithm.

--
*A model is a lie that helps you see the truth.*
*
*
Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121206/78258caa/attachment-0001.html>
Hi Honglang,

I recommend looking into using the solve() function in Armadillo, instead
of inv(). Using solve() will be considerably faster.

More details here:
http://arma.sourceforge.net/docs.html#solve

On Thursday, December 6, 2012, Honglang Wang <wanghonglang2008 at gmail.com>
wrote:
The following is a full example although I don't know whether it's
minimal or not:
library(Rcpp)
library(RcppArmadillo)
sourceCpp("betahat_mod.cpp")

#The following is data generation.
  n=200
  m=20
  p=2
  t=runif(m*n,min=0, max=1)
  X1=rnorm(m*n,0,1)
  X1=as.matrix(1+2*exp(t)+X1)
  X2=rnorm(m*n,0,1)
  X2=as.matrix(3-4*t^2+X2)
  X=cbind(X1,X2)
  beta1=0.5*sin(t)
  beta2=beta1
  rho=0.2
  sig2=1

eps=unlist(lapply(1:n,function(x){as.matrix(arima.sim(list(ar=rho),m,sd=sqrt(sig2*(1-rho^2))))}))
  y=as.matrix(X1*beta1+X2*beta2+eps)

#A simple kernel function.
ker=function(x,h)
  {
    ans=x
    lo=(abs(x)<h)
    ans[lo]=(3/4)*(1-(x[lo]/h)^2)/h
    ans[!lo]=0
    return(ans)
  }

h=0.3

#assess the time for evaluating bethat of 100 t's:
system.time((betahatt=t(apply(as.matrix(t[1:100]),1,function(x)
betahat(ker,x,X,y,t,h,m)$betahat))))
And the .cpp file is the following:

// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>

using namespace Rcpp;

// [[Rcpp::export]]
List betahat(Function ker, double t0, NumericMatrix Xr, NumericMatrix yr,
NumericVector tr, double h, int m) {
  int n = Xr.nrow(), p = Xr.ncol();
  arma::mat X(Xr.begin(), n, p, false);
  arma::mat y(yr.begin(), n, 1, false);
  arma::colvec t(tr.begin(), tr.size(), false);
  arma::mat T = X;
  T.each_col() %= (t-t0)/h;
  arma::mat D = arma::join_rows(X,T);
  arma::vec kk =as<arma::vec>(ker(tr-t0,h));
  arma::mat W = (arma::diagmat(kk))/m;
  arma::mat Inv = arma::trans(D)*W*D;
  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
  arma::colvec betahat0(betahat.begin(),betahat.size()/2,false);
  return List::create(Named("betahat") = betahat0);
}

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

On Wed, Dec 5, 2012 at 4:07 AM, Christian Gunning <xian at unm.edu> wrote:
Can you post a minimal full example?
-Christian

On Tue, Dec 4, 2012 at 8:39 PM, Honglang Wang
<wanghonglang2008 at gmail.com> wrote:
Yes, the main issue for my coding is the allocation of memory.  And I
have
fixed one of the biggest memory allocation issue: 4000 by 4000 diagonal
matrix. And since I am not familiar with Rcpp and RcppArmadillo, I
have no
idea how to reuse the memory. I hope I can have some materials to learn
this. Thanks.

What exactly do these timings show?  A single call you your function?
How many calls?

Here I called my function for 100 times.

Building on Romain's point: -- a portion of your function's runtime is
in memory allocation
(and you have a lot of allocations here).
If you're calling your function thousands or millions of times, then
it might pay to closely
examine your memory allocation strategies and figure out what's
temporary, for example.
It looks like you're already using  copy_aux_mem = false in a number
of places, but you're
allocating a lot of objects -- of approx what size?

For example, wouldn't this work just as well with one less allocation?
arma::vec kk = t;
arma::uvec q1 = arma::find(arma::abs(tp)<h);
kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75;
// done with q1.  let's reuse it.
q1 = arma::find(arma::abs(tp)>=h);
// was q2
kk.elem(q1).zeros();

You could potentially allocate memory for temporary working space in
R, grab it with copy_aux_mem = false, write your temp results there,
and reuse these objects in subsequent function calls.  It doesn't make
sense to go to this trouble, though, if your core algorithm consumes
the bulk of runtime.

Have you looked on the armadillo notes r.e. inv?  Matrix inversion has
O(>n^2).  You may be aided by pencil-and-paper math here.
http://arma.sourceforge.net/docs.html#inv

Here my matrix for inverse is only 4 by 4, so I think it's ok.

best,
Christian

Dear All,
I have tried out the first example by using RcppArmadillo, but I am
not
sure whether the code is efficient or not. And I did the comparison
of
the
computation time.

1) R code using for loop in R: 87.22s
2) R code using apply: 77.86s
3) RcppArmadillo by using for loop in C++: 53.102s
4) RcppArmadillo together with apply in R: 47.310s

It is kind of not so big increase. I am wondering whether I used an
inefficient way for the C++ coding:

--
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!

--
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121210/124ff8b5/attachment.html>
I tried out that if the dim of the matrix less than or equal to 4, inv()
and solve() have the same speed.

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

Hi Honglang,

I recommend looking into using the solve() function in Armadillo, instead
of inv(). Using solve() will be considerably faster.

More details here:
http://arma.sourceforge.net/docs.html#solve

On Thursday, December 6, 2012, Honglang Wang <wanghonglang2008 at gmail.com>
wrote:
The following is a full example although I don't know whether it's
minimal or not:
library(Rcpp)
library(RcppArmadillo)
sourceCpp("betahat_mod.cpp")

#The following is data generation.
  n=200
  m=20
  p=2
  t=runif(m*n,min=0, max=1)
  X1=rnorm(m*n,0,1)
  X1=as.matrix(1+2*exp(t)+X1)
  X2=rnorm(m*n,0,1)
  X2=as.matrix(3-4*t^2+X2)
  X=cbind(X1,X2)
  beta1=0.5*sin(t)
  beta2=beta1
  rho=0.2
  sig2=1

eps=unlist(lapply(1:n,function(x){as.matrix(arima.sim(list(ar=rho),m,sd=sqrt(sig2*(1-rho^2))))}))
  y=as.matrix(X1*beta1+X2*beta2+eps)

#A simple kernel function.
ker=function(x,h)
  {
    ans=x
    lo=(abs(x)<h)
    ans[lo]=(3/4)*(1-(x[lo]/h)^2)/h
    ans[!lo]=0
    return(ans)
  }

h=0.3

#assess the time for evaluating bethat of 100 t's:
system.time((betahatt=t(apply(as.matrix(t[1:100]),1,function(x)
betahat(ker,x,X,y,t,h,m)$betahat))))
And the .cpp file is the following:

// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>

using namespace Rcpp;

// [[Rcpp::export]]
List betahat(Function ker, double t0, NumericMatrix Xr, NumericMatrix
yr, NumericVector tr, double h, int m) {
  int n = Xr.nrow(), p = Xr.ncol();
  arma::mat X(Xr.begin(), n, p, false);
  arma::mat y(yr.begin(), n, 1, false);
  arma::colvec t(tr.begin(), tr.size(), false);
  arma::mat T = X;
  T.each_col() %= (t-t0)/h;
  arma::mat D = arma::join_rows(X,T);
  arma::vec kk =as<arma::vec>(ker(tr-t0,h));
  arma::mat W = (arma::diagmat(kk))/m;
  arma::mat Inv = arma::trans(D)*W*D;
  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
  arma::colvec betahat0(betahat.begin(),betahat.size()/2,false);
  return List::create(Named("betahat") = betahat0);
}

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

On Wed, Dec 5, 2012 at 4:07 AM, Christian Gunning <xian at unm.edu> wrote:
Can you post a minimal full example?
-Christian

On Tue, Dec 4, 2012 at 8:39 PM, Honglang Wang
<wanghonglang2008 at gmail.com> wrote:
Yes, the main issue for my coding is the allocation of memory.  And I
have
fixed one of the biggest memory allocation issue: 4000 by 4000
diagonal
matrix. And since I am not familiar with Rcpp and RcppArmadillo, I
have no
idea how to reuse the memory. I hope I can have some materials to
learn
this. Thanks.

What exactly do these timings show?  A single call you your function?
How many calls?

Here I called my function for 100 times.

Building on Romain's point: -- a portion of your function's runtime
is
in memory allocation
(and you have a lot of allocations here).
If you're calling your function thousands or millions of times, then
it might pay to closely
examine your memory allocation strategies and figure out what's
temporary, for example.
It looks like you're already using  copy_aux_mem = false in a number
of places, but you're
allocating a lot of objects -- of approx what size?

For example, wouldn't this work just as well with one less
allocation?
arma::vec kk = t;
arma::uvec q1 = arma::find(arma::abs(tp)<h);
kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75;
// done with q1.  let's reuse it.
q1 = arma::find(arma::abs(tp)>=h);
// was q2
kk.elem(q1).zeros();

You could potentially allocate memory for temporary working space in
R, grab it with copy_aux_mem = false, write your temp results there,
and reuse these objects in subsequent function calls.  It doesn't
make
sense to go to this trouble, though, if your core algorithm consumes
the bulk of runtime.

Have you looked on the armadillo notes r.e. inv?  Matrix inversion
has
O(>n^2).  You may be aided by pencil-and-paper math here.
http://arma.sourceforge.net/docs.html#inv

Here my matrix for inverse is only 4 by 4, so I think it's ok.

best,
Christian

Dear All,
I have tried out the first example by using RcppArmadillo, but I
am not
sure whether the code is efficient or not. And I did the
comparison of
the
computation time.

1) R code using for loop in R: 87.22s
2) R code using apply: 77.86s
3) RcppArmadillo by using for loop in C++: 53.102s
4) RcppArmadillo together with apply in R: 47.310s

It is kind of not so big increase. I am wondering whether I used an
inefficient way for the C++ coding:

--
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!

--
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121211/1480a2e8/attachment.html>
Dear All,
I have modified the code as following. Since I found the old code had the
memory allocation issue, I calculated each term of the matrix with dim 4.
The speed is pretty much good now. But since I am not familiar with Rcpp,
the code sometimes will get error, like

 *** caught segfault ***
address 0x16, cause 'memory not mapped'
Error in betahat(ker, inv[l], cbind(inv[1:ll], inv[(ll + 1):(2 * ll)]),  :
  badly formed function expression
Calls: system.time ... apply -> FUN -> betahat -> .External -> cpp_exception
Execution halted

I am not sure whether my following code has some big issue mentioned in the
above error hint. I repeatedly call the following function in my entire
code.

// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>

using namespace Rcpp;

// [[Rcpp::export]]
List betahat(Function ker, double t0, NumericMatrix Xr, NumericMatrix yr,
NumericVector tr, double h, int m) {
  int n = Xr.nrow(), p = Xr.ncol();
  arma::mat X(Xr.begin(), n, p, false);
  arma::mat y(yr.begin(), n, 1, false);
  arma::colvec t(tr.begin(), tr.size(), false);
  arma::mat T = X;
  T.each_col() %= (t-t0)/h;
  arma::vec K =as<arma::vec>(ker(tr-t0,h))/m;
  double L1 = arma::accu(K%X.col(0)%X.col(0));
  double L2 = arma::accu(K%X.col(0)%X.col(1));
  double L3 = arma::accu(K%X.col(1)%X.col(1));
  double L4 = arma::accu(K%X.col(0)%T.col(0));
  double L5 = arma::accu(K%X.col(1)%T.col(0));
  double L6 = arma::accu(K%X.col(1)%T.col(1));
  double L7 = arma::accu(K%T.col(0)%T.col(0));
  double L8 = arma::accu(K%T.col(0)%T.col(1));
  double L9 = arma::accu(K%T.col(1)%T.col(1));
  double R1 = arma::accu(K%X.col(0)%y);
  double R2 = arma::accu(K%X.col(1)%y);
  double R3 = arma::accu(K%T.col(0)%y);
  double R4 = arma::accu(K%T.col(1)%y);
  arma::mat L(2*p,2*p);
  L(0,0)=L1;L(0,1)=L2;L(0,2)=L4;L(0,3)=L5;
  L(1,0)=L2;L(1,1)=L3;L(1,2)=L5;L(1,3)=L6;
  L(2,0)=L4;L(2,1)=L5;L(2,2)=L7;L(2,3)=L8;
  L(3,0)=L5;L(3,1)=L6;L(3,2)=L8;L(3,3)=L9;
  arma::mat R(2*p,1);
  R(0,0)=R1;R(1,0)=R2;R(2,0)=R3;R(3,0)=R4;
  arma::vec betahat = arma::solve(L,R);
  arma::colvec betahat0(betahat.begin(),betahat.size()/2,false);
  return List::create(Named("betahat") = betahat0);
}

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

this part will always make your code crawl along:

  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
I'd be very interested to see the before/after version of your code
code, Honglang. With timings. It'll make a good blog post or academic
paper (depending on just how many learnings you get from the process) :-)

Darren

--
Darren Cook, Software Researcher/Developer

http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
_______________________________________________
Rcpp-devel mailing list
Rcpp-devel at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121211/5fc6eda4/attachment-0001.html>
And actually, i got three different types of errors:

1)Error in betahat(ker, inv[l], cbind(inv[1:ll], inv[(ll + 1):(2 * ll)]),  :
  unused argument(s) (n = 1)
Calls: system.time ... apply -> FUN -> betahat -> .External -> cpp_exception
Execution halted

2)Error in betahat(ker, inv[l], cbind(inv[1:ll], inv[(ll + 1):(2 * ll)]),  :
  promise already under evaluation: recursive default argument reference or
ear\
lier problems?
Calls: system.time ... apply -> FUN -> betahat -> .External -> cpp_exception
Execution halted

3)*** caught segfault ***
address 0x1, cause 'memory not mapped'
Error in betahat(ker, inv[l], cbind(inv[1:ll], inv[(ll + 1):(2 * ll)]),  :
  badly formed function expression
Calls: system.time ... apply -> FUN -> betahat -> .External -> cpp_exception
Execution halted

And I really have no idea what is the problem.

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

On Tue, Dec 11, 2012 at 10:16 AM, Honglang Wang

Dear All,
I have modified the code as following. Since I found the old code had the
memory allocation issue, I calculated each term of the matrix with dim 4.
The speed is pretty much good now. But since I am not familiar with Rcpp,
the code sometimes will get error, like

 *** caught segfault ***
address 0x16, cause 'memory not mapped'
Error in betahat(ker, inv[l], cbind(inv[1:ll], inv[(ll + 1):(2 * ll)]),  :
  badly formed function expression
Calls: system.time ... apply -> FUN -> betahat -> .External ->
cpp_exception
Execution halted

I am not sure whether my following code has some big issue mentioned in
the above error hint. I repeatedly call the following function in my entire
code.

// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>

using namespace Rcpp;

// [[Rcpp::export]]
List betahat(Function ker, double t0, NumericMatrix Xr, NumericMatrix yr,
NumericVector tr, double h, int m) {
  int n = Xr.nrow(), p = Xr.ncol();
  arma::mat X(Xr.begin(), n, p, false);
  arma::mat y(yr.begin(), n, 1, false);
  arma::colvec t(tr.begin(), tr.size(), false);
  arma::mat T = X;
  T.each_col() %= (t-t0)/h;
  arma::vec K =as<arma::vec>(ker(tr-t0,h))/m;
  double L1 = arma::accu(K%X.col(0)%X.col(0));
  double L2 = arma::accu(K%X.col(0)%X.col(1));
  double L3 = arma::accu(K%X.col(1)%X.col(1));
  double L4 = arma::accu(K%X.col(0)%T.col(0));
  double L5 = arma::accu(K%X.col(1)%T.col(0));
  double L6 = arma::accu(K%X.col(1)%T.col(1));
  double L7 = arma::accu(K%T.col(0)%T.col(0));
  double L8 = arma::accu(K%T.col(0)%T.col(1));
  double L9 = arma::accu(K%T.col(1)%T.col(1));
  double R1 = arma::accu(K%X.col(0)%y);
  double R2 = arma::accu(K%X.col(1)%y);
  double R3 = arma::accu(K%T.col(0)%y);
  double R4 = arma::accu(K%T.col(1)%y);
  arma::mat L(2*p,2*p);
  L(0,0)=L1;L(0,1)=L2;L(0,2)=L4;L(0,3)=L5;
  L(1,0)=L2;L(1,1)=L3;L(1,2)=L5;L(1,3)=L6;
  L(2,0)=L4;L(2,1)=L5;L(2,2)=L7;L(2,3)=L8;
  L(3,0)=L5;L(3,1)=L6;L(3,2)=L8;L(3,3)=L9;
  arma::mat R(2*p,1);
  R(0,0)=R1;R(1,0)=R2;R(2,0)=R3;R(3,0)=R4;
  arma::vec betahat = arma::solve(L,R);

  arma::colvec betahat0(betahat.begin(),betahat.size()/2,false);
  return List::create(Named("betahat") = betahat0);
}

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

On Thu, Dec 6, 2012 at 2:10 AM, Darren Cook <darren at dcook.org> wrote:

this part will always make your code crawl along:

  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
I'd be very interested to see the before/after version of your code
code, Honglang. With timings. It'll make a good blog post or academic
paper (depending on just how many learnings you get from the process) :-)

Darren

--
Darren Cook, Software Researcher/Developer

http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
_______________________________________________
Rcpp-devel mailing list
Rcpp-devel at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121212/7a047615/attachment.html>
The comment of Doug Bates on this thread motivated me to better examine
opportunities for algorithm improvements in my R package c++ code written
under Rcpp and RcppArmadillo.   I performed a few changes:  firstly, i
removed all general matrix inverse computations, replacing many with
triangular solving from cholesky decompositions (e.g. to conduct posterior
sampling from a multivariate gaussian where the precision matrix is readily
available).   secondly, instead of repeatedly performing high-dimensional
matrix computations composing sampled parameter and design objects during a
sequential sampling scan, I made the computation once per iteration and
then operated on the resulting large object by repeatedly extracting
portions, pre-sampling, and inserting back the post-sampled result during a
sequential scan.   thirdly,  while linear algebra compositions of
parameters and data objects must be rendered repeatedly across posterior
sampling iterations, compositions of design/data objects (with each other)
may be performed only once and re-used.   I had been sloppy and was
repeatedly performing computations purely involving data objects inside the
sampling loops.   I replaced this approach by slicing and dicing the design
objects and performing the matrix operations, once, and then repeatedly
inserting the objects during sampling.   While the first two steps -
particularly the second - produced some runtime speed improvements, I was
surprised that the third step produced only a very small improvement where
I had expected a large reduction in runtime.   I used RcppArmadillo and
placed these sliced and diced matrix objects into fields from which I then
extracted elements during sampling.   So I replaced matrix computations
with extractions from field containers.   Is it possible that extraction
speeds from fields and multiplying a chain of moderate-dimensioned matrices
(e.g. 5 x 600)  are of equivalent computational intensity?  Was the
compiler compensating for my hackery?

On Tue, Dec 11, 2012 at 7:07 AM, Honglang Wang

I tried out that if the dim of the matrix less than or equal to 4, inv()
and solve() have the same speed.

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

On Mon, Dec 10, 2012 at 8:05 AM, c s <conradsand.arma at gmail.com> wrote:

Hi Honglang,

I recommend looking into using the solve() function in Armadillo, instead
of inv(). Using solve() will be considerably faster.

More details here:
http://arma.sourceforge.net/docs.html#solve

On Thursday, December 6, 2012, Honglang Wang <wanghonglang2008 at gmail.com>
wrote:
The following is a full example although I don't know whether it's
minimal or not:
library(Rcpp)
library(RcppArmadillo)
sourceCpp("betahat_mod.cpp")

#The following is data generation.
  n=200
  m=20
  p=2
  t=runif(m*n,min=0, max=1)
  X1=rnorm(m*n,0,1)
  X1=as.matrix(1+2*exp(t)+X1)
  X2=rnorm(m*n,0,1)
  X2=as.matrix(3-4*t^2+X2)
  X=cbind(X1,X2)
  beta1=0.5*sin(t)
  beta2=beta1
  rho=0.2
  sig2=1

eps=unlist(lapply(1:n,function(x){as.matrix(arima.sim(list(ar=rho),m,sd=sqrt(sig2*(1-rho^2))))}))
  y=as.matrix(X1*beta1+X2*beta2+eps)

#A simple kernel function.
ker=function(x,h)
  {
    ans=x
    lo=(abs(x)<h)
    ans[lo]=(3/4)*(1-(x[lo]/h)^2)/h
    ans[!lo]=0
    return(ans)
  }

h=0.3

#assess the time for evaluating bethat of 100 t's:
system.time((betahatt=t(apply(as.matrix(t[1:100]),1,function(x)
betahat(ker,x,X,y,t,h,m)$betahat))))
And the .cpp file is the following:

// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>

using namespace Rcpp;

// [[Rcpp::export]]
List betahat(Function ker, double t0, NumericMatrix Xr, NumericMatrix
yr, NumericVector tr, double h, int m) {
  int n = Xr.nrow(), p = Xr.ncol();
  arma::mat X(Xr.begin(), n, p, false);
  arma::mat y(yr.begin(), n, 1, false);
  arma::colvec t(tr.begin(), tr.size(), false);
  arma::mat T = X;
  T.each_col() %= (t-t0)/h;
  arma::mat D = arma::join_rows(X,T);
  arma::vec kk =as<arma::vec>(ker(tr-t0,h));
  arma::mat W = (arma::diagmat(kk))/m;
  arma::mat Inv = arma::trans(D)*W*D;
  arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
  arma::colvec betahat0(betahat.begin(),betahat.size()/2,false);
  return List::create(Named("betahat") = betahat0);
}

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

On Wed, Dec 5, 2012 at 4:07 AM, Christian Gunning <xian at unm.edu> wrote:
Can you post a minimal full example?
-Christian

On Tue, Dec 4, 2012 at 8:39 PM, Honglang Wang
<wanghonglang2008 at gmail.com> wrote:
Yes, the main issue for my coding is the allocation of memory.  And
I have
fixed one of the biggest memory allocation issue: 4000 by 4000
diagonal
matrix. And since I am not familiar with Rcpp and RcppArmadillo, I
have no
idea how to reuse the memory. I hope I can have some materials to
learn
this. Thanks.

What exactly do these timings show?  A single call you your
function?
How many calls?

Here I called my function for 100 times.

Building on Romain's point: -- a portion of your function's runtime
is
in memory allocation
(and you have a lot of allocations here).
If you're calling your function thousands or millions of times, then
it might pay to closely
examine your memory allocation strategies and figure out what's
temporary, for example.
It looks like you're already using  copy_aux_mem = false in a number
of places, but you're
allocating a lot of objects -- of approx what size?

For example, wouldn't this work just as well with one less
allocation?
arma::vec kk = t;
arma::uvec q1 = arma::find(arma::abs(tp)<h);
kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75;
// done with q1.  let's reuse it.
q1 = arma::find(arma::abs(tp)>=h);
// was q2
kk.elem(q1).zeros();

You could potentially allocate memory for temporary working space in
R, grab it with copy_aux_mem = false, write your temp results there,
and reuse these objects in subsequent function calls.  It doesn't
make
sense to go to this trouble, though, if your core algorithm consumes
the bulk of runtime.

Have you looked on the armadillo notes r.e. inv?  Matrix inversion
has
O(>n^2).  You may be aided by pencil-and-paper math here.
http://arma.sourceforge.net/docs.html#inv

Here my matrix for inverse is only 4 by 4, so I think it's ok.

best,
Christian

Dear All,
I have tried out the first example by using RcppArmadillo, but I
am not
sure whether the code is efficient or not. And I did the
comparison of
the
computation time.

1) R code using for loop in R: 87.22s
2) R code using apply: 77.86s
3) RcppArmadillo by using for loop in C++: 53.102s
4) RcppArmadillo together with apply in R: 47.310s

It is kind of not so big increase. I am wondering whether I used
an
inefficient way for the C++ coding:

--
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!

--
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

Thank you, Terrance Savitsky
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121212/f29e1fa6/attachment-0001.html>