Skip to content

[Rcpp-devel] [rcpp-devel] arma::mat constructor vs Rcpp::as and interaction with .insert_col

8 messages · Dirk Eddelbuettel, Conrad S, Dale Smith

#
Hello,

I had a problem this afternoon with an arma::mat constructor and the mat::insert_col method. Since I got the example from Dirk, I thought I would point out the problem I had and a solution. See

http://gallery.rcpp.org/articles/fast-linear-model-with-armadillo/

for background. Before I get to the example to reproduce the problem and the fix, let me say I'm posting from home as I can't post code from my work email; otherwise I'm fired.

I do believe Rcpp::as should be used to marshall from NumericMatrix to arma::mat, not the arma::mat constructor that accesses raw memory, as Dirk uses. I read the Armadillo documentation this afternoon and do believe the problem with .insert_col is due to the handling of memory in the constructor

mat(aux_mem*, n_rows, n_cols, copy_aux_mem = true, strict = true) 

see

http://arma.sourceforge.net/docs.html

Note that I've run similar examples on my Windows 7 box at work and on my OS X laptop (Mountain Lion) at home to confirm this is not os-dependent.

Thanks,
Dale Smith
dtsmith at mindspring.com

mat <- cbind(rnorm(50), rnorm(50), rnorm(50))
y <- rnorm(50)

cppFunction('
List fastLm(NumericVector yr, NumericMatrix Xr) {
  int n = Xr.nrow(), k = Xr.ncol();
   
  arma::mat X(Xr.begin(), n, k, false); 
  arma::colvec y(yr.begin(), yr.size(), false);
   
  arma::colvec coef = arma::solve(X, y);
  arma::colvec resid = y - X*coef;

  arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
  X.insert_cols(1, ones);
   
  double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
  arma::colvec stderrest = arma::sqrt(
       sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
   
  return List::create(Named("coefficients") = coef,
                      Named("stderr")       = stderrest);
}', depends = c("RcppArmadillo"))


cppFunction('
List fastLmMod(NumericVector yr, NumericMatrix Xr) {
  int n = Xr.nrow(), k = Xr.ncol();
   
  arma::mat X = Rcpp::as<arma::mat>(Xr);
  arma::colvec y(yr.begin(), yr.size(), false);
   
  arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
  X.insert_cols(0, ones);
   
  arma::colvec coef = arma::solve(X, y);
  arma::colvec resid = y - X*coef;

  double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
  arma::colvec stderrest = arma::sqrt(
       sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
   
  return List::create(Named("coefficients") = coef,
                      Named("stderr")       = stderrest);
}', depends = c("RcppArmadillo"))

fastLm(y, mat) # error is "error: Mat::init(): mismatch between size of auxiliary memory and requested size"
fastLmMod(y, mat) # coefficients are -0.15785722, -0.05668155, 0.15409366, -0.10079147
lm(y ~ mat) # confirm fastLmMod via lm function in R

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20130411/82463f3b/attachment.html>
#
Hi Dale,
On 11 April 2013 at 21:12, Dale Smith wrote:
| I had a problem this afternoon with an arma::mat constructor and the
| mat::insert_col method.

That is not a method I've used, I think, so I will defer comments.  

Whenever I did what you do below (ie expand a N*k matrix with a standard
'iota' column for a constant in the regression) then I did so _before calling
the C++ code_ as can be seen in all versions of fastLm we shipped.

| Since I got the example from Dirk, I thought I would
| point out the problem I had and a solution. See

I am not sure I call that a "problem and a solution".  It is worth a
discussion on that page, possibly.  

It is a little like the clone() example I give in the Rcpp classes --
sometimes the side benefit of a copy is noticeable and even welcome -- as eg
here where you want to modify the data structure.

I don't do that each time as fastLm is for _pure speed_.
 
| http://gallery.rcpp.org/articles/fast-linear-model-with-armadillo/
| 
| for background. Before I get to the example to reproduce the problem and the
| fix, let me say I'm posting from home as I can't post code from my work email;
| otherwise I'm fired.
| 
| I do believe Rcpp::as should be used to marshall from NumericMatrix to

That is the default.

| arma::mat, not the arma::mat constructor that accesses raw memory, as Dirk
| uses.

a) By choice, and b) for faster performance speed, and c) after consulting on
this with Conrad "way back when". Nobody got bitten yet.  And we unit test
this each and every time RcppArmadillo is built and tested.

And note that that is _not_ the default constructor.  In fact, I had
contemplated replacing the default constructor but what we have now is fine.

| I read the Armadillo documentation this afternoon and do believe the
| problem with .insert_col is due to the handling of memory in the constructor
| 
| mat(aux_mem*, n_rows, n_cols, copy_aux_mem = true, strict = true) 
| 
| see
| 
| http://arma.sourceforge.net/docs.html

So just create a copy.
 
| Note that I've run similar examples on my Windows 7 box at work and on my OS X
| laptop (Mountain Lion) at home to confirm this is not os-dependent.

I think this just a subtle misunderstanding.  _If_ one wants to muck with the
data _then_ the "promise" made to R in the constructor

      arma::mat X(Xr.begin(), n, k, false); 

is not a good one.   

But for illustration of _why_ we do what we do, consider this simple
example. We sum up a matrix -- a cheap operation -- and can compare the cost
of instantiating the matrix:

------------------------------------------------------------------------------

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
double sum1(Rcpp::NumericMatrix M) {
  arma::mat A(M.begin(), M.rows(), M.cols(), false);
  return sum(sum(A));
}

// [[Rcpp::export]]
double sum2(arma::mat M) {
  return sum(sum(M));
}


/*** R
set.seed(42)
M <- matrix(rnorm(100*100),100,100)
library(microbenchmark)
print(sum1(M))
print(sum2(M))
microbenchmark(sum1(M), sum2(M), times=100)
*/

------------------------------------------------------------------------------

which when running yields

------------------------------------------------------------------------------
R> sourceCpp("/tmp/dale.cpp")

R> set.seed(42)

R> M <- matrix(rnorm(100*100),100,100)

R> library(microbenchmark)

R> print(sum1(M))
[1] -113.094

R> print(sum2(M))
[1] -113.094

R> microbenchmark(sum1(M), sum2(M), times=100)
Unit: microseconds
    expr    min      lq  median      uq    max neval
 sum1(M)  8.256  8.4805  9.5750  9.9160 47.225   100
 sum2(M) 21.852 22.2845 22.8535 23.5475 60.001   100
R> 
------------------------------------------------------------------------------

There is nice pickup in speed when we forego the extra copy.  You created
yourself a need for the copy, and you're free to make one.

We should possibly amend the Rcpp Gallery page to note this.  Could you send
a suggested patch or added paragraph?

Thanks,  Dirk


| 
| Thanks,
| Dale Smith
| dtsmith at mindspring.com
| 
| mat <- cbind(rnorm(50), rnorm(50), rnorm(50))
| y <- rnorm(50)
| 
| cppFunction('
| List fastLm(NumericVector yr, NumericMatrix Xr) {
|   int n = Xr.nrow(), k = Xr.ncol();
|    
|   arma::mat X(Xr.begin(), n, k, false); 
|   arma::colvec y(yr.begin(), yr.size(), false);
|    
|   arma::colvec coef = arma::solve(X, y);
|   arma::colvec resid = y - X*coef;
| 
|   arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
|   X.insert_cols(1, ones);
|    
|   double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
|   arma::colvec stderrest = arma::sqrt(
|        sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
|    
|   return List::create(Named("coefficients") = coef,
|                       Named("stderr")       = stderrest);
| }', depends = c("RcppArmadillo"))
| 
| 
| cppFunction('
| List fastLmMod(NumericVector yr, NumericMatrix Xr) {
|   int n = Xr.nrow(), k = Xr.ncol();
|    
|   arma::mat X = Rcpp::as<arma::mat>(Xr);
|   arma::colvec y(yr.begin(), yr.size(), false);
|    
|   arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
|   X.insert_cols(0, ones);
|    
|   arma::colvec coef = arma::solve(X, y);
|   arma::colvec resid = y - X*coef;
| 
|   double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
|   arma::colvec stderrest = arma::sqrt(
|        sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
|    
|   return List::create(Named("coefficients") = coef,
|                       Named("stderr")       = stderrest);
| }', depends = c("RcppArmadillo"))
| 
| fastLm(y, mat) # error is "error: Mat::init(): mismatch between size of
| auxiliary memory and requested size"
| fastLmMod(y, mat) # coefficients are -0.15785722, -0.05668155, 0.15409366,
| -0.10079147
| lm(y ~ mat) # confirm fastLmMod via lm function in R
| 
| 
| ----------------------------------------------------------------------
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
#
Dale,

Here is a simple fix:  just _join_ the matrix and the col vector.
Result first (now with set.seed(42), so rngs different)

R> sourceCpp("/tmp/dale.cpp")

R> set.seed(42)  ## needed !!

R> mat <- cbind(rnorm(50), rnorm(50), rnorm(50))

R> y <- rnorm(50)

R> fastLm2(y, mat) 
$coefficients
           [,1]
[1,] -0.0227235
[2,]  0.0858994
[3,] -0.0457201
[4,] -0.0441346

$stderr
         [,1]
[1,] 0.115021
[2,] 0.141599
[3,] 0.136838


R> lm(y ~ mat) 

Call:
lm(formula = y ~ mat)

Coefficients:
(Intercept)         mat1         mat2         mat3  
    -0.0227       0.0859      -0.0457      -0.0441  


R> 


The key part is to use join_rows() and to actually use the new matrix:

  arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
  arma::mat X2 = arma::join_rows(ones, X);

  arma::colvec coef = arma::solve(X2, y);
  arma::colvec resid = y - X2*coef;


Full function below. 

Hth,  Dirk



// [[Rcpp::export]]
Rcpp::List fastLm2(Rcpp::NumericVector yr, Rcpp::NumericMatrix Xr) {
  int n = Xr.nrow(), k = Xr.ncol();

  arma::mat X(Xr.begin(), n, k, false);
  arma::colvec y(yr.begin(), yr.size(), false);

  arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
  arma::mat X2 = arma::join_rows(ones, X);

  arma::colvec coef = arma::solve(X2, y);
  arma::colvec resid = y - X2*coef;

  double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
  arma::colvec stderrest = arma::sqrt(
       sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );

  return Rcpp::List::create(Rcpp::Named("coefficients") = coef,
			    Rcpp::Named("stderr")       = stderrest);
}
#
On Fri, Apr 12, 2013 at 11:12 AM, Dale Smith <dtsmith at mindspring.com> wrote:
There is a further option within the Mat constructor, which when used
should prevent this error.

Specifically, we have:
mat(aux_mem*, n_rows, n_cols, copy_aux_mem = true, strict = true)

If you set strict to false, eg.
mat X(pointer, 4, 5, false, false);

the following comes into play:

"The strict variable comes into effect only if copy_aux_mem is set to
false (ie. the matrix is directly using auxiliary memory). If strict
is set to true, the matrix will be bound to the auxiliary memory for
its lifetime; the number of elements in the matrix can't be changed
(directly or indirectly). If strict is set to false, the matrix will
not be bound to the auxiliary memory for its lifetime, ie., the size
of the matrix can be changed. If the requested number of elements is
different to the size of the auxiliary memory, new memory will be
allocated and the auxiliary memory will no longer be used."

the above text is from:
http://arma.sourceforge.net/docs.html#adv_constructors_mat
#
Adding the intercept prior to calling fastLm is what I'm doing. The enclosed was a simple example. I'm implementing the RESET test to detect model specification errors so I need to add the square and cube of the fitted model.

I am also learning Armadillo as I go.

Your comments on speed are good ones. Let me look over this tonight and send a paragraph.

Sent from my iPhone
On Apr 11, 2013, at 9:52 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
#
I did indeed read about the constructor but misinterpreted the statement - my bad. Your suggestion is a good one to try so I will benchmark my RESET test code with the various suggestions, including join as Dirk suggested.

Thanks,
Dale

Sent from my iPhone
On Apr 11, 2013, at 11:28 PM, Conrad S <conradsand.arma at gmail.com> wrote:

            
#
On 12 April 2013 at 07:34, Dale Smith wrote:
| I did indeed read about the constructor but misinterpreted the statement - my bad. Your suggestion is a good one to try so I will benchmark my RESET test code with the various suggestions, including join as Dirk suggested.

Conrad's suggestion was (of course) even better because once you declare the
memory chunk you access as modifiable, you should be able to extend as you
had attempted, without forcing a copy (as I did with either approach).

Dirk
2 days later
#
As Dirk suggested below, I'm working on a paragraph to point out the constructor used implies no modification of the arma::mat memory.

Dale
On Apr 11, 2013, at 9:52 PM, Dirk Eddelbuettel <edd at debian.org> wrote: