Skip to content

[Rcpp-devel] Rcpp Parallel and Rcpp Armadillo

5 messages · Qiang Kou, Dirk Eddelbuettel, Maxime To +1 more

#
Hi,

I started to work with Rcpp and I have some trouble to adapt some code. I copy below the following code:

- I want to write a function named contrib1, that is simplified version here, but the idea is that that function makes a large use of the pnorm (and qnorm) on marices and I would like to make it faster by applying the function using Rcpp parallel.

- Then I started with an example that I tried to adapt (http://gallery.rcpp.org/articles/parallel-matrix-transform/) to build the structure Pnorm and the function pPnorm. I also created the function f (I wasn't able to use directly the pnorm function as would Sugar allow...)

The example doesn't work. And I don't really understand why. I guess that it is a problem of memory allocation... Can anyone help me on that point?? Moreover, I was able to build a previous version that was working, but once I started to iterate on the contrib1 function (I execute an optimization algorithm), the RAM memory saturated I wasn't able to figure out why the program was not freeing memory after each execution of the function.

I attach the code below. I am a Rcpp novice, so don't hesitate to comment on strange things....
Thanks for your answers!

Maxime

--------------------------------------

#include <RcppArmadillo.h>
#include <cmath>
#include <algorithm>
#include <RcppParallel.h>

using namespace Rcpp;
using namespace arma;
using namespace std;
using namespace RcppParallel;

// [[Rcpp::depends(RcppArmadillo, RcppParallel)]]

inline double f(double x) { return ::Rf_pnorm5(x, 0.0, 1.0, 1, 0); }

struct Pnorm : public Worker
{
   // source matrix
   mat input;
   
   // destination matrix
   mat output;
   
   // initialize with source and destination
   Pnorm(mat input, mat output) 
      : input(input), output(output) {}
   
   // take the square root of the range of elements requested
   void operator()(std::size_t begin, std::size_t end) {
      std::transform(input.begin() + begin, 
                     input.begin() + end, 
                     output.begin() + begin, 
                     f);
   }
};

mat pPnorm(mat x) {
  
    // allocate the output matrix
    mat output(x.n_rows, x.n_cols);
  
    // Pnorm functor (pass input and output matrixes)
    Pnorm ppnorm(x, output);
  
    // call parallelFor to do the work
    parallelFor(0, x.n_elem, ppnorm);
  
    // return the output matrix
    mat outmat(output.begin(), output.n_rows, output.n_cols);
      return outmat;
}

// [[Rcpp::export]]
mat contrib1(mat my_matrix) {

        mat pbound2    = pPnorm(my_matrix);
    return pbound2;
}


------------------------------------------
### Main R code

rm(list=ls(all=TRUE))

library(Rcpp)
library(RcppGSL)
library(RcppParallel)
library(RcppArmadillo)
setwd('U:/testParallel')

sourceCpp("test.cpp")

asd = matrix(runif(1000), ncol = 100)
contrib1(asd)


 		 	   		  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20141209/8fdd907f/attachment.html>
#
What do you mean by "doesn't work" ? Compiling error or the result is not
right?

I just tried the code, and it seems the code can compile and work.

Best,

KK
On Tue, Dec 9, 2014 at 6:33 AM, Maxime To <maxime.to at outlook.fr> wrote:

            

  
    
#
On 9 December 2014 at 09:46, Qiang Kou wrote:
| What do you mean by "doesn't work" ? Compiling error or the result is not
| right?
| 
| I just tried the code, and it seems the code can compile and work.

I am generally very careful about calling back to anything related to R from
functions to be parallelized.  So for

     inline double f(double x) { return ::Rf_pnorm5(x, 0.0, 1.0, 1, 0); }

I think going with an equivalent pnorm() function from Boost / Bh may be better.

But I am shooting from my hip here as I have not had time to look at this,
having been out way too late at a nice concert :) 

Dirk
#
Hi, 

I changed the function as indicated by Dirk and I modify the functions and the program does work now.
However, I am still puzzled by the memory use of the program. when I run a loop of my function in R as in the code below, it seems that the program does not free the memory used in the previous iterations... which is annoying when I need to optimize on my final object.

So I was wondering whether it was a question of declaration of object in my code?

------------------------------------------------------------------------------------------------------------------

sourceCpp("Rcpp/test.cpp") #
qwe = matrix(runif(10000), nrow = 100)
a = contrib1(qwe)
b = qnorm(qwe)
a - b

for (i in 1:20000) a = contrib1(qwe)
----------------------------------------------------------
// test.cpp

#include <RcppArmadillo.h>
#include <cmath>
#include <algorithm>
#include <RcppParallel.h>
#include <boost/math/distributions/inverse_gaussian.hpp>
 
using namespace Rcpp;
using namespace arma;
using namespace std;
using namespace RcppParallel;
 
// [[Rcpp::depends(RcppArmadillo, RcppParallel, BH)]]
 
double qnorm_f(const double& x_q) {
    boost::math::normal s;
    return boost::math::quantile(s, x_q);
};

 
 
struct Qnorm : public Worker
{
   // source matrix
   const RMatrix<double> input_q;
    
   // destination matrix
   RMatrix<double> output_q;
    
   // initialize with source and destination
   Qnorm(const NumericMatrix input_q, NumericMatrix output_q)
      : input_q(input_q), output_q(output_q) {}
    
   // take the Pnorm of the range of elements requested
   void operator()(std::size_t begin, std::size_t end) {
      std::transform(input_q.begin() + begin,
                     input_q.begin() + end,
                     output_q.begin() + begin,
                     ::qnorm_f);
   }
};
 
mat pQnorm(mat xx_q) {
 
    NumericMatrix x_q = Rcpp::as<Rcpp::NumericMatrix>(wrap(xx_q));
   
    // allocate the output matrix
    const NumericMatrix output_q(x_q.nrow(), x_q.ncol());
   
    // Pnorm functor (pass input and output matrices)
    Qnorm qnorm_temp(x_q, output_q);
   
    // call parallelFor to do the work
    parallelFor(0, x_q.length(), qnorm_temp);
   
    // return the output matrix
    mat outmat_q(output_q.begin(), output_q.nrow(),output_q.ncol());
    return outmat_q;
 
}
 
// [[Rcpp::export]]
mat contrib1(mat X1) {
 
    mat test    = pQnorm(X1);
    mat results = test;

    return results;
}

----------------------------------------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20141210/2273b5ed/attachment.html>
#
Some pointers. 

When you use an arma::mat passed by value in an Rcpp::export, this means copying all of the data of the underlying R object into armadillo. I?d suggest you use a reference to const to avoid that, i.e. 

mat contrib1(const mat& X1) { ? }

Then in pQnorm, you do: 

NumericMatrix x_q = Rcpp::as<Rcpp::NumericMatrix>(wrap(xx_q));

That is yet again, copying all of the data from the arma::mat into an Rcpp matrix. 

You then return a arma::mat, which data is copied implicitly as the return of contrib1. 

I?d suggest you do all this without armadillo, which you don?t really use except for inducing a lot of extra copies of data. 

To anwser your last question, R uses a garbage collector, so the memory is not automatically reclaimed as soon as it is no longer needed. 

Hope this helps. 

Romain
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20141210/fbd549b5/attachment-0001.html>