Hello Rcpp developers,
The following R/Rcpp code takes in a data frame and a model formula
(passed as a string), and uses Rcpp::Function to call R's model.matrix
function to create model matrices B times in parallel. Each model matrix
is formed after permuting a given column in the data frame.
To avoid multi-threaded access to R, I used the "locking" idea from the
Boost example in RInside: a scoped lock on a mutex, plus a single set of
Rcpp::Function reference variables initialized once.
The code compiles fine, but when I run it I get:
Error: C stack usage close to the limit.
I would greatly appreciate any advice!
Thank you, SK.
#### ---- C++ CODE ---------------########
// [[Rcpp::depends(RcppParallel)]]
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include <tbb/tbb.h>
using namespace arma;
using namespace Rcpp;
using namespace RcppParallel;
typedef tbb::spin_mutex FreeListMutexType;
class testParallelCallingR : public Worker {
private:
    Rcpp::String col2perm;
    Rcpp::String formulaStr;
    Rcpp::DataFrame& df0;
    arma::vec& results;
    Rcpp::Environment& stats, base;
    Rcpp::Function& formula, modelMatrix, subset;
    SEXP& formulaObj;
    FreeListMutexType FreeListMutex;
public:
    explicit testParallelCallingR( Rcpp::String col2perm1,
                                   Rcpp::String formulaStr1,
                                   Rcpp::DataFrame& df0, arma::vec& res,
                                   Rcpp::Environment& stats,
                                   Rcpp::Environment& base,
                                   Rcpp::Function& formula,
                                   Rcpp::Function& modelMatrix,
                                   Rcpp::Function& subset,
                                   SEXP& formulaObj )
        : df0(df0), results(res), stats(stats), base(base),
          formula(formula), modelMatrix(modelMatrix), subset(subset),
          formulaObj(formulaObj) {
        col2perm = col2perm1;
        formulaStr = formulaStr1;
    }
    arma::mat getModelMat(){
        // lock
        FreeListMutexType::scoped_lock lock(FreeListMutex);
        // permute the column col2perm
        std::string timestr(col2perm);
        Rcpp::DataFrame dfw = Rcpp::clone(df0);
        Rcpp::NumericVector timevals = df0[timestr];
        std::random_shuffle( timevals.begin(), timevals.end() );
        dfw[timestr] = timevals;
        // construct model mat with the dataframe with the permuted column
        SEXP modelMatw = modelMatrix( formulaObj, dfw );
        arma::mat Z = Rcpp::as<arma::mat>( modelMatw );
        return Z;
    }
    void operator()( std::size_t begin, std::size_t end ) {
        for( std::size_t j = begin; j < end; j++ ){
            arma::mat Z = getModelMat();
            results(j) = Z(0,0); // just as an example result, store the first entry, say
        }
    }
};
// [[Rcpp::export]]
arma::vec permDf( int B, Rcpp::String col2perm, Rcpp::String formulaStr,
                  Rcpp::DataFrame df0 ){
    Rcpp::Environment stats("package:stats");
    Rcpp::Environment base("package:base");
    Rcpp::Function formula = stats["formula"];
    Rcpp::Function modelMatrix = stats["model.matrix"];
    Rcpp::Function subset("[.data.frame");
    SEXP formulaObj = formula(formulaStr);
    arma::vec results( B, arma::fill::zeros );
    testParallelCallingR tpc( col2perm, formulaStr, df0, results,
                              stats, base,
                              formula, modelMatrix, subset,
                              formulaObj );
    parallelFor( 0, B, tpc );
    return results;
}
# -- Call function from R side --
# Create B permutations of the column Sepal.Width, and form model matrices
permDf( B=10,
col2perm = "Sepal.Width",
formulaStr = "~Sepal.Width + Sepal.Length",
df0=iris
)
# -- Output --
Error: C stack usage 17587445176704 is too close to the limit
Error: C stack usage 17587449383296 is too close to the limit
Error: C stack usage 17587419937152 is too close to the limit
Error: C stack usage 17587432556928 is too close to the limit
Error: C stack usage 17587436763520 is too close to the limit
Error: C stack usage 17587440970112 is too close to the limit
Error: C stack usage 17587428350336 is too close to the limit
Error: C stack usage 17587424143744 is too close to the limit
Error: C stack usage 17587453589888 is too close to the limit
Execution halted
[Rcpp-devel] Using Rcpp::Function in parallel with TBB mutex lock & a reference.
5 messages · Kumar MS, Dirk Eddelbuettel
Hi Kumar,
From a quick look you borrow the 'Worker' object from RcppParallel. But where
(as far as I recall) all posted examples of RcppParallel do _not_ put any R
objects inside a Worker instance, you put some there. That violates the
recommendation in Writing R Extensions. So I think the outcome you observe is
as expected.

Best, Dirk
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
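As an illustration of the advice above, here is a minimal sketch in plain C++ (no Rcpp, so it compiles on its own) of a worker that holds only references to ordinary C++ containers. The names `SquareWorker` and `parallelForSketch` are hypothetical and not part of RcppParallel; `parallelForSketch` is only a stand-in for `parallelFor`, built on `std::thread`. The assumption is that, in an Rcpp setting, any R data would be copied into such containers on the main thread before the parallel region starts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Worker that owns no R objects: only references to plain C++ containers.
// Assumption: in an Rcpp setting these containers are filled on the main
// thread before any worker thread starts.
struct SquareWorker {
    const std::vector<double>& input;
    std::vector<double>& output;

    SquareWorker(const std::vector<double>& in, std::vector<double>& out)
        : input(in), output(out) {}

    // Same call shape as an RcppParallel Worker: process indices [begin, end).
    void operator()(std::size_t begin, std::size_t end) {
        assert(begin <= end && end <= input.size());
        for (std::size_t j = begin; j < end; ++j) {
            output[j] = input[j] * input[j];
        }
    }
};

// Hypothetical stand-in for parallelFor: split [begin, end) into contiguous
// chunks and run each chunk on its own std::thread.
template <typename WorkerT>
void parallelForSketch(std::size_t begin, std::size_t end, WorkerT& worker,
                       std::size_t nThreads = 4) {
    if (end <= begin) return;
    nThreads = std::max<std::size_t>(1, std::min(nThreads, end - begin));
    const std::size_t chunk = (end - begin + nThreads - 1) / nThreads;
    std::vector<std::thread> pool;
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        const std::size_t hi = std::min(end, lo + chunk);
        // Each thread writes to a disjoint index range, so no lock is needed.
        pool.emplace_back([&worker, lo, hi] { worker(lo, hi); });
    }
    for (std::thread& t : pool) t.join();
}
```

Because each thread touches only its own slice of `output` and never calls into R, no mutex is required in this sketch.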
Thank you Dirk. Your observation is correct. I am left with two questions. As
always, I appreciate your answers.

1. Except for this single step of calling an R function (model matrix-like
objects that are being created by an external library in R), all my other
computations are now implemented in C++ and are thread friendly. Does this
mean I have no other option but to go serial if I need to call an R function?
Do you have any alternative recommendations? I would really love to take
advantage of RcppParallel/TBB here, as I have heavily exploited RcppParallel
to parallelize everything else.

2. Your RInside calculations in the Boost thread example are multi-threaded,
with a locking interface to the RInside instance too. I wonder what makes that
work well without R reporting issues, while the TBB/RcppParallel
implementation taking a similar approach has trouble.

Sincerely,
Kumar
On 7 February 2025 at 14:01, Kumar MS wrote:
| Thank you Dirk. Your observation is correct. I am left with two questions. As
| always, I appreciate your answers.
|
| 1. Except for this single step of calling an R function (model matrix-like
| objects that are being created by an external library in R), all my other
| computations are now implemented in C++ & thread friendly. Does this mean I
| would have no other option but to go serial if I need to call an R function? Do
| you have any alternative recommendations? I would really love to take advantage
| of RcppParallel/TBB here, as I have heavily exploited RcppParallel to
| parallelize everything else.

Not to ruin your day (again), as I believe we have been over this a few times
between here and your StackOverflow questions: no matter how much you want it
to be possible, you still cannot call back to R from the parallel code.

| 2. Your RInside calculations in the Boost thread example are multi-threaded,
| with a locking interface to RInside instance too. I wonder what makes that work
| well without R reporting issues, while the TBB/RcppParallel implementation
| taking a similar approach has trouble.

RInside is not really a relevant example, as it operates the other way around:
there you have a `main()` function and want to embed R. What we do with Rcpp is
typically a running R process from which we call extensions. Not the same.

Dirk
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
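One way to stay parallel without calling back to R, sketched here under a strong assumption: the formula contains only numeric main effects (as in `~Sepal.Width + Sepal.Length`), in which case the model matrix is just an intercept column followed by the predictor columns and can be assembled directly in C++. Formulas with factors, interactions, or matrices produced by an external R library would not fit this sketch; those would have to be built serially in R before the parallel region. The names `Design`, `permutedDesign`, and `runPermutations` are hypothetical, the code is plain C++ with `std::thread` rather than RcppParallel, and the statistic stored per permutation is a toy example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Row-major n x 3 design matrix for the numeric formula ~ x1 + x2:
// column 0 is the intercept, column 1 is x1, column 2 is x2.
struct Design {
    std::size_t n;
    std::vector<double> data;
    double operator()(std::size_t i, std::size_t k) const {
        return data[i * 3 + k];
    }
};

// Shuffle a private copy of x1 with a generator seeded per permutation,
// then assemble the design matrix without touching R.
Design permutedDesign(const std::vector<double>& x1,
                      const std::vector<double>& x2, unsigned seed) {
    assert(x1.size() == x2.size());
    std::vector<double> shuffled(x1);   // the caller's column is left intact
    std::mt19937 gen(seed);
    std::shuffle(shuffled.begin(), shuffled.end(), gen);
    Design Z{x1.size(), std::vector<double>(x1.size() * 3)};
    for (std::size_t i = 0; i < Z.n; ++i) {
        Z.data[i * 3 + 0] = 1.0;
        Z.data[i * 3 + 1] = shuffled[i];
        Z.data[i * 3 + 2] = x2[i];
    }
    return Z;
}

// Run B permutations across up to nThreads threads. results[j] stores a toy
// statistic: the sum over rows of (permuted x1) * x2.
std::vector<double> runPermutations(std::size_t B,
                                    const std::vector<double>& x1,
                                    const std::vector<double>& x2,
                                    std::size_t nThreads = 4) {
    std::vector<double> results(B, 0.0);
    if (B == 0) return results;
    nThreads = std::max<std::size_t>(1, std::min(nThreads, B));
    const std::size_t chunk = (B + nThreads - 1) / nThreads;
    std::vector<std::thread> pool;
    for (std::size_t lo = 0; lo < B; lo += chunk) {
        const std::size_t hi = std::min(B, lo + chunk);
        pool.emplace_back([&, lo, hi] {
            for (std::size_t j = lo; j < hi; ++j) {
                // Seed depends only on j, so results do not depend on
                // how the work is split across threads.
                const Design Z =
                    permutedDesign(x1, x2, static_cast<unsigned>(j) + 1u);
                double s = 0.0;
                for (std::size_t i = 0; i < Z.n; ++i) s += Z(i, 1) * Z(i, 2);
                results[j] = s;
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return results;
}
```

Seeding each permutation from its index keeps the output reproducible regardless of thread count, and shuffling a copy avoids modifying the input column shared by all threads.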
1 day later
Thank you very much for the clear answers, Dirk! They greatly help! Cheers, Kumar