[Rcpp-devel] Stack imbalance warning when using Rcpp and OpenMP
7 messages · Michael Braun, Davor Cubranic, Dirk Eddelbuettel
6 days later
Michael,
On 19 August 2011 at 14:09, Michael Braun wrote:
| Hi. Just one last clarifying question on this issue before I dive back in.
|
| Suppose I declared a new Rcpp::List object in my C++ code, and copied the list elements from either the SEXP or the original Rcpp::List. Since the new memory is allocated in C++, would I still have the same problem because of the way Rcpp allocated the memory? Or would the copy be thread-safe?
|
| Similarly, what if I were to create an STL container of Rcpp::Lists, and operate on each element of the container in parallel? Same problem?
|
| From your helpful responses, it seems like the best alternative is to explicitly copy the contents of each SEXP in the list to a totally non-Rcpp object. I'm just wondering if keeping some of the data in the original classes might still work.
|
| And finally, since I am still relatively new to C++, are there any standard classes that might make more sense than others? I'm considering an STL vector of either Eigen or Armadillo matrices, for example. Good idea, or bad?

OpenMP is tricky. I would definitely recommend reading up on tutorials _just on the C++ side_ and working with some self-contained C++ examples. Only once you feel you have a reasonable handle on that, move on to Rcpp and OpenMP. There are new issues there, such as the 'Stack imbalance' warning you have seen, which can arise when you return prematurely while other threads still chew on R data structures.

Long story short, I just committed a new self-contained example to the Rcpp source which you can look at (and copy) via the URL

https://r-forge.r-project.org/scm/viewvc.php/pkg/Rcpp/inst/examples/OpenMP/OpenMPandInline.r?view=markup&root=rcpp

It simply takes a vector of size two million, set up as the sequence 1, ..., N, and computes the log of each element. In other words, it is pretty light on the actual computation.

The results bear this out. On my (standard i7, four cores hyperthreaded) box:

edd at max:~$ r ~/svn/rcpp/pkg/Rcpp/inst/examples/OpenMP/OpenMPandInline.r
Loading required package: methods
test replications elapsed relative user.self sys.self
2 funOpenMP(z) 100 3.219 1.000000 25.26 0.07
3 funSerialRcpp(z) 100 9.030 2.805219 9.43 0.32
4 funSugarRcpp(z) 100 9.423 2.927307 9.06 0.35
1 funSerial(z) 100 9.601 2.982603 9.59 0.00
edd at max:~$

So OpenMP 'wins', but the gain is sublinear at a factor of about three on four cores. The fair comparison is with 'funSerial', which also uses a C++ vector. This indicates some communications overhead between the threads.

Rcpp sugar has no real leg up on manual loops, but is the shortest implementation in two lines. Looping over an Rcpp vector is a little faster than looping over a C++ STL vector (which incurs a copy).

Hope this helps,
Dirk
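To make the "copy everything into non-Rcpp objects first" idea from the question concrete, here is a minimal sketch in plain C++. It is not taken from the example script: the function name sumLogPerBlock and the choice of a vector of vectors are illustrative assumptions. In an Rcpp setting, each inner vector would be filled before the parallel region (for instance with Rcpp::as<std::vector<double> >), and the result would be wrapped only after the region has ended.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): sum of logs for each block.
// Every block is a plain std::vector<double>, so no R API call and no
// Rcpp object is touched inside the parallel region.
std::vector<double> sumLogPerBlock(
        const std::vector<std::vector<double> >& blocks) {
    std::vector<double> out(blocks.size(), 0.0);
    const long n = static_cast<long>(blocks.size());

    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < blocks[i].size(); ++j) {
            acc += std::log(blocks[i][j]);
        }
        out[i] = acc;  // each iteration writes only to its own slot
    }
    return out;
}
```

The point of the design is that the threads only read from and write to memory owned by C++ containers, and the function does not return until all threads have finished.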
Two new Rcpp master classes for R and C++ integration scheduled for New York (Sep 24) and San Francisco (Oct 8), more details are at http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10 http://www.revolutionanalytics.com/products/training/public/rcpp-master-class.php
On August 25, 2011 05:32:45 PM Dirk Eddelbuettel wrote:
> Long story short I just committed a new self-contained example to the Rcpp
> source which you can look at (and copy) via the URL
[...]
> The results bear this out. On my (standard i7, four cores hyperthreaded)
> box:
>
> edd at max:~$ r ~/svn/rcpp/pkg/Rcpp/inst/examples/OpenMP/OpenMPandInline.r
> Loading required package: methods
> test replications elapsed relative user.self sys.self
> 2 funOpenMP(z) 100 3.219 1.000000 25.26 0.07
> 3 funSerialRcpp(z) 100 9.030 2.805219 9.43 0.32
> 4 funSugarRcpp(z) 100 9.423 2.927307 9.06 0.35
> 1 funSerial(z) 100 9.601 2.982603 9.59 0.00
> edd at max:~$
>
[...]
> Rcpp sugar has no real leg up on manual loops, but is the shortest
> implementation in two lines. Looping over an Rcpp vector is a little faster
> than looping over a C++ STL vector (which incurs a copy).
I know you wanted to keep all code looking the same, but with std::transform
all serial code is very short:
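# Serial version on an STL vector: as<> copies the R data into x.
serialStdAlgCode <- '
  std::vector<double> x = Rcpp::as<std::vector< double > >(xs);
  // apply log element-wise, writing the result back into x
  std::transform(x.begin(), x.end(), x.begin(), ::log);
  return Rcpp::wrap(x);
'
funSerialStdAlg <- cxxfunction(signature(xs="numeric"), body=serialStdAlgCode,
                               plugin="Rcpp")

# Serial version on an Rcpp vector: no copy is made if xs is already a double
# vector, in which case the transform writes into the R object itself.
serialStdAlgRcppCode <- '
  Rcpp::NumericVector x = Rcpp::NumericVector(xs);
  std::transform(x.begin(), x.end(), x.begin(), ::log);
  return x;
'
funSerialStdAlgRcpp <- cxxfunction(signature(xs="numeric"),
                                   body=serialStdAlgRcppCode, plugin="Rcpp")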
The results (without OpenMP because I'm currently on a single-core CPU):
test replications elapsed relative user.self sys.self
3 funSerialStdAlgRcpp(z) 20 4.236 1.000000 3.792 0.252
4 funSerialRcpp(z) 20 4.312 1.017941 3.792 0.272
5 funSugarRcpp(z) 20 4.537 1.071058 3.744 0.588
2 funSerialStdAlg(z) 20 5.514 1.301700 4.329 0.884
1 funSerial(z) 20 5.536 1.306893 4.480 0.808
Davor
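For readers who want to try the std::transform idea outside of R first, the following is a small self-contained sketch in plain C++. The names logOne, logLoop and logTransform are made up for illustration. Unlike the snippets above, it passes a small named wrapper rather than ::log directly; that is only a precaution against overload ambiguity on compilers where log has several overloads in scope, not a correction of the code above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Unambiguous double -> double wrapper around the natural logarithm.
static double logOne(double v) { return std::log(v); }

// Manual loop: replace each element by its log, in place.
void logLoop(std::vector<double>& x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] = logOne(x[i]);
    }
}

// Same operation with std::transform; the output range is the input
// range, so the vector is overwritten element by element.
void logTransform(std::vector<double>& x) {
    std::transform(x.begin(), x.end(), x.begin(), logOne);
}
```

Both functions apply the same operation to each element in the same order, so any timing difference between them comes from how the compiler optimizes the two forms.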
On 26 August 2011 at 08:37, Davor Cubranic wrote:
| On August 25, 2011 05:32:45 PM Dirk Eddelbuettel wrote:
| > Long story short I just committed a new self-contained example to the Rcpp
| > source which you can look at (and copy) via the URL
| [...]
| > The results bear this out. On my (standard i7, four cores hyperthreaded)
| > box:
| >
| > edd at max:~$ r ~/svn/rcpp/pkg/Rcpp/inst/examples/OpenMP/OpenMPandInline.r
| > Loading required package: methods
| > test replications elapsed relative user.self sys.self
| > 2 funOpenMP(z) 100 3.219 1.000000 25.26 0.07
| > 3 funSerialRcpp(z) 100 9.030 2.805219 9.43 0.32
| > 4 funSugarRcpp(z) 100 9.423 2.927307 9.06 0.35
| > 1 funSerial(z) 100 9.601 2.982603 9.59 0.00
| > edd at max:~$
| >
| [...]
| > Rcpp sugar has no real leg up on manual loops, but is the shortest
| > implementation in two lines. Looping over an Rcpp vector is a little faster
| > than looping over a C++ STL vector (which incurs a copy).
|
| I know you wanted to keep all code looking the same, but with std::transform
| all serial code is very short:

Good thinking, I'll make that change!

| serialStdAlgCode <- '
| std::vector<double> x = Rcpp::as<std::vector< double > >(xs);
| std::transform(x.begin(), x.end(), x.begin(), ::log);
| return Rcpp::wrap(x);
| '
| funSerialStdAlg <- cxxfunction(signature(xs="numeric"), body=serialStdAlgCode,
| plugin="Rcpp")
|
| serialStdAlgRcppCode <- '
| Rcpp::NumericVector x = Rcpp::NumericVector(xs);
| std::transform(x.begin(), x.end(), x.begin(), ::log);
| return x;
| '
| funSerialStdAlgRcpp <- cxxfunction(signature(xs="numeric"),
| body=serialStdAlgRcppCode, plugin="Rcpp")

That's much nicer. And quicker. Nice :)

| The results (without OpenMP because I'm currently on a single-core CPU):
|
| test replications elapsed relative user.self sys.self
| 3 funSerialStdAlgRcpp(z) 20 4.236 1.000000 3.792 0.252
| 4 funSerialRcpp(z) 20 4.312 1.017941 3.792 0.272
| 5 funSugarRcpp(z) 20 4.537 1.071058 3.744 0.588
| 2 funSerialStdAlg(z) 20 5.514 1.301700 4.329 0.884
| 1 funSerial(z) 20 5.536 1.306893 4.480 0.808

Thanks for the suggestion.

Dirk
That's what I get with Davor's additions:
edd at max:~/svn/rcpp/pkg/Rcpp/inst/examples/OpenMP$ r OpenMPandInline.r
Loading required package: methods
test replications elapsed relative user.self sys.self
2 funOpenMP(z) 100 3.996 1.000000 30.53 0.91
5 funSerialRcpp(z) 100 8.960 2.242242 8.57 0.39
3 funSerialStdAlgRcpp(z) 100 8.975 2.245996 9.39 0.34
6 funSugarRcpp(z) 100 9.225 2.308559 9.00 0.22
4 funSerialStdAlg(z) 100 10.003 2.503253 9.22 0.77
1 funSerial(z) 100 10.092 2.525526 9.42 0.67
edd at max:~/svn/rcpp/pkg/Rcpp/inst/examples/OpenMP$
OpenMP still wins in 'elapsed' and using the STL algorithm is good for
marginal gains over manual loops.
Dirk
On August 26, 2011 09:05:44 AM Dirk Eddelbuettel wrote:
> That's what I get with Davor's additions:
>
> 5 funSerialRcpp(z) 100 8.960 2.242242 8.57 0.39
> 3 funSerialStdAlgRcpp(z) 100 8.975 2.245996 9.39 0.34

Interesting -- using std::transform was consistently faster for me in both Rcpp and std::vector variants.

I was using gcc 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4), on KUbuntu 11.04.

Davor
On 26 August 2011 at 16:11, Davor Cubranic wrote:
| On August 26, 2011 09:05:44 AM Dirk Eddelbuettel wrote:
| > That's what I get with Davor's additions:
| >
| > 5 funSerialRcpp(z) 100 8.960 2.242242 8.57 0.39
| > 3 funSerialStdAlgRcpp(z) 100 8.975 2.245996 9.39 0.34
|
| Interesting -- using std::transform was consistently faster for me in both
| Rcpp and std::vector variants.

Those two times are basically indistinguishable. And this was timed on my moderately busy server, so there is some noise.

| I was using gcc 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4), on KUbuntu 11.04.

Also Ubuntu 11.04 here.

Dirk