[Rcpp-devel] When calling same Rcpp function several times different results are returned
The examples in the RcppParallel documentation assume that accesses to vectors and matrices are *aligned* (i.e. fall into neat buckets, so that reads and writes don't overlap between worker instances). Your example appears to access arbitrary elements of sg (depending on what's passed in gi), which probably creates overlapping reads/writes. You should also study the documentation for join carefully. There's nothing incorrect about RcppParallel's behavior here; rather, you need to think more carefully about the access patterns of your data and how they might conflict. You may need to introduce locking to overcome the conflicts, which in turn could kill the performance benefit you gain from parallelism. No easy answers here :-\
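To make the point about overlapping writes and join concrete, here is a minimal standalone sketch of the alternative to locking: each worker owns a private, zeroed accumulator, and join only merges partial results. It deliberately uses plain std::thread rather than RcppParallel so it can be compiled on its own; GroupSumWorker, split() and sums_in_groups_sketch are hypothetical names chosen for illustration, not RcppParallel API. In the posted worker, the split constructor copies p.sg, which appears to leave every split instance writing into the same underlying vector, so join ends up adding that vector to itself. That would be consistent with the doubled and quadrupled results reported.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the RcppParallel worker: every instance owns
// its accumulator instead of sharing one output vector.
struct GroupSumWorker {
    const std::vector<double>& v;   // input values (shared, read-only)
    const std::vector<int>& gi;     // zero-based group index of each value
    std::vector<double> sg;         // partial sums, private to this worker

    GroupSumWorker(const std::vector<double>& v, const std::vector<int>& gi,
                   std::size_t n_groups)
        : v(v), gi(gi), sg(n_groups, 0.0) {}

    // Analogue of the Split constructor: share the inputs, but start from
    // a fresh zeroed accumulator rather than copying the parent's.
    GroupSumWorker split() const { return GroupSumWorker(v, gi, sg.size()); }

    // Accumulate the half-open range [begin, end) into the private sums.
    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) sg[gi[i]] += v[i];
    }

    // Merge another worker's partial sums into this one.
    void join(const GroupSumWorker& other) {
        for (std::size_t i = 0; i < sg.size(); ++i) sg[i] += other.sg[i];
    }
};

// Hand-rolled two-way reduce: one half runs on a second thread, the other
// on the calling thread, and the partial sums are joined afterwards.
std::vector<double> sums_in_groups_sketch(const std::vector<double>& v,
                                          const std::vector<int>& gi,
                                          std::size_t n_groups) {
    assert(v.size() == gi.size());
    GroupSumWorker left(v, gi, n_groups);
    GroupSumWorker right = left.split();
    const std::size_t mid = v.size() / 2;

    std::thread t([&right, &v, mid] { right(mid, v.size()); });
    left(0, mid);
    t.join();            // right has finished before its sums are read

    left.join(right);
    return left.sg;
}
```

Called with the values 1 to 10 and group indices 0,0,0,0,0,1,1,1,1,1, this returns the two group sums on every run, because no two threads ever write to the same element. Carried over to the RcppParallel worker, the same idea would presumably mean holding sg as a worker-owned container (for example a std::vector<double>) that the split constructor initialises to zeros of the same length, and copying the result into a NumericVector after parallelReduce returns.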
On Tue, Jul 14, 2015 at 7:15 AM, Danas Zuokas <danas.zuokas at gmail.com> wrote:
Yes, it is the same question as on SO, and I did consider RHertel's comments. But this problem (sums by group id) is not a parallelFor problem; it is a parallelReduce one: I split the vector, calculate sums, and then aggregate those sums. Please correct me if I am wrong.

2015-07-14 13:54 GMT+03:00 Dirk Eddelbuettel <edd at debian.org>:
On 14 July 2015 at 09:25, Danas Zuokas wrote:
| I have written parallel implementation of sums in groups using RcppParallel.

Isn't this the same question as

   http://stackoverflow.com/questions/31318419/when-calling-same-rcpp-function-several-times-different-results-are-returned

You got some excellent comments there by SO user 'RHertel'. Did you consider those?

Dirk

| // [[Rcpp::depends(RcppParallel)]]
| #include <Rcpp.h>
| #include <RcppParallel.h>
| using namespace Rcpp;
| using namespace RcppParallel;
|
| struct SumsG: public Worker
| {
|   const RVector<double> v;
|   const RVector<int> gi;
|
|   RVector<double> sg;
|
|   SumsG(const NumericVector v, const IntegerVector gi, NumericVector sg): v(v), gi(gi), sg(sg) {}
|   SumsG(const SumsG& p, Split): v(p.v), gi(p.gi), sg(p.sg) {}
|
|   void operator()(std::size_t begin, std::size_t end) {
|     for (std::size_t i = begin; i < end; i++) {
|       sg[gi[i]] += v[i];
|     }
|   }
|
|   void join(const SumsG& p) {
|     for(std::size_t i = 0; i < sg.length(); i++) {
|       sg[i] += p.sg[i];
|     }
|   }
| };
|
| // [[Rcpp::export]]
| List sumsingroups(NumericVector v, IntegerVector gi, int ni) {
|   NumericVector sg(ni);
|   SumsG p(v, gi, sg);
|   parallelReduce(0, v.length(), p);
|
|   return List::create(_["sg"] = p.sg);
| }
|
| It compiles using Rcpp::sourceCpp. Now when I call it from R sumsingroups(1:10,
| rep(0:1, each = 5), 2) several times I get the right answer (15 40) and then
| something different (usually some multiplicative of the right answer). Running
|
|
| res <- sumsingroups(1:10, rep(0:1, each = 5), 2)
| for(i in 1:1000) {
|   tmp <- sumsingroups(1:10, rep(0:1, each = 5), 2)
|   if(res[[1]][1] != tmp[[1]][1]) break
|   Sys.sleep(0.1)
| }
|
| breaks at random iteration returning
|
| $sg
| [1] 60 160
|
| or
|
| $sg
| [1] 30 80
|
| I am new to Rcpp and RcppParallel and do not know what could cause such
| behavior.
|
| Things that did not help:
|
| 1. Added for (std::size_t i = 0; i < sg.length(); i++) sg[i] = 0; to both of
| constructors.
| 2. Changed names so that they are different in Worker definition and in
| function implementation.
|
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org