Skip to content

[Rcpp-devel] R.e. Speed gain assistance (Wray, Christopher)

7 messages · Dirk Eddelbuettel, Wray, Christopher, Hadley Wickham +1 more

#
There's a number of things going on in your example.  Are you sure
that sample() is the bottleneck?  You might want to try breaking this
into smaller pieces and benchmarking each with rbenchmark.

On Wed, Aug 17, 2011 at 3:00 AM,
<rcpp-devel-request at r-forge.wu-wien.ac.at> wrote:
Pulling R functions into a C++ function isn't hard, but there's
overhead -- if i recall correctly, it's appreciably slower than the
API.

Take a look at this (unweighted) sample() function.  It's giving R a
run for it's money, and is pretty fast even for very large n, and it
looks statistically correct (not sure if I'm glossing over ugly,
machine-specific details of double->int conversion here). Does this
shed any light on your question?

best,
Christian

library(inline); library(Rcpp);
src1<-'
NumericVector xx(x);
int xx_sz = xx.size()-1; // index of last xx
int nn=as<int>(n);
NumericVector ret(nn);
RNGScope scope;
for (int i=0; i<nn; i++) {
    ret[i] = xx[Rf_runif(0.0,xx_sz)];
};
return(ret);
'

mysample<-cxxfunction( signature(x='numeric', n="numeric"), src1, plugin='Rcpp')

system.time(result <- mysample(1:50, 1e7))
system.time(resultR <- sample(1:50, 1e7, replace=T))
#
Hi - thanks. Sure, my description was not very good. I suppose its a nested sample - of a sample - with an operation (+) inbetween....(and a function call...).

Suggestion was useful though, thanks. So..I can shave off 5 seconds (from a total of 20-ish) by chucking away
floor, amending runif and just making sure I am not introducing anything 'biased' via the convert.

I know that R 'converts' floating values for index positions...but (for some reason) I did not think to check that in works like that for NumericVector..too.
...I'll take the 5 seconds regardless! :-) ta, chris.
#
On 18 August 2011 at 01:14, Christian Gunning wrote:
| There's a number of things going on in your example.  Are you sure
| that sample() is the bottleneck?  You might want to try breaking this
| into smaller pieces and benchmarking each with rbenchmark.

I had the same initial thought. There was simply too much going on.
 
| On Wed, Aug 17, 2011 at 3:00 AM,
| <rcpp-devel-request at r-forge.wu-wien.ac.at> wrote:
| >
| > I have thought (but not tried) importing R's sample function...but I suspect this may be a totally rude idea.
| 
| Pulling R functions into a C++ function isn't hard, but there's
| overhead -- if i recall correctly, it's appreciably slower than the
| API.

Yup. And again: you can measure this.

I see direct calls of R functions as a convenient stop-gap measure for
something you may have to do once in a while. Not a million times in a big
bootstrapping exercise.
 
Dirk

| Take a look at this (unweighted) sample() function.  It's giving R a
| run for it's money, and is pretty fast even for very large n, and it
| looks statistically correct (not sure if I'm glossing over ugly,
| machine-specific details of double->int conversion here). Does this
| shed any light on your question?
| 
| best,
| Christian
| 
| library(inline); library(Rcpp);
| src1<-'
| NumericVector xx(x);
| int xx_sz = xx.size()-1; // index of last xx
| int nn=as<int>(n);
| NumericVector ret(nn);
| RNGScope scope;
| for (int i=0; i<nn; i++) {
|     ret[i] = xx[Rf_runif(0.0,xx_sz)];
| };
| return(ret);
| '
| 
| mysample<-cxxfunction( signature(x='numeric', n="numeric"), src1, plugin='Rcpp')
| 
| system.time(result <- mysample(1:50, 1e7))
| system.time(resultR <- sample(1:50, 1e7, replace=T))
| 
| 
| -- 
| A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
#
Thanks for responses, sure I do make use of the R side - when required and where there is not a significant penalty.

I suppose the root of my question (aside from the functions calls complicating and slowing things down) was that if in R, I might do something like Vector1[subvector_of_relevant_indices] = Vector2, say:

R> (1:50)[runif(10,0,20)]

And I was wondering if I had missed some sugar-type of construct that would allow me to avoid the explicit 1e7 loop - and instead make use of some Rcpp vectorised/optimised/ machinery. Christian's sugg' of allowing double->int did speed things up enough, by me to chuck away the "floor"...

Aside from that, the genral theme I was aiming for was "sampling via indices" which I assumed someone may had already have a wheel which they had invented..already.. :-)

No probs - and sorry for the unclear post! Thanks
#
Don't forget about sample.int:
user  system elapsed
  0.493   0.064   0.557
user  system elapsed
  0.872   0.089   0.962
user  system elapsed
  0.209   0.001   0.212

Hadley
#
No worries. There are some important points here, some that have been
raised repeatedly.

On Thu, Aug 18, 2011 at 7:57 AM,
<rcpp-devel-request at r-forge.wu-wien.ac.at> wrote:
A big point here is that loops aren't "bad" in C++ -- the speed
penalties are much less if you're using pure C++ within them, not
making spurious function calls in them, etc.  Sugar is exactly what
the name implies -- syntactic sugar that beautifies code. It's not
secret turbo-sauce -- that's what the compiler is for.

-xian
#
On Thu, Aug 18, 2011 at 7:57 AM, Hadley Wickham <hadley at rice.edu> wrote:
So, I can't figure out how to import and use sample.int into C++.  I
keep getting the following when I exchange sample.int for sample,
below.  I'm guess it's the interaction between IntegerVector and R??
Not sure it matters, but I'm surprised/confused:

cpp_exception("invalid first argument", "Rcpp::eval_error")

src2<-'
NumericVector xx(x);
int xx_sz = xx.size(); // index of last xx
// generate 0:(length(xx)-1)
IntegerVector index(xx_sz, 1.0); // cant use 1 here, 1.0 still required
std::partial_sum(index.begin(), index.end(), index.begin());
index = index -1;

// sample out of index
int nn=as<int>(n);
IntegerVector sindex(nn);
NumericVector ret(nn);
//Function sample("sample.int"); // doesnt work
Function sample("sample");
RNGScope scope;
// sindex has length nn
sindex = sample(index, _["size"]=nn, _["replace"] = true);
for (int i=0; i<nn; i++) {
    ret[i] = xx[ sindex[i] ];
};
return(ret);
'

mysample2<-cxxfunction( signature(n="numeric", x='numeric'), src2,
plugin='Rcpp')
print(system.time(result <- mysample2( 1e7, 50:1)))

I can shave a little time off by changing everything to NumericVectors
and using ret in place of sindex (and thus avoiding an extra Vector
allocation of size nn), but then I don't expect sample.int to work
anyway.

Moral: In this case, a pure C++ loop is cleaner *and* faster.  Not to
beat a dead horse or anything...
-Christian