Message-ID: <CADNH-Pt6prrONuXngBmwHjS_2owyg9W=UHaO1DPTQkxBJSuX+Q@mail.gmail.com>
Date: 2011-08-19T00:48:23Z
From: Christian Gunning
Subject: [Rcpp-devel] R.e. Speed gain assistance (Wray, Christopher)
In-Reply-To: <CABdHhvGCqjhrte6YzTFhu8AvVjK_AO2xQO2giHO=yTuSTiaoPQ@mail.gmail.com>

On Thu, Aug 18, 2011 at 7:57 AM, Hadley Wickham <hadley at rice.edu> wrote:
>> Take a look at this (unweighted) sample() function.  It's giving R a
>> run for its money, and is pretty fast even for very large n, and it
>> looks statistically correct (not sure if I'm glossing over ugly,
>> machine-specific details of double->int conversion here). Does this
>> shed any light on your question?
>> mysample<-cxxfunction( signature(x='numeric', n="numeric"), src1, plugin='Rcpp')
>>
>> system.time(result <- mysample(1:50, 1e7))
>> system.time(resultR <- sample(1:50, 1e7, replace=T))
>
> Don't forget about sample.int:
>
>> system.time(result <- mysample(1:50, 1e7))
>    user  system elapsed
>   0.493   0.064   0.557
>> system.time(resultR <- sample(1:50, 1e7, replace=T))
>    user  system elapsed
>   0.872   0.089   0.962
>> system.time(resultR <- sample.int(1:50, 1e7, replace=T))
>    user  system elapsed
>   0.209   0.001   0.212
>
> Hadley

So, I can't figure out how to import sample.int into C++ and use it.  I
keep getting the following error when I substitute sample.int for sample
in the code below.  I'm guessing it's the interaction between
IntegerVector and R?  Not sure it matters, but I'm surprised/confused:

cpp_exception("invalid first argument", "Rcpp::eval_error")

src2<-'
NumericVector xx(x);
int xx_sz = xx.size(); // number of elements in xx
// generate 0:(length(xx)-1)
IntegerVector index(xx_sz, 1.0); // cant use 1 here, 1.0 still required
std::partial_sum(index.begin(), index.end(), index.begin());
index = index -1;

// sample out of index
int nn=as<int>(n);
IntegerVector sindex(nn);
NumericVector ret(nn);
//Function sample("sample.int"); // doesnt work
Function sample("sample");
RNGScope scope;
// sindex has length nn
sindex = sample(index, _["size"]=nn, _["replace"] = true);
for (int i=0; i<nn; i++) {
    ret[i] = xx[ sindex[i] ];
}
return ret;
'

mysample2<-cxxfunction( signature(n="numeric", x='numeric'), src2,
plugin='Rcpp')
print(system.time(result <- mysample2( 1e7, 50:1)))
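
A possible explanation for the "invalid first argument" error, offered as
an assumption rather than something confirmed in this thread: R's
sample.int(n, size, replace) takes a single count n as its first argument
and returns 1-based indices in 1..n, whereas the code above passes the
whole index vector (whose first element is 0).  If that is the cause,
passing xx_sz instead of index and subtracting 1 from each result before
indexing xx might work.  The stand-alone C++ sketch below does not use
Rcpp or R's RNG; gather_one_based is a hypothetical helper that only
illustrates the shift from 1-based to 0-based indices:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Rcpp or R): pick values[i - 1] for each
// 1-based index i, the convention used by R functions such as sample.int.
std::vector<double> gather_one_based(const std::vector<double>& values,
                                     const std::vector<int>& idx1) {
    std::vector<double> out;
    out.reserve(idx1.size());
    for (int i : idx1) {
        // Reject anything outside 1..values.size() rather than read out of bounds.
        if (i < 1 || static_cast<std::size_t>(i) > values.size()) {
            throw std::out_of_range("index must lie in 1..n");
        }
        out.push_back(values[static_cast<std::size_t>(i) - 1]);
    }
    return out;
}
```

For example, gather_one_based({50, 49, 48}, {1, 3, 3, 2}) yields
50, 48, 48, 49, while an index of 0 is rejected.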

I can shave a little time off by changing everything to NumericVectors
and using ret in place of sindex (and thus avoiding an extra Vector
allocation of size nn), but then I don't expect sample.int to work
anyway.

Moral: In this case, a pure C++ loop is cleaner *and* faster.  Not to
beat a dead horse or anything...
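
The original src1 is not included in this message, so the following is
only a sketch of what a pure C++ sampling loop can look like, not the code
that was timed above.  Assumptions: sample_with_replacement is a
hypothetical name, and it uses std::mt19937 from the standard library
instead of R's RNG (which an Rcpp version would reach through RNGScope),
so set.seed() has no effect on it:

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical stand-alone helper: draw n values from x uniformly, with
// replacement. Assumption: uses the C++ standard library generator, not
// R's RNG, so results cannot be reproduced via set.seed().
std::vector<double> sample_with_replacement(const std::vector<double>& x,
                                            std::size_t n,
                                            std::mt19937& rng) {
    if (x.empty()) {
        throw std::invalid_argument("x must not be empty");
    }
    // Inclusive range of valid 0-based positions in x.
    std::uniform_int_distribution<std::size_t> pick(0, x.size() - 1);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = x[pick(rng)];  // index directly; no intermediate index vector
    }
    return out;
}
```

Called with a vector holding 1..50 and n = 1e7, this fills the result in
one pass without calling back into R.  No timings are claimed for it.
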
-Christian


-- 
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal - Panama!