[Rcpp-devel] Numeric comparison of two long vectors [was: quick question]

Thu, Apr 29, 2010 1:40 PM

Hi Sunduz,

Thanks for the question.  As I mentioned when we spoke, this will make
a good demonstration of the Rcpp interface to compiled code in C++
because it is the type of calculation that is going to be very slow in
native R code but can be done quite quickly in C++.

I am cc:'ing the Rcpp-devel list on this so that others can comment
and perhaps compare methods.

The gist of the question is that you have long vector, tobs, of
observed values and a reference distribution represented by tperm.
For each element, tobs[i], you want to know the proportion of the
values in tperm less than tobs[i].

In the current form the problem is that you are doing N * B
comparisons plus a lot of sums and all that.  The first thing to do to
speed it up is to sort tperm so you only need to compare to the first
j satisfying tperm[j] > tobs[i].  If you also sort tobs then you can
start at the point where the last comparison failed.  For tobs you
should use order, rather than sort which will allow you to associate
the pvalues with the original ordering of tobs.

The output I enclose is before assigning the pvalues back to the
correct observations (and may have a few "infelicities", as Bill
Venables calls them, still lurking around).

On Thu, Apr 29, 2010 at 1:40 PM, Sunduz Keles <keles at stat.wisc.edu> wrote:

-------------- next part --------------

R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

+   Rcpp::NumericVector tord(tordP), tperm(tpermP);
+   Rcpp::NumericVector ans(tord.size());  // result vector
+   double *to = tord.begin(), *tp = tperm.begin(), *aa = ans.begin();
+   int nobs = tord.size(), nperm = tperm.size();
+   for (int i = 0, j = 0; i < nobs; i++) {
+     while (tp[j] < to[i] && j < nperm) j++;
+     aa[i] = ((double)j)/((double)nperm);
+   }
+   return ans;
+ '

Loading required package: Rcpp

num [1:3000000] 1.67e-07 1.67e-07 8.33e-07 1.00e-06 1.67e-06 ...

user  system elapsed 
 13.140   1.310  14.069