Skip to content
Prev 2154 / 10988 Next

[Rcpp-devel] [R] Find number of elements less than some number: Elegant/fastsolution needed

I enclose some comparisons, admittedly on only one size of example (10
million in the reference distribution and 100,000 in the test sample).

The versions using Rcpp and the std algorithms came out about 3 times
as fast as the version using R's findInterval.  What surprises me is
that the sequential comparisons in C++ (version k) is slightly faster
than the binary search versions (h and j).  The unassisted binary
search requires 30 probes for each element of ans and the sequential
comparisons show take an average of 100 comparisons.  I suppose the
difference is sequential access versus random access to the elements
of y.
On Fri, Apr 15, 2011 at 7:34 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
-------------- next part --------------

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Loading required package: inline
Loading required package: Rcpp
Loading required package: rbenchmark
+     ord <- order(x)
+     ans <- integer(length(x))
+     ans[ord] <- length(y) - findInterval(-x[ord], rev(-sort(y)))
+     ans
+ }
+ {
+     Rcpp::NumericVector x(x_),
+         y = clone(Rcpp::NumericVector(y_));
+     int n = x.size();
+     Rcpp::IntegerVector ans(n);
+     const Rcpp::NumericVector::iterator bb = y.begin(), ee = y.end();
+                // sort (a copy of) the y values for binary search
+     std::sort(bb, ee);
+     
+     for (int i = 0; i < n; i++) {
+ 	ans[i] = std::upper_bound(bb, ee, x[i]) - bb;
+     }
+     return ans;
+ }
+ ', plugin="Rcpp")
+ {
+     Rcpp::NumericVector x(x_),
+         y = clone(Rcpp::NumericVector(y_));
+     int n = x.size();
+     Rcpp::IntegerVector ans(n);
+     const Rcpp::NumericVector::iterator bb = y.begin(), ee = y.end();
+     Rcpp::NumericVector::iterator mm = y.begin();
+                // sort (a copy of) the y values for binary search
+     std::sort(bb, ee);
+     
+     for (int i = 0; i < n; i++) {
+         mm = std::upper_bound(mm, ee, x[i]);
+         ans[i] = mm - bb;
+     }
+     return ans;
+ }
+ ', plugin="Rcpp")
+     ord <- order(x)
+     ans <- integer(length(x))
+     ans[ord] <- j1(x[ord], y)
+     ans
+ }
+ {
+     Rcpp::NumericVector xs(xs_),
+         y = clone(Rcpp::NumericVector(y_));
+     int n = xs.size();
+     Rcpp::IntegerVector ord(ord_), ans(n);
+     const Rcpp::NumericVector::iterator bb = y.begin(), ee = y.end();
+     Rcpp::NumericVector::iterator yy = y.begin();
+     double *xp = xs.begin();
+                // sort (a copy of) the y values for binary search
+     std::sort(bb, ee);
+     
+     for (int i = 0; i < n && yy < ee; i++, xp++) {
+ 	while (*xp >= *yy && yy < ee) yy++;
+ 	ans[ord[i] - 1] = yy - bb;
+     }
+     return ans;
+ }
+ ', plugin="Rcpp")
+     ord <- order(x)
+     k1(x[ord], y, ord)
+ }
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
+           order="relative", replications=10,
+           columns=c("test", "replications", "elapsed", "relative"))
     test replications elapsed relative
5 k(x, y)           10  19.477 1.000000
4 j(x, y)           10  19.984 1.026031
3 h(x, y)           10  20.107 1.032346
2 g(x, y)           10  57.691 2.962006
1 f(x, y)           10  60.563 3.109462
user  system elapsed 
213.060  15.880 233.067 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bb.R
Type: application/octet-stream
Size: 2865 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20110417/cc479b0f/attachment.obj>