<rcpp-devel-request at r-forge.wu-wien.ac.at> wrote:
I was able to write a very short C++ function using the Rcpp package
that provided about a 1000-fold increase in speed relative to the best
I could do in R. ?I don't have the script on this computer so I will
post it tomorrow when I am back on the computer at the office.
Apologies for cross-posting to the Rcpp-devel list but I am doing so
because this might make a good example of the usefulness of Rcpp and
inline.
And RcppArmadillo, as the case may be.
This is a cool little problem. In the examples given, I'd caution
people against comparing apples and durian. The sort(x) is a cost
that should be considered *within* each implementation. I used
Armadillo to sort (src, f4), and get another 100% worth of speedup
that I can't reproducing using R's sort (src1, f1-f3). If i modify
SEXP in-place (and this always confuses me, so I tend to avoid it),
I'm seeing an additional ~5-10% speed gain (src2, f5) -- the advantage
of this last seems to be primarily in memory-constrained applications.
On to the code!
src = '
NumericVector xx_(clone(x)), yy_(clone(y));
int nxx = xx_.size();
int nyy = yy_.size();
arma::vec xx(xx_), yy(yy_);
yy = sort(yy);
xx = sort(xx);
//
//
int j = 0; //gt index for yy
for (int i=0; i < nxx; i++) {
while ((j < nyy) && ( xx(i) > yy(j) ) ) {
j++;
}
xx_(i) = j;
}
return (xx_);
'
src1 = '
NumericVector xx_(clone(x)), yy_(clone(y));
// assumes x & y are already sorted
arma::vec xx(xx_), yy(yy_);
int nxx = xx.n_elem;
int nyy = yy.n_elem;
int j = 0; //gt index for yy
for (int i=0; i < nxx; i++) {
while ((j < nyy) && ( xx(i) > yy(j) ) ) {
j++;
}
xx_(i) = j;
}
return (xx_);
'
src2 = '
NumericVector xx_(x), yy_(y); //kinda scary
int nxx = xx_.size();
int nyy = yy_.size();
arma::vec xx(xx_.begin(), nxx, false), yy(yy_.begin(), nyy, false);
//really kinda scary
yy = sort(yy);
xx = sort(xx);
//
//
int j = 0; //gt index for yy
for (int i=0; i < nxx; i++) {
while ((j < nyy) && ( xx(i) > yy(j) ) ) {
j++;
}
xx_(i) = j;
}
return (xx_);
'
require(inline)
require(RcppArmadillo)
f1 <- function(x, y) { sort(length(y) - findInterval(-x, rev(-sort(y))));}
f2 <- function(x, y) {x = sort(x); length(y) - findInterval(-x, rev(-sort(y)))}
f3.a <- cxxfunction(signature(x="numeric", y="numeric"), src1,
plugin='RcppArmadillo')
f3 <- function(x,y) {
x <- sort(x)
y <- sort(y)
return(f3.a(x,y))
}
f4 <- cxxfunction(signature(x="numeric", y="numeric"), src,
plugin='RcppArmadillo')
## danger -- violates R semantics
f5 <- cxxfunction(signature(x="numeric", y="numeric"), src2,
plugin='RcppArmadillo')
## this is a really ugly test. ygwypf, i suppose :)
for (i in 1:5) {
x1 <- x <- rnorm(5e6)
y1 <- y <- rnorm(5e6)
print( cbind(
r1=system.time(r1 <- f1(x,y)),
r2=system.time(r2 <- f2(x,y)), r3=system.time(r3 <- f3(x1,y1)),
r4 = system.time(r4 <- f4(x,y)), r5 = system.time(r5 <- f5(x,y))
))
}
print(all.equal(r1, r2))
print(all.equal(r1, r3))
print(all.equal(r1, r4))
print(all.equal(r1, r5))
best,
Christian Gunning
University of New Mexico Biology Department
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal ? Panama!
On Thu, Apr 14, 2011 at 11:47 PM, Christian Gunning <xian at unm.edu> wrote:
On Thu, Apr 14, 2011 at 7:02 PM,
<rcpp-devel-request at r-forge.wu-wien.ac.at> wrote:
I was able to write a very short C++ function using the Rcpp package
that provided about a 1000-fold increase in speed relative to the best
I could do in R. ?I don't have the script on this computer so I will
post it tomorrow when I am back on the computer at the office.
Apologies for cross-posting to the Rcpp-devel list but I am doing so
because this might make a good example of the usefulness of Rcpp and
inline.
And RcppArmadillo, as the case may be.
This is a cool little problem. ?In the examples given, I'd caution
people against comparing apples and durian. ?The sort(x) is a cost
that should be considered *within* each implementation. I used
Armadillo to sort (src, f4), and get another 100% worth of speedup
that I can't reproducing using R's sort (src1, f1-f3). ?If i modify
SEXP in-place (and this always confuses me, so I tend to avoid it),
I'm seeing an additional ~5-10% speed gain (src2, f5) -- the advantage
of this last seems to be primarily in memory-constrained applications.
On to the code!
src = '
NumericVector xx_(clone(x)), yy_(clone(y));
int nxx = xx_.size();
int nyy = yy_.size();
arma::vec xx(xx_), yy(yy_);
yy = sort(yy);
xx = sort(xx);
//
//
int j = 0; //gt index for yy
for (int i=0; i < nxx; i++) {
? ?while ((j < nyy) && ( xx(i) > yy(j) ) ) {
? ? ? ?j++;
? ?}
? ?xx_(i) = j;
? }
return (xx_);
'
src1 = '
NumericVector xx_(clone(x)), yy_(clone(y));
// assumes x & y are already sorted
arma::vec xx(xx_), yy(yy_);
int nxx = xx.n_elem;
int nyy = yy.n_elem;
int j = 0; //gt index for yy
for (int i=0; i < nxx; i++) {
? ?while ((j < nyy) && ( xx(i) > yy(j) ) ) {
? ? ? ?j++;
? ?}
? ?xx_(i) = j;
?}
return (xx_);
'
src2 = '
NumericVector xx_(x), yy_(y); ?//kinda scary
int nxx = xx_.size();
int nyy = yy_.size();
arma::vec xx(xx_.begin(), nxx, false), yy(yy_.begin(), nyy, false);
//really kinda scary
yy = sort(yy);
xx = sort(xx);
//
//
int j = 0; //gt index for yy
for (int i=0; i < nxx; i++) {
? ?while ((j < nyy) && ( xx(i) > yy(j) ) ) {
? ? ? ?j++;
? ?}
? ?xx_(i) = j;
}
return (xx_);
'
require(inline)
require(RcppArmadillo)
f1 <- function(x, y) { sort(length(y) - findInterval(-x, rev(-sort(y))));}
f2 <- function(x, y) {x = sort(x); length(y) - findInterval(-x, rev(-sort(y)))}
f3.a <- cxxfunction(signature(x="numeric", y="numeric"), src1,
plugin='RcppArmadillo')
f3 <- function(x,y) {
? ? ? ?x <- sort(x)
? ? ? ?y <- sort(y)
? ? ? ?return(f3.a(x,y))
}
f4 <- cxxfunction(signature(x="numeric", y="numeric"), src,
plugin='RcppArmadillo')
## ?danger -- violates R semantics
f5 <- cxxfunction(signature(x="numeric", y="numeric"), src2,
plugin='RcppArmadillo')
## this is a really ugly test. ygwypf, i suppose :)
for (i in 1:5) {
?x1 <- x <- rnorm(5e6)
?y1 <- y <- rnorm(5e6)
?print( cbind(
? ?r1=system.time(r1 <- f1(x,y)),
? ?r2=system.time(r2 <- f2(x,y)), r3=system.time(r3 <- f3(x1,y1)),
? ?r4 = system.time(r4 <- f4(x,y)), r5 = system.time(r5 <- f5(x,y))
?))
}
print(all.equal(r1, r2))
print(all.equal(r1, r3))
print(all.equal(r1, r4))
print(all.equal(r1, r5))
I agree that the cost of sorting should be taken into account but I
don't think you need to go to the RcppArmadillo package to get a sort
function. Why not use std::sort?
Also, I did sequential comparisons as shown in your code but after
reading Bill Dunlap's response and looking at the documentation for
the findInterval function in R I smacked myself on the forehead and
thought "Duh - binary search, of course".
I haven't looked at the C code underlying the findInterval function
yet so I don't know if Martin has clever tricks for sorted x and y.
However the documentation for the std::upper_bound template at
cplusplus.com shows how to use that for the case here. The best I can
think of for sorted x and y is to pass the upper bound from x[i] as
the first argument in the call to std::upper_bound for x[i+1].
Unfortunately I am staring at a series of deadlines today so
implementations and comparisons may need to wait until tomorrow.
P.S. to Christian: Check the archives for several of Dirk's posts to
the rcpp-devel list where he has used the rbenchmark package to
produce clean output from comparisons of implementations of
algorithms.
I enclose some comparisons, admittedly on only one size of example (10
million in the reference distribution and 100,000 in the test sample).
The versions using Rcpp and the std algorithms came out about 3 times
as fast as the version using R's findInterval. What surprises me is
that the sequential comparisons in C++ (version k) is slightly faster
than the binary search versions (h and j). The unassisted binary
search requires 30 probes for each element of ans and the sequential
comparisons show take an average of 100 comparisons. I suppose the
difference is sequential access versus random access to the elements
of y.
On Fri, Apr 15, 2011 at 7:34 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
On Thu, Apr 14, 2011 at 11:47 PM, Christian Gunning <xian at unm.edu> wrote:
On Thu, Apr 14, 2011 at 7:02 PM,
<rcpp-devel-request at r-forge.wu-wien.ac.at> wrote:
I was able to write a very short C++ function using the Rcpp package
that provided about a 1000-fold increase in speed relative to the best
I could do in R. ?I don't have the script on this computer so I will
post it tomorrow when I am back on the computer at the office.
Apologies for cross-posting to the Rcpp-devel list but I am doing so
because this might make a good example of the usefulness of Rcpp and
inline.
And RcppArmadillo, as the case may be.
This is a cool little problem. ?In the examples given, I'd caution
people against comparing apples and durian. ?The sort(x) is a cost
that should be considered *within* each implementation. I used
Armadillo to sort (src, f4), and get another 100% worth of speedup
that I can't reproducing using R's sort (src1, f1-f3). ?If i modify
SEXP in-place (and this always confuses me, so I tend to avoid it),
I'm seeing an additional ~5-10% speed gain (src2, f5) -- the advantage
of this last seems to be primarily in memory-constrained applications.
On to the code!
src = '
NumericVector xx_(clone(x)), yy_(clone(y));
int nxx = xx_.size();
int nyy = yy_.size();
arma::vec xx(xx_), yy(yy_);
yy = sort(yy);
xx = sort(xx);
//
//
int j = 0; //gt index for yy
for (int i=0; i < nxx; i++) {
? ?while ((j < nyy) && ( xx(i) > yy(j) ) ) {
? ? ? ?j++;
? ?}
? ?xx_(i) = j;
? }
return (xx_);
'
src1 = '
NumericVector xx_(clone(x)), yy_(clone(y));
// assumes x & y are already sorted
arma::vec xx(xx_), yy(yy_);
int nxx = xx.n_elem;
int nyy = yy.n_elem;
int j = 0; //gt index for yy
for (int i=0; i < nxx; i++) {
? ?while ((j < nyy) && ( xx(i) > yy(j) ) ) {
? ? ? ?j++;
? ?}
? ?xx_(i) = j;
?}
return (xx_);
'
src2 = '
NumericVector xx_(x), yy_(y); ?//kinda scary
int nxx = xx_.size();
int nyy = yy_.size();
arma::vec xx(xx_.begin(), nxx, false), yy(yy_.begin(), nyy, false);
//really kinda scary
yy = sort(yy);
xx = sort(xx);
//
//
int j = 0; //gt index for yy
for (int i=0; i < nxx; i++) {
? ?while ((j < nyy) && ( xx(i) > yy(j) ) ) {
? ? ? ?j++;
? ?}
? ?xx_(i) = j;
}
return (xx_);
'
require(inline)
require(RcppArmadillo)
f1 <- function(x, y) { sort(length(y) - findInterval(-x, rev(-sort(y))));}
f2 <- function(x, y) {x = sort(x); length(y) - findInterval(-x, rev(-sort(y)))}
f3.a <- cxxfunction(signature(x="numeric", y="numeric"), src1,
plugin='RcppArmadillo')
f3 <- function(x,y) {
? ? ? ?x <- sort(x)
? ? ? ?y <- sort(y)
? ? ? ?return(f3.a(x,y))
}
f4 <- cxxfunction(signature(x="numeric", y="numeric"), src,
plugin='RcppArmadillo')
## ?danger -- violates R semantics
f5 <- cxxfunction(signature(x="numeric", y="numeric"), src2,
plugin='RcppArmadillo')
## this is a really ugly test. ygwypf, i suppose :)
for (i in 1:5) {
?x1 <- x <- rnorm(5e6)
?y1 <- y <- rnorm(5e6)
?print( cbind(
? ?r1=system.time(r1 <- f1(x,y)),
? ?r2=system.time(r2 <- f2(x,y)), r3=system.time(r3 <- f3(x1,y1)),
? ?r4 = system.time(r4 <- f4(x,y)), r5 = system.time(r5 <- f5(x,y))
?))
}
print(all.equal(r1, r2))
print(all.equal(r1, r3))
print(all.equal(r1, r4))
print(all.equal(r1, r5))
I agree that the cost of sorting should be taken into account but I
don't think you need to go to the RcppArmadillo package to get a sort
function. ?Why not use std::sort?
Also, I did sequential comparisons as shown in your code but after
reading Bill Dunlap's response and looking at the documentation for
the findInterval ?function in R I smacked myself on the forehead and
thought "Duh - binary search, of course".
I haven't looked at the C code underlying the findInterval function
yet so I don't know if Martin has clever tricks for sorted x and y.
However the documentation for the std::upper_bound template at
cplusplus.com shows how to use that for the case here. ?The best I can
think of for sorted x and y is to pass the upper bound from x[i] as
the first argument in the call to std::upper_bound for x[i+1].
Unfortunately I am staring at a series of deadlines today so
implementations and comparisons may need to wait until tomorrow.
P.S. to Christian: ?Check the archives for several of Dirk's posts to
the rcpp-devel list where he has used the rbenchmark package to
produce clean output from comparisons of ?implementations of
algorithms.
-------------- next part --------------
R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
require("inline")
Loading required package: inline
require("Rcpp")
Loading required package: Rcpp
require("rbenchmark")
Loading required package: rbenchmark
## I'm changing the rules a bit here because Sunduz wanted the
## counts in the original order of x
f <- function(x, y) length(y) - findInterval(-x, rev(-sort(y)))
## findInterval can take advantage of x being ordered
g <- function(x, y) {
+ ord <- order(x)
+ ans <- integer(length(x))
+ ans[ord] <- length(y) - findInterval(-x[ord], rev(-sort(y)))
+ ans
+ }
## version using std::upper_bound for unordered x
h <- cxxfunction(signature(x_="numeric", y_="numeric"), '
+ {
+ Rcpp::NumericVector x(x_),
+ y = clone(Rcpp::NumericVector(y_));
+ int n = x.size();
+ Rcpp::IntegerVector ans(n);
+ const Rcpp::NumericVector::iterator bb = y.begin(), ee = y.end();
+ // sort (a copy of) the y values for binary search
+ std::sort(bb, ee);
+
+ for (int i = 0; i < n; i++) {
+ ans[i] = std::upper_bound(bb, ee, x[i]) - bb;
+ }
+ return ans;
+ }
+ ', plugin="Rcpp")
## version using std::upper_bound when x is non-decreasing
j1 <- cxxfunction(signature(x_="numeric", y_="numeric"), '
+ {
+ Rcpp::NumericVector x(x_),
+ y = clone(Rcpp::NumericVector(y_));
+ int n = x.size();
+ Rcpp::IntegerVector ans(n);
+ const Rcpp::NumericVector::iterator bb = y.begin(), ee = y.end();
+ Rcpp::NumericVector::iterator mm = y.begin();
+ // sort (a copy of) the y values for binary search
+ std::sort(bb, ee);
+
+ for (int i = 0; i < n; i++) {
+ mm = std::upper_bound(mm, ee, x[i]);
+ ans[i] = mm - bb;
+ }
+ return ans;
+ }
+ ', plugin="Rcpp")
j <- function(x, y) {
+ ord <- order(x)
+ ans <- integer(length(x))
+ ans[ord] <- j1(x[ord], y)
+ ans
+ }
## version using sequential search
k1 <- cxxfunction(signature(xs_="numeric", y_="numeric", ord_="integer"), '
+ {
+ Rcpp::NumericVector xs(xs_),
+ y = clone(Rcpp::NumericVector(y_));
+ int n = xs.size();
+ Rcpp::IntegerVector ord(ord_), ans(n);
+ const Rcpp::NumericVector::iterator bb = y.begin(), ee = y.end();
+ Rcpp::NumericVector::iterator yy = y.begin();
+ double *xp = xs.begin();
+ // sort (a copy of) the y values for binary search
+ std::sort(bb, ee);
+
+ for (int i = 0; i < n && yy < ee; i++, xp++) {
+ while (*xp >= *yy && yy < ee) yy++;
+ ans[ord[i] - 1] = yy - bb;
+ }
+ return ans;
+ }
+ ', plugin="Rcpp")
k <- function(x, y) {
+ ord <- order(x)
+ k1(x[ord], y, ord)
+ }
x <- rnorm(1e5)
yy <- y <- rnorm(1e7)
yy[1] <- y[1] # ensure y and yy are distinct
aa <- f(x, y)
all.equal(aa, g(x, y))
[1] TRUE
all.equal(aa, h(x, y))
[1] TRUE
all.equal(aa, j(x, y))
[1] TRUE
all.equal(aa, k(x, y))
[1] TRUE
all.equal(y, yy) # check that nothing has changed in y