
Is it possible to vectorize/accelerate this?

You should get familiar with some basic timing tools
and techniques so you can investigate things like this
yourself.

system.time is the most basic timing tool.  E.g.,
  > system.time(for(i in 1:1000)f0(a))
     user  system elapsed
   22.920   0.000  22.932
means it took c. 23 seconds of real time to run f0(a)
1000 times.

When comparing timings, it makes things easier to define
a series of functions that implement the various algorithms
but have the same inputs and outputs.  E.g., for your problem

f0 <- function(a_vec) {
    b_vec <- a_vec
    for (i in 2:length(b_vec)){
        b_vec[i] <- ifelse(abs(b_vec[i-1] + a_vec[i]) > 1, a_vec[i], b_vec[i-1] + a_vec[i])
    }
    b_vec
}

f1 <- function(a_vec) {
    b_vec <- a_vec
    for (i in 2:length(b_vec)){
        b_vec[i] <- if(abs(b_vec[i-1] + a_vec[i]) > 1) a_vec[i] else b_vec[i-1] + a_vec[i]
    }
    b_vec
}

f2 <- function(a_vec) {
    b_vec <- a_vec
    for (i in 2:length(b_vec)){
        if(abs(s <- b_vec[i-1] + a_vec[i]) <= 1) b_vec[i] <- s
    }
    b_vec
}
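Since the three functions are supposed to have identical inputs and
outputs, it is worth checking that they agree before timing them.  A
minimal sanity check (the definitions are repeated so the snippet runs
standalone; the particular test vector is an arbitrary assumption):

```r
f0 <- function(a_vec) {
    b_vec <- a_vec
    for (i in 2:length(b_vec)) {
        b_vec[i] <- ifelse(abs(b_vec[i-1] + a_vec[i]) > 1, a_vec[i], b_vec[i-1] + a_vec[i])
    }
    b_vec
}
f1 <- function(a_vec) {
    b_vec <- a_vec
    for (i in 2:length(b_vec)) {
        b_vec[i] <- if (abs(b_vec[i-1] + a_vec[i]) > 1) a_vec[i] else b_vec[i-1] + a_vec[i]
    }
    b_vec
}
f2 <- function(a_vec) {
    b_vec <- a_vec
    for (i in 2:length(b_vec)) {
        if (abs(s <- b_vec[i-1] + a_vec[i]) <= 1) b_vec[i] <- s
    }
    b_vec
}

set.seed(1)                # reproducible test data
a <- runif(10000, -1, 1)   # an arbitrary "typical" input (assumption)
stopifnot(identical(f0(a), f1(a)),
          identical(f1(a), f2(a)))
```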

Then run them with the same dataset:
  > system.time(for(i in 1:1000)f0(a))
     user  system elapsed
   22.920   0.000  22.932
  > system.time(for(i in 1:1000)f1(a))
     user  system elapsed
    5.510   0.000   5.514
  > system.time(for(i in 1:1000)f2(a))
     user  system elapsed
    4.210   0.000   4.217

(The rbenchmark package's benchmark function encapsulates this idiom.) 
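Spelled out in base R, the idiom amounts to something like this
(a sketch; time_all is an illustrative helper, not a standard function):

```r
# Run each candidate function on the same input the same number of times
# and tabulate the elapsed times.  (time_all is an illustrative helper.)
time_all <- function(funs, x, reps = 1000) {
    sapply(funs, function(f)
        system.time(for (i in seq_len(reps)) f(x))[["elapsed"]])
}

# Example with two trivial candidates for summing a vector:
a <- runif(10000, -1, 1)
time_all(list(vectorized = sum,
              loop       = function(v) { s <- 0; for (vi in v) s <- s + vi; s }),
         a, reps = 100)
```

Named list elements carry through sapply, so the result is a named
numeric vector of elapsed times, one per candidate.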

It pays to use a dataset similar to the one you will ultimately be using,
where "similar" depends on the context.  E.g., the algorithm in f2 is relatively
faster when the running sum exceeds 1 in absolute value most of the time:
  > system.time(for(i in 1:1000)f0(a))
     user  system elapsed
   21.900   0.000  21.912
  > system.time(for(i in 1:1000)f1(a))
     user  system elapsed
    4.610   0.000   4.609
  > system.time(for(i in 1:1000)f2(a))
     user  system elapsed
    2.490   0.000   2.494

If you will be working with large datasets, you should look at how the
time grows as the size of the dataset grows.  If the time looks quadratic between,
say, length 100 and length 200, don't waste your time testing it for length 1000000.
For algorithms that work on data.frames (or matrices), the relative speed often
depends on the ratio of the number of rows to the number of columns of data.
Check that out.  For these sorts of tests it is worthwhile to make a function to
generate "typical" looking data of any desired size.
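With the f2 above, such a generator and a quick growth check might look
like the following (a sketch: make_data and its uniform distribution are
assumptions -- match them to your real data; f2 is repeated so the
snippet runs standalone):

```r
# f2 as defined above, repeated so this snippet runs standalone.
f2 <- function(a_vec) {
    b_vec <- a_vec
    for (i in 2:length(b_vec)) {
        if (abs(s <- b_vec[i-1] + a_vec[i]) <= 1) b_vec[i] <- s
    }
    b_vec
}

# Generate "typical" looking data of any desired size (assumption:
# uniform increments on [-1, 1] -- use your real data's distribution).
make_data <- function(n) runif(n, -1, 1)

# Watch how elapsed time grows with n: roughly 2x per doubling of n
# suggests linear growth, roughly 4x suggests quadratic.
set.seed(2)
times <- sapply(c(1000, 2000, 4000, 8000), function(n) {
    a <- make_data(n)
    system.time(for (i in 1:50) f2(a))[["elapsed"]]
})
times
```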
 
It doesn't take too long to do this once you have the right mindset.  Once you
do, you don't have to rely on folklore like "never use loops"; instead you can
do evidence-based computing.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com