effective way to return only the first argument of "which()"
On 19-09-2012, at 20:02, Bert Gunter wrote:
Well, following up on this observation, which can be put under the
heading of "Sometimes vectorization can be much slower than explicit
loops", I offer the following:
firsti <- function(x, k)
{
  ## Assumes some x[i] > k; otherwise x[i] runs past the end (NA) and while() fails.
  i <- 1
  while (x[i] <= k) { i <- i + 1 }
  i
}
system.time(for(i in 1:100)which(x>.99)[1])
   user  system elapsed
   19.1     2.4    22.2
system.time(for(i in 1:100)which.max(x>.99))
   user  system elapsed
  30.45    6.75   37.46
system.time(for(i in 1:100)firsti(x,.99))
   user  system elapsed
   0.03    0.00    0.03
## About a 500 - 1000 fold speedup!
firsti(x,.99)
[1] 122

It doesn't seem to scale too badly, either (whatever THAT means!). Of course, the which() versions are essentially identical in timing, and so are omitted:
system.time(for(i in 1:100)firsti(x,.9999))
   user  system elapsed
   2.70    0.00    2.72
firsti(x,.9999)
[1] 18200

Of course, at some point the explicit looping is awful: with k = .999999, the index was about 360000, and the timing test took 54 seconds. So I guess the point is, as always, that the optimal approach depends on the nature of the data. Prudence and robustness clearly demand the vectorized which() approaches if you have no information. But if you do know something about the data, then you can often write much faster tailored solutions. Which is hardly revelatory, of course.
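To make the trade-off concrete, here is a minimal sketch of two variants that are not from the thread. firsti_safe() and first_chunked() are hypothetical names: the first adds a bounds guard to the loop, and the second applies which() to one block at a time so that it can stop early. The block size of 10000 is a guess, not a tuned value, and both functions assume x contains no NAs.

```r
# Hypothetical helper (not from the thread): the explicit loop with a
# bounds guard. Returns NA_integer_ when no element of x exceeds k.
firsti_safe <- function(x, k) {
  n <- length(x)
  i <- 1L
  while (i <= n && x[i] <= k) i <- i + 1L
  if (i > n) NA_integer_ else i
}

# Hypothetical hybrid (not from the thread): run the vectorized test on
# one block at a time and stop at the first block that contains a hit.
first_chunked <- function(x, k, block = 10000L) {
  n <- length(x)
  start <- 1L
  while (start <= n) {
    end <- min(start + block - 1L, n)
    hit <- which(x[start:end] > k)
    if (length(hit) > 0L) return(start + hit[1L] - 1L)
    start <- end + 1L
  }
  NA_integer_
}

# Small made-up vector, only to show the calls.
x_demo <- c(0.2, 0.7, 0.995, 0.3)
firsti_safe(x_demo, .99)    # 3
first_chunked(x_demo, .99)  # 3
```

The chunked version does wasted work only within the block that contains the first hit, so its cost should grow with the index of the hit rather than with length(x). Whether it beats the plain loop depends on the data and the block size.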
And compiling the firsti function can also be quite lucrative!
firsti <- function(x, k)
{
  ## Assumes some x[i] > k; otherwise x[i] runs past the end (NA) and while() fails.
  i <- 1
  while (x[i] <= k) { i <- i + 1 }
  i
}
library(compiler)
firsti.c <- cmpfun(firsti)
firsti(x,.99)
[1] 93
firsti.c(x,.99)
[1] 93
system.time(for(i in 1:100)firsti(x,.99))
   user  system elapsed
  0.014   0.000   0.013
system.time(for(i in 1:100)firsti.c(x,.99))
   user  system elapsed
  0.002   0.000   0.002
system.time(for(i in 1:100)firsti(x,.9999))
   user  system elapsed
  2.148   0.013   2.164
system.time(for(i in 1:100)firsti.c(x,.9999))
   user  system elapsed
  0.393   0.002   0.396

And in a new run (without the above tests) with k=.999999, the index was 1089653; the uncompiled function took 152 seconds and the compiled function 28.8 seconds!

Berend
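For anyone who wants to repeat the compiled-versus-uncompiled comparison, here is a sketch under stated assumptions. The thread never shows how x was created, so runif(1e6) and the seed are made up for illustration. Also, R 3.4.0 and later byte-compile closures automatically through the JIT, so on a current R the plain firsti() would likely end up compiled as well. The sketch therefore switches the JIT off with enableJIT(0) first and restores it afterwards.

```r
library(compiler)

# Turn the JIT off so that plain firsti() stays uncompiled;
# enableJIT() returns the previous level so it can be restored.
old_jit <- enableJIT(0)

firsti <- function(x, k) {
  i <- 1
  while (x[i] <= k) i <- i + 1
  i
}
firsti.c <- cmpfun(firsti)

set.seed(1)       # arbitrary seed (assumption)
x <- runif(1e6)   # assumed data; the thread does not show how x was built
k <- 0.9999

stopifnot(any(x > k))                      # firsti() has no bounds check
stopifnot(firsti(x, k) == firsti.c(x, k))  # both versions must agree

t_plain <- system.time(for (j in 1:100) firsti(x, k))
t_comp  <- system.time(for (j in 1:100) firsti.c(x, k))
print(t_plain)
print(t_comp)

enableJIT(old_jit)  # restore the previous JIT level
```

Timings will differ from the ones above, since they depend on the machine, the R version, and where the first hit falls in x.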