Speeding up "accumulation" code in large matrix calc?

On Fri, Feb 24, 2012 at 09:06:28PM +0100, Berend Hasselman wrote:
[...]
[...]
Hi.

Let me add one more solution, operating on rows, to the test.

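  t4 <- function(A, outcome) {
      # For one row x, count at each position how many consecutive
      # preceding entries differ from 'outcome'.
      oneRow <- function(x, outcome)
      {
          n <- length(x)
          y <- 1:n
          z <- y
          # keep an index only at position 1 or right after an 'outcome'
          z[c(FALSE, x[-n] != outcome)] <- 0
          # cummax(z) is the most recent kept index; subtract to get the count
          y - cummax(z)
      }
      # apply() returns one column per row of A, so transpose back
      t(apply(A, 1, oneRow, outcome=outcome))
  }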
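
To make the cummax() step concrete, here is a small worked example on a single row. It is only a sketch: oneRow() is copied out of t4() as a standalone function, and the vector x is made up for illustration.

```r
# Standalone copy of the per-row helper from t4(), for illustration only.
oneRow <- function(x, outcome) {
    n <- length(x)
    y <- 1:n
    z <- y
    # zero every index whose previous entry is not 'outcome'
    z[c(FALSE, x[-n] != outcome)] <- 0
    # distance from each position to the most recent kept index
    y - cummax(z)
}

# Hypothetical input row (not from the benchmark data).
x <- c(0, 0, 1, 0, 0, 0, 1, 1, 0)
res <- oneRow(x, outcome = 1)
print(res)
# prints: [1] 0 1 2 0 1 2 3 0 0
```

The counter restarts at 0 at the position right after each 1 and otherwise grows by one per step.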

  library(compiler)
  t1.c <- cmpfun(t1)
  t3.c <- cmpfun(t3)
  t4.c <- cmpfun(t4)
 
  Nrow <- 100
  Ncol <- 1000
  A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)
 
  library(rbenchmark)
  benchmark(t1(A,outcome=1),
            t3(A,outcome=1),
            t4(A,outcome=1),
            t1.c(A,outcome=1),
            t3.c(A,outcome=1),
            t4.c(A,outcome=1),
            columns=c("test", "user.self", "sys.self", "relative"),
            replications=1)

                    test user.self sys.self relative
  1   t1(A, outcome = 1)     0.744        0  46.5000
  4 t1.c(A, outcome = 1)     0.284        0  17.6875
  2   t3(A, outcome = 1)     0.076        0   4.7500
  5 t3.c(A, outcome = 1)     0.072        0   4.6250
  3   t4(A, outcome = 1)     0.016        0   1.0000
  6 t4.c(A, outcome = 1)     0.020        0   1.1250

Here, t4() and t4.c() are faster than t3() and t3.c().

With

  Nrow <- 20000
  Ncol <- 50
  A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)

I get

                    test user.self sys.self  relative
  1   t1(A, outcome = 1)     7.444        0 20.233696
  4 t1.c(A, outcome = 1)     2.740        0  7.442935
  2   t3(A, outcome = 1)     0.368        0  1.000000
  5 t3.c(A, outcome = 1)     0.368        0  1.002717
  3   t4(A, outcome = 1)     0.592        0  1.605978
  6 t4.c(A, outcome = 1)     0.536        0  1.456522

Here, t3() and t3.c() are faster than t4() and t4.c().
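
The shape dependence of t4() can be checked without rbenchmark, using only system.time() from base R. The sketch below repeats the t4() definition so that it runs on its own; set.seed() and the names wide, tall, r_wide, r_tall are additions for illustration. The second matrix has 10 times as many entries but 200 times as many rows, and apply() calls oneRow() once per row, so the per-call overhead is a plausible reason for the slowdown. That is an interpretation, not something measured above.

```r
# t4() as defined earlier in this message, repeated to be self-contained.
t4 <- function(A, outcome) {
    oneRow <- function(x, outcome) {
        n <- length(x)
        y <- 1:n
        z <- y
        z[c(FALSE, x[-n] != outcome)] <- 0
        y - cummax(z)
    }
    t(apply(A, 1, oneRow, outcome = outcome))
}

set.seed(1)  # assumption: fixed seed, only to make the run repeatable

# Same two shapes as in the benchmarks: few long rows vs. many short rows.
wide <- matrix((runif(100 * 1000) < 0.2) + 0, nrow = 100)
tall <- matrix((runif(20000 * 50) < 0.2) + 0, nrow = 20000)

tm_wide <- system.time(r_wide <- t4(wide, outcome = 1))
tm_tall <- system.time(r_tall <- t4(tall, outcome = 1))

# Show user, system and elapsed time for both shapes side by side.
print(rbind(wide = unclass(tm_wide)[1:3], tall = unclass(tm_tall)[1:3]))
```

The timings will differ between machines and R versions, so no expected output is shown.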

Petr.