
Making routine faster by using apply instead of for-loop

The first problem is that random (element-by-element)
access to a data.frame is much slower than the equivalent
access to a matrix.  Rewriting your code a bit to
use a matrix speeds up the c=500 case by a factor of 750.
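f0 <- function (c = 10)  {
    # c-by-c integer matrix with numeric-looking row and column names
    mat = matrix(1:(c * c), c, c)
    rownames(mat) = seq(c, 1, length = c)
    colnames(mat) = c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
    # convert the labels to numbers once, outside the loops
    v = as.numeric(rownames(mat))
    w = as.numeric(colnames(mat))
    # note: as in your code, v (row labels) is indexed by j
    # and w (column labels) by i
    for (i in 1:c) {
        for (j in 1:c) {
            if (v[j] + w[i] <= c) {
                mat[i, j] = NA
            }
        }
    }
    mat
}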
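
To see the data.frame-versus-matrix difference in isolation, here is a
minimal timing sketch.  It is not your code: the size n and the object
names df, m, t_df and t_m are made up for the illustration, and the ratio
you see will depend on your machine and version of R.

```r
# Element-by-element assignment into a data.frame vs. a matrix.
# n is kept small so the data.frame loop finishes quickly.
n <- 100
df <- as.data.frame(matrix(0, n, n))
m <- matrix(0, n, n)

# every df[i, j] <- 1 goes through the `[<-.data.frame` method
t_df <- system.time(for (i in 1:n) for (j in 1:n) df[i, j] <- 1)

# the same loop on a matrix uses the built-in assignment
t_m <- system.time(for (i in 1:n) for (j in 1:n) m[i, j] <- 1)

# compare elapsed times; exact numbers vary from run to run
print(rbind(data.frame = t_df, matrix = t_m)[, "elapsed"])
```

Both loops produce the same values; only the time taken differs.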
Rewriting that to insert the NAs in one vectorized operation speeds it up by
another factor of 10 (in the c=500 case):
f1 <- function (c = 10) {
    v <- seq(c, 1, length = c)
    w <- c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
    mat <- matrix(1:(c * c), nrow = c, ncol = c, dimnames = list(v, w))
    mat[outer(w, v, `+`) <= c] <- NA
    mat
}
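
In case the outer() line looks cryptic, here is a small sketch of the same
idea on a 4-by-4 case, checked against the double loop.  The names n, mask,
m_vec and m_loop are only for this illustration.

```r
n <- 4
v <- seq(n, 1, length.out = n)                  # 4 3 2 1
w <- c(seq(2, n, length.out = n/2),
       seq(n, 2, length.out = n/2))             # 2 4 4 2

# outer(w, v, `+`)[i, j] is w[i] + v[j], so mask[i, j] is TRUE
# exactly where the loop version would assign NA
mask <- outer(w, v, `+`) <= n

# vectorized: one assignment through a logical matrix index
m_vec <- matrix(1:(n * n), n, n)
m_vec[mask] <- NA

# loop version, for comparison
m_loop <- matrix(1:(n * n), n, n)
for (i in 1:n) {
    for (j in 1:n) {
        if (v[j] + w[i] <= n) m_loop[i, j] <- NA
    }
}

identical(m_vec, m_loop)   # the two approaches should agree
```

A logical matrix of the same shape as mat picks out the elements to
replace, so no explicit loop over i and j is needed.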

If you really want a data.frame, pass the output of these functions
into data.frame() (with check.names=FALSE, since the column
names are not considered legal in a data.frame: they contain
duplicates and look numeric).
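
A short sketch of what check.names does here, using a small made-up matrix
(m, df_raw and df_checked are illustrative names) with the same kind of
duplicated, numeric-looking column names:

```r
m <- matrix(1:16, nrow = 4, ncol = 4,
            dimnames = list(4:1, c(2, 4, 4, 2)))

# check.names = FALSE keeps the column names exactly as they are
df_raw <- data.frame(m, check.names = FALSE)
names(df_raw)

# the default (check.names = TRUE) runs the names through make.names(),
# which makes them syntactically valid and unique
df_checked <- data.frame(m)
names(df_checked)
```

With duplicated names, df_raw[["4"]] only reaches the first matching
column, so selecting columns by position is safer in that case.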

By the way, it is generally a bad idea to use apply() on
a data.frame.  It is meant for matrices, and it will first
convert the data.frame to one with as.matrix().
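
A small sketch of why that matters, using a made-up data.frame (df,
row_classes and col_is_num are illustrative names).  With mixed column
types the conversion to a matrix turns everything into character strings:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# apply() works on as.matrix(df), which here is a character matrix
is.character(as.matrix(df))

# so each "row" handed to the function is a character vector,
# even though column x is integer in the data.frame
row_classes <- apply(df, 1, function(row) class(row))
row_classes

# for column-wise work, vapply() over the columns keeps each type
col_is_num <- vapply(df, is.numeric, logical(1))
col_is_num
```

For an all-numeric data.frame the result is usually what you expect, but
you still pay for the conversion on every call.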

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com