SLOW split() function

The following avoids the overhead of the data.frame methods
(it assumes the data.frame contains no matrices or other
data.frames) and relies on split(vector, factor) quickly
splitting a vector into a list of vectors. For a data.frame
with 10^6 rows and 10 columns, split into 10^5 groups, this
took 14.1 seconds while split() took 658.7 seconds.
Both returned the same thing.
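
To make the idea concrete, here is a small sketch on a toy data.frame. The names df, g, cols and pieces are made up for illustration; it shows only the column-wise splitting step, not the full function.

```r
# Toy data: 6 rows, 2 columns, 3 groups (all names are illustrative).
df <- data.frame(a = 1:6, b = letters[1:6], stringsAsFactors = FALSE)
g  <- factor(c("x", "y", "x", "z", "y", "z"))

# Step 1: split every column as a plain vector, so no data.frame
# method is called. cols[[column]][[group]] holds one piece.
cols <- lapply(df, function(col) split(col, g))

# Step 2: for each group, collect that group's piece of every column.
pieces <- lapply(levels(g), function(lev) {
    lapply(cols, function(col) col[[lev]])
})
names(pieces) <- levels(g)

# Each element of pieces is a plain named list of columns, which can
# be compared with the columns of the data.frame that split() returns.
same_a <- identical(pieces$x$a, split(df, g)$x$a)
same_b <- identical(pieces$x$b, split(df, g)$x$b)
```

Each element of pieces still has to be given names, row names and a class before it is a data.frame.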

Perhaps something based on this idea would help your
parallelized by().

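mysplit.data.frame <-
function (x, f, drop = FALSE, ...)
{
    f <- as.factor(f)
    # Split each column as a plain vector: tmp[[column]][[group]]
    tmp <- lapply(x, function(xi) split(xi, f, drop = drop, ...))
    # Split the row names the same way, one vector per group
    rn <- split(rownames(x), f, drop = drop, ...)
    # Flatten to one list of pieces, each named by its group
    tmp <- unlist(unname(tmp), recursive = FALSE)
    # Regroup the pieces by group name, keeping the group order
    tmp <- split(tmp, factor(names(tmp), levels = unique(names(tmp))))
    # Turn each group's list of column pieces into a data.frame by hand
    tmp <- lapply(setNames(seq_along(tmp), names(tmp)), function(i) {
        t <- tmp[[i]]
        names(t) <- names(x)
        attr(t, "row.names") <- rn[[i]]
        class(t) <- "data.frame"
        t
    })
    tmp
}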

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com