Aggregate with numerous factors

Joachim Claudet wrote:
aggregate() is (currently) a wrapper for tapply(), so it generates a table
which is indexed by the Cartesian product of all the factors. If many cells
are empty, you can reduce the work by calculating the interaction factor up
front and removing the levels that are not present in the data. This is pretty
much the idea you already had, unless you forgot the bit about removing unused
levels. You could potentially extend the idea to all 12 factors, and then
extract the ones you want "on their own" from the result.
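
A minimal sketch of the interaction-factor idea, on made-up toy data (the
data frame d and its columns a, b, c, y are assumptions for illustration
only). It relies on interaction(..., drop = TRUE) to discard the
combinations that never occur, so tapply() only has to fill the cells that
are actually present:

```r
# Hypothetical toy data: the full cross of a, b and c has 4 * 3 * 2 = 24
# cells, but only 3 of them occur in the data.
d <- data.frame(
  a = factor(c("a1", "a1", "a2", "a4"), levels = c("a1", "a2", "a3", "a4")),
  b = factor(c("b1", "b1", "b2", "b3")),
  c = factor(c("c1", "c1", "c2", "c2")),
  y = c(1, 2, 3, 4)
)

# One combined factor; drop = TRUE removes the unused level combinations.
grp <- interaction(d$a, d$b, d$c, drop = TRUE)

# tapply() now returns one entry per observed cell instead of 24.
res <- tapply(d$y, grp, sum)
```

The names of res are the combined level labels (for example "a1.b1.c1"), so
the individual factors still have to be recovered from them or from the
original rows.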

Alternatively, rewrite aggregate() and send us a patch ;-)

It is not necessarily all that hard. Here's a rough idea

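IX <- as.data.frame(by)
OO <- do.call(order, IX)          # row order that sorts by all factors
IX <- IX[OO, , drop = FALSE]      # sort the factors ...
Y <- as.data.frame(x)[OO, 1]      # ... and the (single-column) data alike
g <- cumsum(!duplicated(IX))      # group number; needs IX to be sorted
FF <- unique(IX)                  # one row per observed combination
cbind(FF, x = sapply(split(Y, g), FUN))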

(completely untested, of course, and if it works, it works only for a
single-column x; otherwise, you need a loop over the columns somehow.)
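
One possible way to do the loop over the columns, as a sketch only: the
function name aggregate_sorted is hypothetical (it is not part of R), and
it assumes that FUN returns a single value per group and that the grouping
columns contain no NAs.

```r
# Hypothetical helper, not part of base R: sort once, number the groups,
# then apply FUN to every column of x within each group.
aggregate_sorted <- function(x, by, FUN, ...) {
  x  <- as.data.frame(x)
  IX <- as.data.frame(by)
  # unname() so that column names cannot clash with order()'s own arguments
  OO <- do.call(order, unname(as.list(IX)))
  IX <- IX[OO, , drop = FALSE]
  x  <- x[OO, , drop = FALSE]
  g  <- cumsum(!duplicated(IX))   # group ids; valid because IX is sorted
  FF <- unique(IX)                # one row per observed combination
  # Loop over the columns of x; each result has one value per group.
  vals <- lapply(x, function(col) unname(sapply(split(col, g), FUN, ...)))
  out <- cbind(FF, as.data.frame(vals))
  rownames(out) <- NULL
  out
}

# Usage on made-up data:
d <- data.frame(a = c("x", "y", "x", "y"), b = c("p", "p", "p", "q"),
                y = 1:4, z = c(10, 20, 30, 40))
r <- aggregate_sorted(d[c("y", "z")], d[c("a", "b")], sum)
```

Only the combinations that occur in the data are ever visited, so the cost
no longer depends on the size of the full cross of the factors. The row
order of the result follows the sort order of the grouping columns, which
may differ from the order aggregate() uses.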