Aggregate with numerous factors

3 messages · Joachim Claudet, Peter Dalgaard

#
Dear list members,

I am having trouble with the aggregate() function.
I want to compute the sum and the mean of one variable over every
combination of 12 factors, using aggregate() to avoid loops, but it
doesn't work (or at least the job takes far too long: it runs for more
than 2 hours). It works with fewer factors, so I built a single factor
from the level combinations of 7 of the factors (I need the other ones
kept on their own). That left me with 6 factors, but it still doesn't work.
Could someone tell me how to fix the problem, or suggest another function
I could use?
Thank you very much,
Joachim Claudet.
#
Joachim Claudet wrote:
aggregate() is (currently) a wrapper for tapply(), so it generates a table
which is indexed by the Cartesian product of all the factors. If many cells
are empty, you can reduce the work by calculating the interaction factor up
front and removing levels that are not present in the data. This is pretty
much the idea you already had, unless you forgot the bit about removing
unused levels. You could potentially extend the idea to all 12 factors, and
then extract the ones you want "on their own" from the result.
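
A minimal sketch of that idea, not taken from the thread: the toy data
frame d and the names cell, keys and result are invented for illustration.
interaction(..., drop = TRUE) builds the combined factor and discards the
combinations that never occur, so tapply() only has to fill the cells
that are actually present.

```r
# Toy data: two factors and one numeric variable (illustrative only).
d <- data.frame(
  f1 = factor(c("a", "a", "b", "b", "a")),
  f2 = factor(c("x", "y", "x", "x", "x")),
  y  = c(1, 2, 3, 4, 5)
)

# One combined factor; drop = TRUE removes level combinations
# that do not appear in the data (here "b.y").
cell <- interaction(d$f1, d$f2, drop = TRUE)

# tapply() now builds a table with one entry per occupied cell only.
sums  <- tapply(d$y, cell, sum)
means <- tapply(d$y, cell, mean)

# Recover the individual factors "on their own": take the first row
# of each occupied cell and look up its statistics by cell label.
first <- !duplicated(cell)
keys  <- d[first, c("f1", "f2")]
lab   <- as.character(cell[first])
result <- data.frame(keys,
                     sum  = as.vector(sums[lab]),
                     mean = as.vector(means[lab]),
                     row.names = NULL)
print(result)
```

With 12 factors the full Cartesian table has as many cells as the product
of the numbers of levels, whereas the dropped interaction factor has at
most one level per row of data.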

Alternatively, rewrite aggregate() and send us a patch ;-)

It is not necessarily all that hard. Here's a rough idea

IX <- as.data.frame(by)
OO <- do.call(order,IX)
Y <- x[OO,]
g <- cumsum(!duplicated(IX))
FF <- unique(IX)
cbind(FF, sapply(split(x,g),FUN))

(completely untested, of course, and if it works, it works only for a
single-column x; otherwise, you need a loop over the columns somehow.)
#
Peter Dalgaard wrote:
I see two glaring blunders already...

You need IX[OO,] in two places, and split(Y, g) not x
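
For reference, here is one way the rough idea might look with those two
corrections applied. This is only a sketch under some assumptions: the
wrapper name sorted_aggregate and the toy data are made up, x is taken
to be a plain vector (so it is indexed as x[OO] rather than x[OO,]), and
the result is assembled with data.frame() instead of cbind().

```r
# Hypothetical wrapper around the corrected sketch; single vector x only.
sorted_aggregate <- function(x, by, FUN, ...) {
  IX <- as.data.frame(by)
  # Sort order over all grouping columns (unname() so that column names
  # cannot clash with order()'s own arguments).
  OO <- do.call(order, unname(as.list(IX)))
  IX <- IX[OO, , drop = FALSE]     # first place where IX[OO,] is needed
  Y  <- x[OO]
  # After sorting, equal rows are adjacent: number the groups in order.
  g  <- cumsum(!duplicated(IX))
  FF <- unique(IX)                 # second place: unique rows of sorted IX
  # split(Y, g), not split(x, g), so that values match the sorted groups.
  data.frame(FF, x = unname(sapply(split(Y, g), FUN, ...)),
             row.names = NULL)
}

# Toy usage (illustrative data only).
d <- data.frame(a = c("u", "v", "u", "v", "u"),
                b = c("p", "p", "q", "p", "p"),
                val = c(1, 2, 3, 4, 5))
res <- sorted_aggregate(d$val, d[c("a", "b")], sum)
print(res)
```

Only combinations that occur in the data show up in the result, so no
table over the full Cartesian product is ever built. As noted earlier in
the thread, a multi-column x would still need a loop over the columns.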