
aggregate vs tapply; is there a middle ground?

4 messages · Hadley Wickham, Peter Dalgaard, Hans Gardfjell

#
I faced a similar problem. Here's what I did

tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
                  B=sample(letters[1:5],10,replace=TRUE),
                  C=rnorm(10))
tmp1 <- with(tmp,aggregate(C,list(A=A,B=B),sum))
tmp2 <- expand.grid(A=sort(unique(tmp$A)),B=sort(unique(tmp$B)))
merge(tmp2,tmp1,all.x=TRUE)

At least it takes fewer than 10 extra lines of code. Anyone with a simpler solution?

Cheers, Hans
lebouton wrote:

  
    
#
Well, you can almost do this with the reshape package:

tmp <- data.frame(A=sample(LETTERS[1:5],10,replace=TRUE),
                  B=sample(letters[1:5],10,replace=TRUE),
                  C=rnorm(10))
a <- recast(tmp, A + B ~ ., sum)
# see also recast(tmp, A  ~ B, sum)
add.all.combinations(a, row="A", cols = "B")

Where add.all.combinations basically does what you outlined above --
it would be easy enough to generalise to multiple dimensions.
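
As a rough illustration of that generalisation, here is a minimal base-R sketch that applies the expand.grid/merge idea from earlier in the thread to any number of grouping columns. The helper name aggregate_all and its arguments are made up for this example; it is not the reshape implementation of add.all.combinations.

```r
# Hypothetical helper (not part of reshape): aggregate `value` over every
# combination of the grouping columns, keeping empty cells as NA.
aggregate_all <- function(data, groups, value, fun = sum) {
  # Aggregate over the combinations that actually occur in the data
  agg <- aggregate(data[value], data[groups], fun)
  # Full grid of the observed values of each grouping column
  grid <- expand.grid(lapply(data[groups], function(x) sort(unique(x))),
                      stringsAsFactors = FALSE)
  # Left join: combinations absent from `agg` come back as NA
  merge(grid, agg, all.x = TRUE)
}

set.seed(1)
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))
res <- aggregate_all(tmp, c("A", "B"), "C")
print(res)
```

Passing a longer vector of column names as groups should extend this to three or more dimensions, since both aggregate and expand.grid accept an arbitrary list of grouping variables.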

Hadley
#
hadley wickham <h.wickham at gmail.com> writes:
Anything wrong with
   A B       Freq
1  A a         NA
2  B a -0.2524320
3  C a  3.8539264
4  D a         NA
5  A c  0.7227294
6  B c -0.2694669
7  C c  0.4760957
8  D c         NA
9  A e         NA
10 B e  0.1800500
11 C e         NA
12 D e -1.0350928

(except for the silly column name; responseName="sum" should fix that).
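
The command that produced the output above did not survive in this copy of the thread. Judging only from the Freq column and the responseName hint, it was presumably something along the lines of tapply followed by as.data.frame.table; the sketch below is a guess under that assumption, not a quotation of the original message.

```r
set.seed(1)
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))

# tapply returns an A-by-B matrix, with NA in cells that have no data
tab <- with(tmp, tapply(C, list(A = A, B = B), sum))

# as.data.frame.table reshapes that matrix to one row per A/B combination;
# responseName replaces the default "Freq" column name
long <- as.data.frame.table(tab, responseName = "sum")
print(long)
```

Because tapply already lays out every combination of the observed levels, no separate expand.grid and merge step is needed.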
1 day later
#
Thanks Peter!

I had a "feeling" that there must be a simpler, better, more elegant 
solution.

/Hans
Peter Dalgaard wrote: