Message-ID: <x264nlh3g5.fsf@turmalin.kubism.ku.dk>
Date: 2006-02-11T23:37:46Z
From: Peter Dalgaard
Subject: aggregate vs tapply; is there a middle ground?
In-Reply-To: <f8e6ff050602111444n42affdaer16f94a1fb9ede76b@mail.gmail.com>
hadley wickham <h.wickham at gmail.com> writes:
> > I faced a similar problem. Here's what I did
> >
> > tmp <-
> > data.frame(A=sample(LETTERS[1:5],10,replace=T),B=sample(letters[1:5],10,replace=T),C=rnorm(10))
> > tmp1 <- with(tmp,aggregate(C,list(A=A,B=B),sum))
> > tmp2 <- expand.grid(A=sort(unique(tmp$A)),B=sort(unique(tmp$B)))
> > merge(tmp2,tmp1,all.x=T)
> >
> > At least fewer than 10 extra lines of code. Anyone with a simpler solution?
>
> Well, you can almost do this with the reshape package:
>
> tmp <-
> data.frame(A=sample(LETTERS[1:5],10,replace=T),B=sample(letters[1:5],10,replace=T),C=rnorm(10))
> a <- recast(tmp, A + B ~ ., sum)
> # see also recast(tmp, A ~ B, sum)
> add.all.combinations(a, row="A", cols = "B")
>
> Where add.all.combinations basically does what you outlined above --
> it would be easy enough to generalise to multiple dimensions.
Anything wrong with
> as.data.frame(with(tmp,as.table(tapply(C,list(A=A,B=B),sum))))
   A B       Freq
1  A a         NA
2  B a -0.2524320
3  C a  3.8539264
4  D a         NA
5  A c  0.7227294
6  B c -0.2694669
7  C c  0.4760957
8  D c         NA
9  A e         NA
10 B e  0.1800500
11 C e         NA
12 D e -1.0350928
(except for the silly column name "Freq"; passing responseName="sum" to
as.data.frame() should fix that).
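
For completeness, here is a minimal sketch of the same one-liner with the
column renamed. The set.seed() call and the names tab and res are my own
additions for illustration, so the numbers will differ from the output
above. Note that only the levels of A and B that actually occur in the data
show up in the grid.

```r
# Assumption: a fixed seed is used only to make the illustration reproducible.
set.seed(1)
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))

# tapply() gives a matrix of sums, with NA for (A, B) pairs that never occur;
# as.table() lets as.data.frame() unfold that matrix into long format.
tab <- with(tmp, as.table(tapply(C, list(A = A, B = B), sum)))

# responseName replaces the default "Freq" column name.
res <- as.data.frame(tab, responseName = "sum")
print(res)
```

The result has one row per combination of the observed A and B values, so
nothing like expand.grid() plus merge() is needed.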
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907