SUM,COUNT,AVG
On Mon, Apr 6, 2009 at 9:34 AM, Stavros Macrakis <macrakis at alum.mit.edu> wrote:
There are various ways to do this in R.

# sample data
dd <- data.frame(a=1:10,b=sample(3,10,replace=T),c=sample(3,10,replace=T))

Using the standard built-in functions, you can use:

*** aggregate ***

aggregate(dd,list(b=dd$b,c=dd$c),sum)
  b c  a b c
1 1 1 10 2 2
2 2 1  3 2 1
....

*** tapply ***

tapply(dd$a,interaction(dd$b,dd$c),sum)
      1.1       2.1       3.1       1.2       2.2       3.2       1.3 2.3
 5.000000  3.000000 10.000000  5.000000        NA        NA  5.000000 ...

But the nicest way is probably to use the plyr package:
library(plyr)
ddply(dd,~b+c,sum)
  b c V1
1 1 1 14
2 2 1  6
....

********

Unfortunately, none of these approaches allows you to return more than one result from the function, so you'll need to write
ddply(dd,~b+c,length)   # count
ddply(dd,~b+c,sum)
ddply(dd,~b+c,mean)     # arithmetic average
There is an 'each' function in plyr, but it doesn't seem to be compatible with ddply.
That's because ddply applies the function to the whole data frame, not just the columns that aren't participating in the split. One way around it is:

ddply(dd, ~ b + c, function(df) each(length, sum, mean)(df$a))

I haven't figured out a more elegant way to specify this yet.

Hadley
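
For anyone who wants to try this end to end, here is a minimal sketch, not a definitive answer. It assumes a small fixed data frame in place of the sample() data above, so the numbers are reproducible; those values are made up for illustration. The base R variant goes through the formula interface of aggregate() and has the summary function return a named vector, which was not part of the original messages. The plyr call is the workaround from the reply above and only runs if plyr is installed.

```r
# Fixed example data (hypothetical values; the original post used sample(),
# which gives different values on every run).
dd <- data.frame(a = 1:6,
                 b = c(1, 1, 2, 2, 1, 2),
                 c = c(1, 1, 1, 2, 2, 2))

# Base R: only column a is passed to FUN, so b and c are used for grouping
# and are not summarised themselves. FUN returns a named vector, so the
# result holds count, sum and mean together in a matrix column named a.
res <- aggregate(a ~ b + c, data = dd,
                 FUN = function(x) c(count = length(x),
                                     sum   = sum(x),
                                     mean  = mean(x)))
print(res)

# plyr: apply each(length, sum, mean) to column a of every b/c subset.
# Guarded so the script still runs when plyr is not installed.
if (requireNamespace("plyr", quietly = TRUE)) {
  res_plyr <- plyr::ddply(dd, ~ b + c,
                          function(df) plyr::each(length, sum, mean)(df$a))
  print(res_plyr)
}
```

In both variants the function is handed df$a (or x) rather than the whole subset, which is what keeps length() from counting columns and sum() from adding in the grouping variables.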