summarize dataframe based on multiple cols, not their combinations
Thanks, John. Your solution gives me:
ddply(my_df, .(a), summarize, mm = mean(dat), number = length(dat))
  a mm number
1 0 14      3
2 1 11      3

I'm looking for (and Ista found a way):

  a b c mean n
1 1 * *   11 3
2 * 1 *   14 3
3 * * 1   12 3
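
For reference, here is a minimal sketch of one way to build that table in base R. It is not necessarily the approach Ista posted. It assumes, as in the desired output, that only the rows where each column equals 1 are of interest; the names cat_cols, flags and result are made up for illustration.

```r
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0),
                    b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0),
                    dat = c(10, 11, 12, 13, 14, 15))

# Hypothetical helper name: the categorical columns to summarize one at a time
cat_cols <- c("a", "b", "c")

# For each categorical column, keep the rows where it equals 1 (assumption)
# and summarize dat, ignoring the values of the other columns.
result <- do.call(rbind, lapply(cat_cols, function(col) {
  rows <- my_df[my_df[[col]] == 1, ]
  # "1" marks the column being summarized, "*" stands for "any value"
  flags <- ifelse(cat_cols == col, "1", "*")
  names(flags) <- cat_cols
  data.frame(as.list(flags), mean = mean(rows$dat), n = nrow(rows),
             stringsAsFactors = FALSE)
}))

result
```

Looping over the column names keeps this to a single command rather than one call per column, and the same loop extends to other levels by replacing the == 1 filter.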
thanks, allie
On 3/20/2013 3:24 PM, John Kane wrote:
Will this do?

library(plyr)
ddply(my_df, .(a), summarize, mm = mean(dat), number = length(dat))

John Kane
Kingston ON Canada
-----Original Message-----
From: ashenkin at ufl.edu
Sent: Wed, 20 Mar 2013 14:57:36 -0500
To: r-help at r-project.org
Subject: [R] summarize dataframe based on multiple cols, not their combinations

Hi folks,

I'm trying to figure out how to get summarized data based on multiple columns. However, instead of a summary for every combination of the categorical columns, I want a summary for each value of each categorical column, regardless of the other columns. I could do this with three different commands, but I'm wondering if there's a more elegant way that I'm missing.

Thanks!
allie

my_df = data.frame(a = c(1,1,1,0,0,0), b=c(0,0,0,1,1,1),
c=c(1,0,1,0,1,0), dat=c(10,11,12,13,14,15))
my_df
  a b c dat
1 1 0 1  10
2 1 0 0  11
3 1 0 1  12
4 0 1 0  13
5 0 1 1  14
6 0 1 0  15

# not what I want
ddply(my_df, .(a,b,c), function(x) c("mean"=mean(x$dat), "n"=nrow(x)))
  a b c mean n
1 0 1 0   14 2
2 0 1 1   14 1
3 1 0 0   11 1
4 1 0 1   11 2

What I want:

  a b c mean n
1 1 * *   11 3
2 * 1 *   14 3
3 * * 1   12 3

where "*" refers to any value of the other columns.