-----Original Message-----
From: ashenkin at ufl.edu
Sent: Wed, 20 Mar 2013 14:57:36 -0500
To: r-help at r-project.org
Subject: [R] summarize dataframe based on multiple cols, not their combinations
Hi folks,
I'm trying to figure out how to get summarized data based on multiple
columns. However, instead of a summary for every combination of the
categorical columns, I want one for each value of each categorical column,
regardless of the other columns. I could do this with three separate
commands, but I'm wondering if there's a more elegant way that I'm
missing. Thanks!
allie
my_df = data.frame(a = c(1,1,1,0,0,0), b = c(0,0,0,1,1,1),
                   c = c(1,0,1,0,1,0), dat = c(10,11,12,13,14,15))
  a b c dat
1 1 0 1  10
2 1 0 0  11
3 1 0 1  12
4 0 1 0  13
5 0 1 1  14
6 0 1 0  15
# not what I want (ddply() comes from the plyr package)
library(plyr)
ddply(my_df, .(a,b,c), function(x) c("mean"=mean(x$dat), "n"=nrow(x)))
  a b c mean n
1 0 1 0   14 2
2 0 1 1   14 1
3 1 0 0   11 1
4 1 0 1   11 2
What I want:
  a b c mean n
1 1 * *   11 3
2 * 1 *   14 3
3 * * 1   12 3
where "*" refers to any value of the other columns.
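
One possible approach, sketched in base R rather than plyr: loop over the categorical columns, summarize dat within each value of that column alone, and stack the results. The helper name summarize_by and the output columns "column" and "value" are made up for illustration, and the long layout (one row per column/value pair) is an assumption that stands in for the "*" notation above.

```r
my_df <- data.frame(a = c(1,1,1,0,0,0), b = c(0,0,0,1,1,1),
                    c = c(1,0,1,0,1,0), dat = c(10,11,12,13,14,15))

# Hypothetical helper: mean and count of `dat` for each value of ONE column,
# ignoring all other columns.
summarize_by <- function(col, df) {
  groups <- split(df$dat, df[[col]])   # list of dat values, one entry per value
  data.frame(column = col,
             value  = names(groups),   # group labels, stored as character
             mean   = vapply(groups, mean, numeric(1), USE.NAMES = FALSE),
             n      = vapply(groups, length, integer(1), USE.NAMES = FALSE))
}

# Apply the helper to each categorical column and stack the pieces.
res <- do.call(rbind, lapply(c("a", "b", "c"), summarize_by, df = my_df))

# Keep only the rows where the column equals 1, as in the desired table.
wanted <- res[res$value == "1", ]
print(wanted)
```

Because the column names are passed as a character vector, the same call extends to any number of categorical columns without writing one command per column.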