
summarize dataframe based on multiple cols, not their combinations

6 messages · Ista Zahn, John Kane, Alexander Shenkin +1 more

#
Hi folks,

I'm trying to figure out how to get summarized data based on multiple
columns.  However, instead of giving summaries for every combination of
categorical columns, I want it for each value of each categorical column
regardless of the other columns.  I could do this with three different
commands, but i'm wondering if there's a more elegant way that I'm
missing.  Thanks!

allie
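my_df <- data.frame(a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1),
                    c=c(1,0,1,0,1,0), dat=c(10,11,12,13,14,15))
my_df
  a b c dat
1 1 0 1  10
2 1 0 0  11
3 1 0 1  12
4 0 1 0  13
5 0 1 1  14
6 0 1 0  15

What I get when summarizing by every combination of a, b and c:
  a b c mean n
1 0 1 0   14 2
2 0 1 1   14 1
3 1 0 0   11 1
4 1 0 1   11 2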

What I want:
  a b c mean n
1 1 * *   11 3
2 * 1 *   14 3
3 * * 1   12 3

where "*" refers to any value of the other columns.
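
For reference, the "three different commands" version mentioned above
might look roughly like the following in base R. This is only a sketch:
the data frame is rebuilt from the printout, and the names
summarise_where, sum_a, sum_b, sum_c and three_way are made up for
illustration.

```r
# Data frame rebuilt from the printed values in the post.
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0),
                    b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0),
                    dat = c(10, 11, 12, 13, 14, 15))

# Hypothetical helper: mean and count of dat over the selected rows.
summarise_where <- function(rows) {
  c(mean = mean(my_df$dat[rows]), n = sum(rows))
}

# One command per indicator column, each ignoring the other two columns.
sum_a <- summarise_where(my_df$a == 1)
sum_b <- summarise_where(my_df$b == 1)
sum_c <- summarise_where(my_df$c == 1)

# Stack the three results into one matrix with rows a, b, c.
three_way <- rbind(a = sum_a, b = sum_b, c = sum_c)
print(three_way)
```

It works, but every new indicator column needs another line, which is
what the question is trying to avoid.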
#
How about

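library(reshape2)  # melt()
library(plyr)      # ddply()

# Stack a, b and c into long form, then keep the rows where the flag is set
mdf.m <- melt(my_df, measure.vars=c("a", "b", "c"))
mdf.m <- mdf.m[mdf.m$value > 0, ]

# Mean and count of dat for each original column
ddply(mdf.m, "variable", function(x) c("mean"=mean(x$dat), "n"=nrow(x)))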

?

Best,
Ista
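
The same reshape-then-group idea can be sketched without extra packages,
in case that helps to see what the melt step is doing. This assumes the
my_df from the original post; long, means, counts and per_col are
illustrative names only, and stack() and tapply() stand in for melt()
and ddply().

```r
# Data frame rebuilt from the printed values in the original post.
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0),
                    b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0),
                    dat = c(10, 11, 12, 13, 14, 15))

# Long form: one row per (original row, indicator column) pair.
# stack() concatenates a, b, c column by column, so dat is repeated 3 times.
long <- data.frame(dat = rep(my_df$dat, times = 3),
                   stack(my_df[c("a", "b", "c")]))

# Keep only the rows where the indicator is set.
long <- long[long$values > 0, ]

# Group by the originating column name and summarize dat.
means  <- tapply(long$dat, long$ind, mean)
counts <- tapply(long$dat, long$ind, length)
per_col <- data.frame(variable = names(means),
                      mean = as.vector(means),
                      n = as.vector(counts))
print(per_col)
```

Each column is summarized over its own rows only, so the other two
columns never enter the grouping.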
On Wed, Mar 20, 2013 at 3:57 PM, Alexander Shenkin <ashenkin at ufl.edu> wrote:
#
Will this do?

library(plyr)

ddply(my_df, .(a), summarize, mm = mean(dat), number = length(dat))

John Kane
Kingston ON Canada
#
Nice, thanks Ista!
On 3/20/2013 3:18 PM, Ista Zahn wrote:
#
Thanks, John.  Your solution gives me:
  a mm number
1 0 14      3
2 1 11      3

I'm looking for the one-row-per-column table from my original post
(and Ista found a way).

thanks,
allie
On 3/20/2013 3:24 PM, John Kane wrote:
#
Hi,
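library(plyr)

# For each indicator column, summarize dat by that column and keep only
# the group where the indicator equals 1
lst1 <- lapply(letters[1:3], function(i) {
  df1 <- data.frame(my_df[i], my_df["dat"])
  res <- ddply(df1, .(df1[[i]]), function(x) c("mean"=mean(x$dat), "n"=nrow(x)))
  names(res)[1] <- i
  res[res[,1]==1, ]
})

# Merge on the shared mean and n columns, then mark the empty cells with "*"
res1 <- Reduce(function(...) merge(..., all=TRUE), lst1)
res1[is.na(res1)] <- "*"
res1
#  mean n a b c
#1   11 3 1 * *
#2   12 3 * * 1
#3   14 3 * 1 *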

A.K.
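
A possible variation, offered only as a sketch: the starred layout can
also be built directly instead of through merge(). Because the merge
above joins on the shared mean and n columns, two indicator columns with
identical summaries would presumably end up in a single row; building
the pattern by hand sidesteps that. The names cols, pattern and starred
are hypothetical, and my_df is rebuilt from the original post.

```r
# Data frame rebuilt from the printed values in the original post.
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0),
                    b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0),
                    dat = c(10, 11, 12, 13, 14, 15))
cols <- c("a", "b", "c")

# "*" everywhere and "1" on the diagonal: row i describes column i only.
pattern <- matrix("*", nrow = length(cols), ncol = length(cols),
                  dimnames = list(NULL, cols))
diag(pattern) <- "1"

# Mean and count of dat over the rows where each indicator equals 1.
starred <- data.frame(
  pattern,
  mean = vapply(cols, function(col) mean(my_df$dat[my_df[[col]] == 1]),
                numeric(1), USE.NAMES = FALSE),
  n = vapply(cols, function(col) sum(my_df[[col]] == 1),
             integer(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(starred)
```

The rows come out in the order of cols rather than sorted by mean, which
matches the layout asked for in the original post.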


