
Using apply to get group means

5 messages · Alan Cohen, Baptiste Auguie, Hadley Wickham +2 more

#
Hi all,

I'm trying to improve my R skills and make my programming more efficient and succinct. I can solve the following problem, but I wonder if there's a better way to do it:

I'm trying to calculate the mean of a variable within groups defined by several other variables, and then put it back into the original data set as a new variable. For example, if I were measuring weight, I might want to have each individual's weight, and also the group mean by, say, race, sex, and geographic region. The following code works:
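> x1<-rep(c("A","B","C"),3)
> x2<-c(rep(1,3),rep(2,3),1,2,1)
> x3<-c(1,2,3,4,5,6,2,6,4)
> x<-as.data.frame(cbind(x1,x2,x3))
> x3.mean<-rep(0,nrow(x))
> for (i in 1:nrow(x)){
+   x3.mean[i]<-mean(as.numeric(x[,3][x[,1]==x[,1][i]&x[,2]==x[,2][i]]))
+   }
> cbind(x,x3.mean)
  x1 x2 x3 x3.mean
1  A  1  1     1.5
2  B  1  2     2.0
3  C  1  3     3.5
4  A  2  4     4.0
5  B  2  5     5.5
6  C  2  6     6.0
7  A  1  2     1.5
8  B  2  6     5.5
9  C  1  4     3.5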

However, I'd love to be able to do this with "apply" rather than a for-loop.  Or is there a built-in function? Any suggestions?

Also, is there any way to avoid the hassle of having to convert to a data frame and then back to numeric when one variable is character?

Cheers,
Alan Cohen
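For comparison, here is a minimal sketch of the same per-row computation written with sapply() instead of an explicit for loop. It assumes the data are built with data.frame(), so x3 stays numeric and no as.numeric() call is needed. Note that it still computes one mean per row, so it is a stylistic change rather than a faster algorithm.

```r
x <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4))

# For each row i, average x3 over all rows sharing that row's x1 and x2 values
x$x3.mean <- sapply(seq_len(nrow(x)), function(i) {
  same.group <- x$x1 == x$x1[i] & x$x2 == x$x2[i]
  mean(x$x3[same.group])
})
x
```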
#
Not exactly the output you asked for, but perhaps you can consider,

> library(doBy)
> summaryBy(x3~x2+x1,data=x,FUN=mean)
The plyr package also provides similar functionality, as do the base functions ?by, ?ave, and ?tapply.
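As an illustration of the tapply() route (a sketch, not code from the thread): tapply() builds a table of means indexed by the two grouping variables, and indexing that table with a two-column character matrix looks up each row's group mean. This relies on the table's dimnames being the values of x1 and x2 as strings.

```r
x1 <- rep(c("A", "B", "C"), 3)
x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)

# 3 x 2 table of group means: rows are the x1 values, columns the x2 values
tab <- tapply(x3, list(x1, x2), mean)

# Look up each observation's cell by name (row = x1, column = x2)
x3.mean <- tab[cbind(x1, as.character(x2))]
x3.mean
```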

HTH,

baptiste
On 31 Mar 2009, at 17:09, Alan Cohen wrote:

            
_____________________________

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag
#
On Tue, Mar 31, 2009 at 11:31 AM, baptiste auguie <ba208 at exeter.ac.uk> wrote:
In plyr it would look like:

library(plyr)

x1 <- rep(c("A", "B", "C"), 3)
x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)
df <- data.frame(x1, x2, x3)

ddply(df, .(x1, x2), transform, x3.mean = mean(x3))
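The split-apply-combine pattern that ddply() automates can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is an illustrative equivalent, not how plyr is implemented internally, and the rows come back ordered by group rather than in their original order.

```r
df <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                 x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                 x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4))

# Split: one data frame per observed (x1, x2) combination
pieces <- split(df, list(df$x1, df$x2), drop = TRUE)

# Apply: add the group mean as a new column to each piece
pieces <- lapply(pieces, function(d) {
  d$x3.mean <- mean(d$x3)
  d
})

# Combine: stack the pieces back into one data frame
res <- do.call(rbind, pieces)
res
```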

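Note how I created the data frame: only use cbind if you want a
matrix (i.e. all the columns have the same type). cbind() on a mix of
character and numeric vectors produces a character matrix, which is why
the numbers then have to be converted back with as.numeric().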
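A quick check of the difference: a matrix can hold only one type, so cbind() coerces the numbers to strings, while data.frame() keeps a separate type per column.

```r
x1 <- rep(c("A", "B", "C"), 3)
x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)

m <- cbind(x1, x2, x3)       # matrix: a single type for all cells
d <- data.frame(x1, x2, x3)  # data frame: one type per column

typeof(m)     # "character": the numbers were turned into strings
typeof(d$x3)  # "double": x3 is still numeric, so mean(d$x3) works directly
```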

Hadley
#
A different solution (using aggregate for the table of means and merge
for adding it to the data frame):

x1<-rep(c("A","B","C"),3)
x2<-c(rep(1,3),rep(2,3),1,2,1)
x3<-c(1,2,3,4,5,6,2,6,4)
x<-data.frame(x1,x2,x3) #data.frame() keeps each column's type: x2 and x3 stay numeric; x1 becomes a factor (R < 4.0.0) or stays character (R >= 4.0.0)


x3means <- aggregate(x$x3, by=list(x1=x$x1, x2=x$x2), FUN=mean) #group by both x1 and x2
names(x3means)[3] <- "x3.mean"
merge(x, x3means, by=c("x1","x2"))
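A variant of the aggregate()/merge() approach, sketched under a couple of assumptions: it uses the formula interface of aggregate() to group by both x1 and x2 with named columns, and it adds a temporary row-index column (the name .row is just an illustrative choice) so the original row order can be restored, since merge() sorts its result by the key columns.

```r
x <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4))

# Table of means, one row per (x1, x2) combination
x3means <- aggregate(x3 ~ x1 + x2, data = x, FUN = mean)
names(x3means)[names(x3means) == "x3"] <- "x3.mean"

# Remember the original order, merge on both keys, then restore that order
x$.row <- seq_len(nrow(x))
res <- merge(x, x3means, by = c("x1", "x2"))
res <- res[order(res$.row), setdiff(names(res), ".row")]
rownames(res) <- NULL
res
```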


Ciao,
domenico
Alan Cohen wrote:
#
That is precisely the reason for the existence of the ave function.  
Using Wickham's example:

 > x1 <- rep(c("A", "B", "C"), 3)
 > x2 <- c(rep(1, 3), rep(2, 3), 1, 2, 1)
 > x3 <- c(1, 2, 3, 4, 5, 6, 2, 6, 4)
 > df <- data.frame(x1, x2, x3)
 > df$grpx3 <- ave(df$x3, df$x1, df$x2)
 > df
   x1 x2 x3 grpx3
1  A  1  1   1.5
2  B  1  2   2.0
3  C  1  3   3.5
4  A  2  4   4.0
5  B  2  5   5.5
6  C  2  6   6.0
7  A  1  2   1.5
8  B  2  6   5.5
9  C  1  4   3.5

Note that the default function is mean(), but other functions can be
supplied through the FUN argument.
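To illustrate the FUN argument, here is a short sketch computing other per-group summaries with ave(). The grouping variables are passed as separate arguments taken from the data frame itself, and the column names grp.n, grp.max and x3.dev are arbitrary choices.

```r
df <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                 x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                 x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4))

# Group size and group maximum, repeated on every row of the group
df$grp.n   <- ave(df$x3, df$x1, df$x2, FUN = length)
df$grp.max <- ave(df$x3, df$x1, df$x2, FUN = max)

# Deviation from the group mean (FUN defaults to mean)
df$x3.dev <- df$x3 - ave(df$x3, df$x1, df$x2)
df
```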