Calculate mean/var by ID

7 messages · liujb, Henrique Dallazuanna, Adam D. I. Kramer +3 more

#
Hello,

I have a data set that looks like this:
ID    value
111     5
111     6
111     2
178     7
178     3
138     3
138     8
138     7
138     6
.
.
.

I'd like to calculate the mean and variance for each object identified by the
ID. In theory I could just loop through the whole thing, but is there an
easier way or command that lets me calculate the mean/var by ID?

Thanks,
Julia
#
aggregate(value,list(ID=ID),mean)
aggregate(value,list(ID=ID),var)
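
The two calls above assume that `value` and `ID` exist as plain vectors in the workspace. If they are columns of a data frame instead, something like the following sketch may work. The data frame name `df` and the result names `means`, `vars` and `res` are assumptions made for illustration; the values are the sample rows from the question.

```r
# Hypothetical data frame holding the sample rows from the question
df <- data.frame(ID    = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                 value = c(5, 6, 2, 7, 3, 3, 8, 7, 6))

# Formula interface of aggregate(): one row per ID, one summary column
means <- aggregate(value ~ ID, data = df, FUN = mean)
vars  <- aggregate(value ~ ID, data = df, FUN = var)

# Join both summaries on ID; the suffixes distinguish the two "value" columns
res <- merge(means, vars, by = "ID", suffixes = c(".mean", ".var"))
res
```

The merged table has the columns `ID`, `value.mean` and `value.var`, with one row per ID.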

--Adam
On Thu, 11 Sep 2008, liujb wrote:

            
#
A slight variation of what Jorge has proposed is:

    f <- function(x) c( mu=mean(x), var=var(x) )

    do.call( "rbind", tapply( df$value, df$ID, f ) )

             mu      var
   111 4.333333 4.333333
   138 6.000000 4.666667
   178 5.000000 8.000000

Regards, Adai
Jorge Ivan Velez wrote:
#
AFAIK, tapply() only works for one variable (apart from the grouping 
variable). It might be better to use split() here:

    df <- data.frame(ID = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                     value = c(5, 6, 2, 7, 3, 3, 8, 7, 6),
                     Seg = c(2, 2, 2, 4, 4, 1, 1, 1, 1) )

    df.s <- split( df, df$ID )

    out <- sapply( df.s, function(m){
                     c( mu=mean(m$value), var=var(m$value),
                        min=min(m$Seg), max=max(m$Seg) ) })
    out <- t(out)
              mu      var min max
    111 4.333333 4.333333   2   2
    138 6.000000 4.666667   1   1
    178 5.000000 8.000000   4   4

You could also have used range() here instead of calculating min and max 
separately, but naming the resulting columns becomes a bit tricky.
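
One possible way around the naming problem is to label the two values returned by range() with setNames() before combining them. This is only a sketch; the name `out2` is an assumption, and the data frame is the same `df` as above, repeated so the example runs on its own.

```r
df <- data.frame(ID = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                 value = c(5, 6, 2, 7, 3, 3, 8, 7, 6),
                 Seg = c(2, 2, 2, 4, 4, 1, 1, 1, 1))

out2 <- sapply(split(df, df$ID), function(m) {
  # range() returns an unnamed c(min, max); setNames() labels both elements
  c(mu = mean(m$value), var = var(m$value),
    setNames(range(m$Seg), c("min", "max")))
})
out2 <- t(out2)   # one row per ID, columns mu, var, min, max
out2
```

Without the setNames() call, the last two columns would come back with empty names.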

Regards, Adai

PS: If you do a dput() on a subset of the data, you can get a simple 
reproducible example that other R users can easily read in.
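
As a rough illustration of the dput() suggestion: the call prints an R expression that can be pasted into a message, and a reader can assign that expression to a name to rebuild the data. The names `txt` and `df_copy` are assumptions, and the capture.output()/parse() round trip only simulates the copy and paste step.

```r
df <- data.frame(ID = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                 value = c(5, 6, 2, 7, 3, 3, 8, 7, 6))

# Print a structure(...) expression for the first four rows
dput(head(df, 4))

# Simulate a reader pasting that text back into R
txt <- capture.output(dput(head(df, 4)))
df_copy <- eval(parse(text = txt))
df_copy
```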
Julia Liu wrote: