
Using summaryBy with weighted data

Dear Solomon,

On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing
<solomon.messing at gmail.com> wrote:
Yes, of course. It has no way of knowing that the weights should also
be broken down by group; they are not in the formula.
Ideally there would be a way to pass more than one variable to a
function (e.g., response and weights), or the entire object
(mydata), broken down by group. Then you could write a wrapper
function that passes the right values to the x and w arguments of
weighted.mean. Instead, here is a somewhat hacky workaround:

library(doBy)
## make up some data (easier)
mydata <- data.frame(response = rnorm(100),
 group = rep(1:5, each = 20), weights = runif(100, 0, 1))

## manually compute the weighted mean, sum(w * x) / sum(w), within each group
tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum)
tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum))
tmp ## weighted means

## here's the 'problem', if you will: even with +, the variables are passed one at a time
summaryBy(response + weights ~ group, data = mydata, FUN = str)
summaryBy(mydata ~ group, data = mydata, FUN = str)

## here is an option using by():
xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, z$weights))
xy
## if you don't like the formatting, convert it to a data frame:
data.frame(group = names(c(xy)), weighted.mean = c(xy))
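Another base-R route, which is my own sketch and not part of the reply above, is split(): it hands each group's entire data frame to the function, which is exactly the "pass the whole object broken down by group" behaviour wished for earlier. The set.seed() call and the names pieces, wm and out are illustrative assumptions added only so the example is reproducible and self-contained.

```r
## Sketch (not from the original reply): weighted means per group via split().
set.seed(1)  # assumption: fixed seed only so the example is reproducible
mydata <- data.frame(response = rnorm(100),
                     group = rep(1:5, each = 20),
                     weights = runif(100, 0, 1))

## split() returns a list with one data frame per group,
## so each piece still carries both 'response' and 'weights'
pieces <- split(mydata, mydata$group)

## weighted mean of 'response' within each group; vapply() guarantees
## exactly one numeric value per group and keeps the group names
wm <- vapply(pieces,
             function(z) weighted.mean(z$response, z$weights),
             numeric(1))

## same layout as the by() result converted above
out <- data.frame(group = names(wm), weighted.mean = unname(wm))
out
```

The result should agree with the summaryBy()/tapply() computation, since weighted.mean(x, w) is sum(w * x) / sum(w).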

HTH,

Josh