[R-meta] Collapsing a between subject factor

This still isn't quite right. You can compute the mean and SD for the combined sample exactly:

### simulate some data
n.total <- 100
grp <- sample(1:2, size=n.total, replace=TRUE, prob=c(.2,.8))
y   <- rnorm(n.total, mean=grp, sd=2)

### means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

### want to get mean and SD of the total group
mean(y)
sd(y)

### mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

### SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### check that we get the right values
m.total
sd.total
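
When only the reported subgroup summaries are available (and not the raw data), the same computation can be wrapped in a small function. This is just a sketch: the name combine_groups and the example numbers below are made up for illustration and are not part of any package.

```r
### hypothetical helper: combine subgroup sizes, means, and SDs
combine_groups <- function(ni, mi, sdi) {
   n.total    <- sum(ni)
   m.total    <- sum(ni*mi) / n.total           # weighted mean
   ss.within  <- sum((ni-1) * sdi^2)            # within-group sum-of-squares
   ss.between <- sum(ni * (mi - m.total)^2)     # between-group sum-of-squares
   sd.total   <- sqrt((ss.within + ss.between) / (n.total - 1))
   c(n=n.total, mean=m.total, sd=sd.total)
}

### made-up summary statistics for two subgroups
res <- combine_groups(ni=c(20, 80), mi=c(1.1, 2.3), sdi=c(1.9, 2.1))
res
```

The function takes vectors of any length, so the same call works for more than two subgroups.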

This also generalizes to any number of groups. Try with:

### (probabilities chosen to sum to 1; sample() would rescale them otherwise)
grp <- sample(1:3, size=n.total, replace=TRUE, prob=c(.2,.5,.3))
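
For convenience, here is the three-group check as one self-contained run. The seed and the group probabilities are arbitrary choices of mine, made only so that the example is reproducible; all.equal() is used so that the comparison allows for floating-point error.

```r
### three-group check (seed chosen arbitrarily for reproducibility)
set.seed(1234)
n.total <- 100
grp <- sample(1:3, size=n.total, replace=TRUE, prob=c(.2,.5,.3))
y   <- rnorm(n.total, mean=grp, sd=2)

### sizes, means, and SDs of the three subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

### same formulas as in the two-group case
m.total  <- sum(ni*mi)/sum(ni)
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### compare against the values computed from the raw data
all.equal(m.total, mean(y))
all.equal(sd.total, sd(y))
```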

Best,
Wolfgang