
[R-meta] Collapsing a between subject factor

4 messages · Oliver Clark, Michael Dewey, Viechtbauer Wolfgang (STAT)

#
Hi all,

I am currently coding studies for a meta-analysis and have come across a set of studies in which all but one do not include sex as a between-subjects factor.  The reasons given were unequal cell sizes, differences in visual stimuli (it is not clear what these differences are, so they are unlikely to be systematic and are probably an artefact), and strength differences between men and women.

With my limited experience, I don't see the benefit in treating these as two separate cases, and was wondering whether it would make sense to merge the means and SDs for both groups and use those with the total N to calculate an effect size?

Combining the means seems relatively straightforward, but I am not sure how to do the standard deviations.  I have tried averaging the variances in a simulation to get there, but must admit that I am stabbing in the dark! (The simulation code did not survive in the archive; only its output did:)

[1] TRUE
[1] FALSE

Can anyone offer any advice on the best path for this? Should I treat them as different studies, attempt to merge the means and SDs, use a different aggregation method or omit this study?

Many thanks,

Oliver Clark

PhD Student
Manchester Metropolitan University
#
Dear Oliver

You do not say whether the sample sizes are equal or not, so I give the procedure for unequal sample sizes.

For the means you need to weight by sample size:

(n_1 * m_1 + n_2 * m_2) / (n_1 + n_2)

where the n's are the sample sizes and the m's are the means.

For variance you need

(n_1 * (m_1^2 + v_1) + n_2 * (m_2^2 + v_2)) / (n_1 + n_2) - m_c^2

where the v's are the variances and m_c is the combined mean you got above.

I suggest double checking this with a few examples in case of 
transcription errors at my end or yours.
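A quick numeric check of the combining formulas along these lines could look as follows (all group sizes, means, and the seed below are made up for illustration; the identity is exact when the v's are divide-by-n variances):

```r
set.seed(42)

## two hypothetical subgroups
x1 <- rnorm(30, mean = 5, sd = 2)
x2 <- rnorm(50, mean = 7, sd = 2)

n_1 <- length(x1); n_2 <- length(x2)
m_1 <- mean(x1);   m_2 <- mean(x2)

## divide-by-n ("population") variances, which this identity assumes
v_1 <- sum((x1 - m_1)^2) / n_1
v_2 <- sum((x2 - m_2)^2) / n_2

## combined mean: weighted by sample size
m_c <- (n_1 * m_1 + n_2 * m_2) / (n_1 + n_2)

## combined divide-by-n variance
v_c <- (n_1 * (m_1^2 + v_1) + n_2 * (m_2^2 + v_2)) / (n_1 + n_2) - m_c^2

## compare against the pooled raw data
y <- c(x1, x2)
all.equal(m_c, mean(y))                            # TRUE
all.equal(v_c, sum((y - mean(y))^2) / length(y))   # TRUE
```

With the usual divide-by-(n-1) sample variances plugged in instead, the result is only approximate, which is worth keeping in mind when reading study tables.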

Michael
On 28/01/2018 21:49, Oliver Clark wrote:

#
Dear Michael,

Many thanks for your response.  Indeed, the sample sizes are unequal, which is apparently why it was treated as two analyses.

I've been playing with this example and others, and your formula below overestimates the variance.  I think this is because the means are being squared rather than the deltas from the combined mean:

S_c <- (n_1 * (v_1 + (m_1 - m_c) ^2 ) + n_2*( v_2 + ( m_2 - m_c) ^2) ) / ( n_1 + n_2)

This still overestimates the known population variance of 4, so applying the Bessel correction:

S_c_2 <- ( (n_1 - 1 )*( v_1 +( m_1 - m_c ) ^2 ) + ( n_2 - 1)*( v_2 + ( m_2 - m_c)^2) ) / ( ( n_1 + n_2) -1 )

leads to a good estimate of the combined variance.  (The code did not survive in the archive; only its output did:)

[1] 0.001710072
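A sketch of the kind of simulation described might look like this (the group sizes, means, and seed are illustrative, not the original values):

```r
set.seed(1)

## two hypothetical groups from a population with variance 4 (sd = 2)
x1 <- rnorm(40, mean = 10, sd = 2)
x2 <- rnorm(60, mean = 12, sd = 2)

n_1 <- length(x1); n_2 <- length(x2)
m_1 <- mean(x1);   m_2 <- mean(x2)
v_1 <- var(x1);    v_2 <- var(x2)   # sample (divide-by-(n-1)) variances

m_c <- (n_1 * m_1 + n_2 * m_2) / (n_1 + n_2)

S_c_2 <- ((n_1 - 1) * (v_1 + (m_1 - m_c)^2) +
          (n_2 - 1) * (v_2 + (m_2 - m_c)^2)) / (n_1 + n_2 - 1)

## compare with the variance of the pooled raw data:
## close, but not exactly equal
S_c_2 - var(c(x1, x2))
```

As the follow-up below points out, the remaining discrepancy is because the exact decomposition weights the between-group terms by n_i rather than n_i - 1.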

Many thanks for your advice - I'd have been stuck without your input!

Best wishes,

Oliver
#
This still isn't quite right. You can compute the mean and SD for the combined sample exactly:

### simulate some data
n.total <- 100
grp <- sample(1:2, size=n.total, replace=TRUE, prob=c(.2,.8))
y   <- rnorm(n.total, mean=grp, sd=2)

### means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

### want to get mean and SD of the total group
mean(y)
sd(y)

### mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

### SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### check that we get the right values
m.total
sd.total

This also generalizes to any number of groups. Try with:

grp <- sample(1:3, size=n.total, replace=TRUE, prob=c(.2,.6,.3))
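For instance, wrapping the computation in a small helper (the name combine_groups is just for illustration) and checking it against raw data with three groups:

```r
## exact combined mean and SD from per-group n, mean, and SD
combine_groups <- function(ni, mi, sdi) {
  m.total  <- sum(ni * mi) / sum(ni)
  sd.total <- sqrt((sum((ni - 1) * sdi^2) +
                    sum(ni * (mi - m.total)^2)) / (sum(ni) - 1))
  c(mean = m.total, sd = sd.total)
}

set.seed(123)
n.total <- 100
## note: sample() rescales prob to sum to 1 internally
grp <- sample(1:3, size = n.total, replace = TRUE, prob = c(.2, .6, .3))
y   <- rnorm(n.total, mean = grp, sd = 2)

res <- combine_groups(c(by(y, grp, length)),
                      c(by(y, grp, mean)),
                      c(by(y, grp, sd)))

all.equal(unname(res), c(mean(y), sd(y)))   # TRUE
```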

Best,
Wolfgang