
[R-meta] Effect sizes for mixed-effects models

2 messages · Lena Schäfer, James Pustejovsky

Dear James, 

Thank you so much for the detailed response! I apologize for the delay in getting back to you; my graduate school applications got in the way of this. Your suggestion is exactly what we have been looking for and your blogpost has been very informative. I do have a couple of follow-up questions and would be curious to hear what you think:

Calculating Cohen's d and its variance for mixed-effects models

Initially, we planned to follow Brysbaert and Stevens' (2018) suggestion to calculate Cohen's d for mixed-effects models using:

d = difference in means / sqrt(sum of all variance components).

Hedges (2007) proposes three approaches to scaling the treatment effect in mixed-effects models: standardizing the mean difference by the total variance (i.e., the sum of the within- and between-cluster components), by the within-cluster variance, or by the between-cluster variance. Intuitively, I understood Brysbaert and Stevens' approach to also use the total variance to scale the treatment effect, since *all* variance components are summed. However, Hedges seems to use a different formula for deriving dTotal, namely:

dT = difference in means / sqrt(between-cluster components + ((n - 1) / n) * within-cluster components).

Can you help me understand in which cases it would make sense to scale the difference in means by sqrt(sum of all variance components), and in which cases it would be more reasonable to use sqrt(between-cluster components + ((n - 1) / n) * within-cluster components)?

You also provided information on an alternative approach to calculating the variance of Cohen's d using:

Vd = (SEb / S)^2 + d^2 / (2 v)

For our mixed-effects models, I could derive SEb directly from the lme4 output, and I could substitute the standardizer used for calculating Cohen's d for S (either sqrt(sum of all variance components) or sqrt(between-cluster components + ((n - 1) / n) * within-cluster components)). In an effort to be as conservative as possible, I would use the number of participants as the degrees of freedom (v). Does this make sense?
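To make sure I have the arithmetic right, here is a minimal sketch of this computation (in Python rather than R, purely for illustration; mean_diff, se_b, the variance components, and the participant count are made-up stand-ins for lme4 output, not real estimates):

```python
import math

# Hypothetical numbers standing in for a fitted lme4 model:
mean_diff = 0.40               # fixed-effect estimate of the treatment contrast
se_b = 0.08                    # its standard error from the model summary
var_components = [0.25, 0.50]  # e.g. participant intercept variance + residual variance
n_participants = 60

# Standardizer: square root of the summed variance components
S = math.sqrt(sum(var_components))
d = mean_diff / S

# Approximate sampling variance of d: Vd = (SEb / S)^2 + d^2 / (2 * v),
# with v conservatively set to the number of participants
v = n_participants
Vd = (se_b / S) ** 2 + d ** 2 / (2 * v)
se_d = math.sqrt(Vd)
```

With these made-up inputs, S = sqrt(0.75) ≈ 0.866, d ≈ 0.462, and Vd ≈ 0.0103.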

Comparability of effect sizes derived from between- and within-subjects designs

Finally, I wonder to what extent the alternative formulas suggested in the blogpost allow for comparison across different experimental designs. In our meta-analysis, we aim to include effect sizes derived from both between- and within-subjects designs. To be able to synthesize the results from both types of designs in one analysis, we make sure to meet the three criteria outlined in Morris and DeShon (2002): 1) all effect sizes are ultimately transformed into a common metric (the between-subjects metric); 2) the same effect of interest is measured in both types of studies; and 3) sampling variances for all effect sizes are estimated based on the original design of the study (Table 2). Comparing the variance formulas provided in the blogpost to the ones provided in Morris & DeShon, it seems like the latter are slightly larger (and thus more conservative, which seems fine). However, I am uncertain about mixing the Morris & DeShon formulas for within- and between-subjects designs (to allow for comparison) with the alternative formulas you provided for calculating Cohen's d and its variance for mixed-effects models. Do you think this might cause any problems for the comparability of our effect sizes? I wonder whether you have some intuition on whether effect sizes derived using the alternative formulas proposed in the blogpost can be compared across different study designs.

Thank you so much for your help. Your time and effort are very much appreciated!

Best wishes, 

Lena Schaefer

On behalf of a collaborative team that additionally includes Leah Somerville (head of the Affective Neuroscience and Development Laboratory), Katherine Powers (former postdoc in the Affective Neuroscience and Development Laboratory) and Bernd Figner (Radboud University).

3 days later
Hi Lena,

To your first question: the distinction between Brysbaert and Stevens
(2018) and Hedges (2007) has to do with estimation, rather than the
definition of the effect size. Both studies use the same definition of the
effect size parameter (assuming standardization by the total variance).
Brysbaert and Stevens assume that you are working with the results of a
fitted mixed effects model, where the variance components would be
estimated using restricted maximum likelihood (REML). In contrast, Hedges
(2007) uses moment estimators assuming a balanced design. In his notation,
S_B^2 and S_W^2 are sample variances between and within-clusters,
respectively, which are not exactly the same as the REML estimators. The (n
- 1) / n term arises because S_B^2 is an overestimate of sigma_B^2 (the
between-cluster population variance). See the explanation on p. 347 in the
section "Estimation of delta_B". In a balanced design (where all clusters
are the same size), the two approaches to calculation should yield
identical estimates of total variance, I think, and even with some
imbalance the total variance estimates (and resulting effect size
estimates) should come very close.
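This point is easy to verify numerically. The following is an illustrative simulation (in Python rather than R; the cluster count, cluster size, and variance components are arbitrary choices) showing that S_B^2 alone overestimates sigma_B^2, while the (n - 1)/n combination recovers the total variance:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a balanced clustered design: m clusters of size n,
# with between-cluster SD sigma_B and within-cluster SD sigma_W.
m, n = 2000, 10          # many clusters, so the estimates are stable
sigma_B, sigma_W = 0.5, 1.0
cluster_effects = rng.normal(0, sigma_B, size=m)
y = cluster_effects[:, None] + rng.normal(0, sigma_W, size=(m, n))

# Moment estimators in Hedges' (2007) notation:
# S_W^2 = pooled within-cluster sample variance (estimates sigma_W^2)
# S_B^2 = sample variance of cluster means (estimates sigma_B^2 + sigma_W^2 / n)
S_W2 = y.var(axis=1, ddof=1).mean()
S_B2 = y.mean(axis=1).var(ddof=1)

# Because S_B^2 picks up sigma_W^2 / n, simply summing S_B^2 + S_W^2 would
# overestimate the total variance; the (n - 1)/n weight on S_W^2 corrects this:
total_hat = S_B2 + (n - 1) / n * S_W2
# total_hat estimates sigma_B^2 + sigma_W^2 = 0.25 + 1.00 = 1.25
```

In expectation, S_B^2 + ((n - 1)/n) S_W^2 = (sigma_B^2 + sigma_W^2/n) + ((n - 1)/n) sigma_W^2 = sigma_B^2 + sigma_W^2, which is why the correction term appears in Hedges' formula.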

To your second question about how to get the degrees of freedom, yes I
think using the total number of participants is probably a good and
conservative approximation.

To your final question about comparability across between- and
within-subjects designs: comparability hinges on whether the variance
components used in the denominator of d are the same across both types of
designs. In principle, using the methods outlined in my blog post, you
should be able to define and estimate effect sizes that are comparable
across both types of designs. Of course, in practice there may be factors
that differ across the two types of designs. For example, how the
treatment is operationalized in a within-subjects design might be different
from how it is typically operationalized in a between-subjects design. Or
the scales used to assess the outcome might differ between the two types of
designs. Thus, I would recommend approaching this issue both conceptually
and empirically. Conceptually, try to obtain effect size estimates that are
comparable in principle. Then empirically, examine whether effect sizes
differ on average according to the type of design.

James

On Fri, Dec 13, 2019 at 8:00 AM Lena Schäfer <lenaschaefer2304 at gmail.com>
wrote: