
[R-meta] Effect sizes for mixed-effects models

Hi Lena,

To your first question: the distinction between Brysbaert and Stevens
(2018) and Hedges (2007) has to do with estimation, rather than the
definition of the effect size. Both studies use the same definition of the
effect size parameter (assuming standardization by the total variance).
Brysbaert and Stevens assume that you are working with the results of a
fitted mixed effects model, where the variance components would be
estimated using restricted maximum likelihood (REML). In contrast, Hedges
(2007) uses moment estimators assuming a balanced design. In his notation,
S_B^2 and S_W^2 are the between-cluster and within-cluster sample
variances, respectively, which are not exactly the same as the REML
estimators. The (n - 1) / n term arises because S_B^2 is an overestimate of
sigma_B^2 (the between-cluster population variance). See the explanation on
p. 347 in the
section "Estimation of delta_B". In a balanced design (where all clusters
are the same size), the two approaches to calculation should yield
identical estimates of total variance, I think, and even with some
imbalance the total variance estimates (and resulting effect size
estimates) should come very close.
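If it helps to see the balanced-design equivalence concretely, here is a small self-contained sketch (in Python rather than R; the names, the simulation setup, and my reading of S_B^2 as the sample variance of the cluster means are mine, so check them against the paper's definitions). It shows that the moment-based total variance, S_B^2 + ((n - 1) / n) S_W^2, is algebraically the same as summing variance-component estimates of the form sigma_B^2-hat = S_B^2 - S_W^2 / n and sigma_W^2-hat = S_W^2:

```python
import random
import statistics

random.seed(20191213)

m, n = 30, 8              # m clusters, each of equal size n (balanced design)
sigma_B, sigma_W = 1.0, 2.0

# Simulate one arm of a balanced clustered design.
clusters = []
for _ in range(m):
    u = random.gauss(0, sigma_B)  # cluster-level random effect
    clusters.append([u + random.gauss(0, sigma_W) for _ in range(n)])

cluster_means = [statistics.mean(c) for c in clusters]

# S_B^2: sample variance of the cluster means. In expectation this is
# sigma_B^2 + sigma_W^2 / n, i.e., an overestimate of sigma_B^2.
S2_B = statistics.variance(cluster_means)

# S_W^2: pooled within-cluster sample variance, unbiased for sigma_W^2.
S2_W = sum(
    (x - statistics.mean(c)) ** 2 for c in clusters for x in c
) / (m * (n - 1))

# Moment-based total variance with the (n - 1) / n correction...
total_moment = S2_B + (n - 1) / n * S2_W

# ...equals the sum of the variance-component estimates
# (S_B^2 - S_W^2 / n) + S_W^2, term for term.
total_components = (S2_B - S2_W / n) + S2_W

assert abs(total_moment - total_components) < 1e-12
```

The identity holds for any balanced data set, not just this simulation, since the two expressions rearrange into one another; the question of how close REML comes under imbalance is the part that needs empirical checking.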

To your second question about how to get the degrees of freedom: yes, I
think using the total number of participants is probably a good and
conservative approximation.

To your final question about comparability across between- and
within-subjects designs: comparability hinges on whether the variance
components used in the denominator of d are the same across both types of
designs. In principle, using the methods outlined in my blog post, you
should be able to define and estimate effect sizes that are comparable
across both types of designs. Of course, in practice there may be factors
that differ across the two types of designs. For example, how the
treatment is operationalized in a within-subjects design might be different
from how it is typically operationalized in a between-subjects design. Or
the scales used to assess the outcome might differ between the two types of
designs. Thus, I would recommend approaching this issue both conceptually
and empirically. Conceptually, try to obtain effect size estimates that are
comparable in principle. Then empirically, examine whether effect sizes
differ on average according to the type of design.
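For the empirical step, the natural tool in R would be a meta-regression with a design-type moderator (e.g., in metafor), but the logic can be sketched in a few lines. The numbers below are purely made up for illustration, and this is only a fixed-effect subgroup comparison; a real analysis would typically use a random-effects meta-regression:

```python
import math

# Hypothetical studies, purely for illustration: effect size estimate d,
# its sampling variance v, and the design type.
studies = [
    (0.42, 0.020, "between"),
    (0.35, 0.015, "between"),
    (0.51, 0.030, "between"),
    (0.60, 0.010, "within"),
    (0.48, 0.012, "within"),
    (0.55, 0.018, "within"),
]

def fe_pooled(design):
    """Fixed-effect (inverse-variance weighted) mean d and its variance."""
    w = [1 / v for _, v, g in studies if g == design]
    wd = [d / v for d, v, g in studies if g == design]
    return sum(wd) / sum(w), 1 / sum(w)

d_between, var_between = fe_pooled("between")
d_within, var_within = fe_pooled("within")

# z-test for a difference in average effect size by design type.
z = (d_within - d_between) / math.sqrt(var_between + var_within)
print(f"between: {d_between:.3f}, within: {d_within:.3f}, z = {z:.2f}")
```

A non-significant z here would not prove comparability, of course; it just tells you whether the average effect size differs detectably by design type, which is the empirical half of the conceptual-plus-empirical approach I am suggesting.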

James

On Fri, Dec 13, 2019 at 8:00 AM Lena Schäfer <lenaschaefer2304 at gmail.com>
wrote: