Calculating effect sizes of fixed effects in lmer
6 messages · Wolfgang Viechtbauer, James Pustejovsky, FAIRS Amie +1 more

Dear list,

I'm hoping someone knows the current practice or wisdom regarding calculating (standardised) effect sizes of fixed effects in a mixed model (I fit all mine with lmer). By effect size I mean something akin to a Cohen's d type value. I've followed this list for the past few years and my understanding is that there is no easy way to do this, because of working out the degrees of freedom of the random structure (I hope I've understood that correctly). However, in searching the list archives for the past two years I have seen some discussion about it, in October 2019, and I have also seen that the emmeans package has a function called eff_size (though calculating the required values for its parameters seems like it could be prone to error on my part), so I thought I would ask: is there a current standard practice for calculating effect sizes of fixed effects? I'm not doing this in response to reviewer comments, but I anticipate I will get a comment like this for something I want to submit soon.

Best,
Amie

------------------
Dr. Amie Fairs
Post-doctorant, Aix-Marseille Université
Laboratoire Parole et Langage (LPL) | CNRS UMR 7309 | 5 Avenue Pasteur | 13100 Aix-en-Provence
Email: amie.fairs at univ-amu.fr
While I may send this email outside of typical working hours, I have no expectation to receive an email outside of your typical hours.
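For concreteness, the eff_size() route mentioned above looks roughly like the sketch below. This is an illustration, not from the thread: the simulated data and the variable names (dat, rt, condition, subject, item) are hypothetical, and the choice of sigma and edf is exactly the step the post flags as error-prone.

library(lme4)
library(emmeans)

# simulated repeated-measures data (all names hypothetical)
set.seed(1)
dat <- expand.grid(subject = factor(1:20), item = factor(1:10),
                   condition = factor(c("a", "b")))
dat$rt <- 500 + 30 * (dat$condition == "b") +
  rnorm(20, 0, 50)[dat$subject] + rnorm(10, 0, 20)[dat$item] +
  rnorm(nrow(dat), 0, 40)

fit <- lmer(rt ~ condition + (1 | subject) + (1 | item), data = dat)
emm <- emmeans(fit, ~ condition)

# eff_size() standardizes the pairwise difference by 'sigma' with 'edf'
# degrees of freedom; using the residual SD alone (as here) ignores the
# subject and item variance components, which is the contested choice
eff_size(emm, sigma = sigma(fit), edf = df.residual(fit))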
Dear Amie,

I would say the answer to "is there a current standard practice for calculating effect sizes of fixed effects?" is No. The difficulty is how to standardize the predictors/outcome. In standard regression models, the 'standardized coefficients' (often referred to as 'beta') can be easily obtained by standardizing the outcome and the predictor variables before fitting the model (which is equivalent to computing beta = b * sd(x) / sd(y), the equation usually shown in textbooks for computing standardized regression coefficients). An example:

x1 <- c(2,4,3,5,6,7,4,6)
x2 <- c(0,0,0,0,0,1,1,1)
y  <- c(4,3,2,4,5,4,7,4)

res <- lm(y ~ x1 + x2)
coef(res)[2] * sd(x1) / sd(y)

res <- lm(scale(y) ~ I(scale(x1)) + I(scale(x2)))
coef(res)[2]

One could in principle do the same in mixed-effects models, but such models are often used for data that have some kind of multilevel structure (e.g., repeated measurements within subjects and/or subjects nested within some higher-level grouping variable, such as pupils nested within schools). We then try to account for this structure by modeling different sources of variability (e.g., variance between schools versus variance between pupils). Computing sd(y) and sd(x1) (as above) would ignore this structure and just lump everything together. Of course, there are all kinds of proposals out there for how one could do this more 'correctly' in the context of such models, but I don't think there is general agreement on how this should be done.

Indeed, reviewers often ask authors to report some kind of 'effect size'. Nothing wrong with that, but unfortunately a lot of people interpret the term 'effect size' to refer to some kind of *standardized* measure. To me, that is an overly narrow definition of what an effect size is. For example, the (unstandardized) difference in means between two groups (e.g., treated versus control) is an effect size. And so is an unstandardized regression coefficient.

Standardized effect sizes are a crutch we use, for example in meta-analysis, to make results from different studies more comparable to each other, because unstandardized coefficients/effects are only directly comparable if the units of y and x are the same across studies. But for interpreting the results from a single study, an unstandardized effect size is perfectly fine, as long as we start to have an appreciation for the units of the scales that we work with. If I tell an experienced clinician that some treatment for depression on average leads to a 10-point reduction on the Beck Depression Inventory, they should be able to understand what that means and how clinically relevant that is.

Or to use Cohen's own words (from his infamous 1994 paper 'The earth is round (p < .05)'): "To work constructively with 'raw' regression coefficients and confidence intervals, psychologists have to start respecting the units they work with, or develop measurement units they can respect enough so that researchers in a given field or subfield can agree to use them. In this way, there can be hope that researchers' knowledge can be cumulative." (p. 1001)

I went on a bit of a rant there towards the end, but this insistence on standardized effect sizes is a bit of a pet peeve of mine.

Best,
Wolfgang
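To make the 'lump everything together' point concrete in lmer terms, here is a minimal simulated sketch (not from the thread; all names are hypothetical). The last line shows one proposal floating around, dividing the coefficient by the square root of the summed variance components, which is only one of the contested options alluded to above:

library(lme4)

set.seed(42)
n_subj  <- 20
n_trial <- 10
subj <- factor(rep(1:n_subj, each = n_trial))
cond <- rep(c(0, 1), times = n_subj * n_trial / 2)   # within-subject condition
u    <- rnorm(n_subj, 0, 2)                          # subject intercepts
y    <- 5 + 1 * cond + u[as.integer(subj)] + rnorm(n_subj * n_trial, 0, 1)

fit <- lmer(y ~ cond + (1 | subj))

# naive 'beta': lumps subject and residual variability into sd(y)
fixef(fit)["cond"] * sd(cond) / sd(y)

# one proposal: standardize by the square root of the summed variance
# components (subject + residual) instead of the raw sd(y)
vc <- as.data.frame(VarCorr(fit))
fixef(fit)["cond"] / sqrt(sum(vc$vcov))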
Dear Wolfgang,

Thank you so much for your comprehensive reply! I really appreciate it. I agree with you that interpreting unstandardised effects within a single study, like the difference between two conditions, is a good way to go. In the past, when working with RT data, I've mostly thought along the lines of having an effect of a certain number of milliseconds, and I can evaluate for myself whether my effect is bigger than, smaller than, or in line with the general size of that effect in the literature. However, now I'm working with EEG data, and I think the hang-up with effect sizes there has more to do with the (previously?) standard way of analysing such data being ANOVA, where partial eta squared values are reported, so researchers (and reviewers) in this area are used to seeing a measure of an effect that isn't a coefficient.

Somewhat relatedly, practically all the experiments I carry out are repeated measures, and I use a summary function called summarySEwithin when calculating means/SDs/CIs for descriptive purposes (I can't remember which package it is in right now), which should take into account that I have multiple trials belonging to the same individual (based on the documentation). If my mixed model has only participants and items modelled as random intercepts, with no random slopes and nothing nested (whether that is true in the world is a different question, but at least in the model there is this simplified structure), is it then possible to calculate SDs for standardised effects the way it would be done with something like summarySEwithin? I won't actually do this; I'm just curious about whether it would be a viable strategy. The main thing I'm taking from your email, though, is that standardised effect sizes aren't easy to calculate, or agreed upon, for mixed models.

Best,
Amie
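For reference, the summarySEwithin function mentioned here is, to my knowledge, from the Rmisc package. A sketch of the descriptive call (the data frame and column names are hypothetical):

library(Rmisc)   # summarySEwithin lives here, if I recall correctly

# hypothetical data: one RT value (e.g., a mean) per subject and condition
set.seed(1)
dat <- expand.grid(subject = factor(1:20), condition = factor(c("a", "b")))
dat$rt <- 500 + 30 * (dat$condition == "b") + rnorm(nrow(dat), 0, 40)

# within-subject-corrected means, SDs, SEs, and CIs per condition
summarySEwithin(dat, measurevar = "rt", withinvars = "condition",
                idvar = "subject")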
Hi Amie,

I agree very much with Wolfgang's perspective that one would ideally use outcomes such that unstandardized effects can be interpreted directly. If one does have to fall back on standardized effect sizes, there's a further question of what metric to use. Researchers often jump immediately to standardized mean differences, but there are certainly other possibilities, such as log response ratios for outcomes that are measured on ratio scales.

All that said, there has been a fair amount of work on standardized mean difference effect sizes for certain types of research designs that would usually be analyzed with multilevel models. A sampling (including some of my own):

- Hedges, L. V. (2007). Effect sizes in cluster-randomized designs. *Journal of Educational and Behavioral Statistics*, *32*(4), 341-370.
- Hedges, L. V. (2011). Effect sizes in three-level cluster-randomized experiments. *Journal of Educational and Behavioral Statistics*, *36*(3), 346-380.
- Pustejovsky, J. E., Hedges, L. V., & Shadish, W. R. (2014). Design-comparable effect sizes in multiple baseline designs: A general modeling framework. *Journal of Educational and Behavioral Statistics*, *39*(5), 368-393.
- Stapleton, L. M., Pituch, K. A., & Dion, E. (2015). Standardized effect size measures for mediation analysis in cluster-randomized trials. *The Journal of Experimental Education*, *83*(4), 547-582.
- Feingold, A. (2009). Effect sizes for growth-modeling analysis for controlled clinical trials in the same metric as for classical analysis. *Psychological Methods*, *14*(1), 43.

One of my students and I have also developed an R package for estimating standardized mean differences from multilevel models fitted with nlme::lme():
https://CRAN.R-project.org/package=lmeInfo

Kind regards,
James
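As a rough sketch of what the lmeInfo workflow looks like, following the package documentation at the CRAN link above (the reading of the g_mlm() arguments is mine, and the data, variables, and model here are hypothetical):

library(nlme)
library(lmeInfo)

# hypothetical two-level data: pupils nested in schools, with a
# school-level treatment indicator
set.seed(1)
dat <- data.frame(school = factor(rep(1:30, each = 15)),
                  treatment = rep(c(0, 1), each = 15 * 15))
dat$score <- 50 + 5 * dat$treatment +
  rnorm(30, 0, 4)[dat$school] + rnorm(nrow(dat), 0, 8)

fit <- lme(score ~ treatment, random = ~ 1 | school, data = dat)

# g_mlm() computes a design-comparable standardized mean difference:
# p_const picks out the fixed effect for the numerator (the treatment
# coefficient) and r_const weights the variance components (school
# intercept variance, residual variance) in the denominator
g_mlm(fit, p_const = c(0, 1), r_const = c(1, 1))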
Dear James,

Thank you so much! I'll check out all the references and your R package.

Best,
Amie
Dear Amie,

As an additional comment to what has been said so far, I'd like to point to this forum post, which describes why it is difficult to get effect sizes like eta squared etc. from mixed models:
https://afex.singmann.science/forums/topic/compute-effect-sizes-for-mixed-objects#post-295

Standardized coefficients are one possibility for reporting some kind of "effect size". The most accurate way is to standardize the data before fitting the model (in particular when interaction terms are involved). Although I agree that the "raw", unstandardized coefficients may provide a more intuitive interpretation, standardizing is sometimes even required simply due to problems when fitting the model (like convergence issues).

Beyond that (always keeping the caveats, especially for mixed models, in mind!), you can compute effect sizes like eta squared etc., as well as standardized coefficients with different methods of standardizing (post hoc, as described by Wolfgang, or "refitting" the model on a standardized version of the data), with the "effectsize" package:
https://cran.r-project.org/package=effectsize

There is also a dedicated webpage:
https://easystats.github.io/effectsize/

Furthermore, the package just recently implemented a function for "pseudo-standardization" of parameters in mixed models. This approach addresses the issue raised by Wolfgang that mixed models have different sources of variability, and thus sd(y) would not properly account for this.

Hope this helps.

Best wishes,
Daniel
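A sketch of what this looks like in practice (the simulated data and model are hypothetical; the method names follow the effectsize documentation, and "pseudo" is documented for two-level mixed models):

library(lme4)
library(effectsize)

# hypothetical two-level repeated-measures data
set.seed(1)
dat <- expand.grid(subject = factor(1:30), trial = 1:20)
dat$condition <- factor(sample(c("a", "b"), nrow(dat), replace = TRUE))
dat$rt <- 500 + 30 * (dat$condition == "b") +
  rnorm(30, 0, 50)[dat$subject] + rnorm(nrow(dat), 0, 40)

fit <- lmer(rt ~ condition + (1 | subject), data = dat)

# refit the model on a standardized version of the data (the "most
# accurate" route described above)
standardize_parameters(fit, method = "refit")

# post hoc standardization of the fitted coefficients, as described by Wolfgang
standardize_parameters(fit, method = "posthoc")

# the pseudo-standardization for mixed models mentioned above, which uses
# the model's own variance components rather than a single sd(y)
standardize_parameters(fit, method = "pseudo")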