
Most principled reporting of mixed-effect model regression coefficients

Dear James,
No, I don't think that standardizing your data before model fitting is a problem, and it does indeed give you comparable estimates (in the sense of being on the same scale). lme4 even emits a message recommending that you rescale predictors when they are on very different scales.
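As a minimal sketch of what I mean (using lme4's bundled sleepstudy data as a stand-in for your own; the variable names are just placeholders):

```r
library(lme4)
data(sleepstudy)

# z-standardize the numeric predictor before fitting
sleepstudy$Days_z <- as.numeric(scale(sleepstudy$Days))

m <- lmer(Reaction ~ Days_z + (Days_z | Subject), data = sleepstudy)
fixef(m)  # fixed-effect estimates now refer to a 1-SD change in Days
```

With all numeric predictors standardized this way, the fixed-effect estimates in summary() are on a common scale and can be compared directly.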

The comparison mentioned above only refers to the summary output. You could also compute "classical" ANOVA tables with effect sizes, but that, as far as I know, is what is problematic in mixed models. In the example below, summary() gives you "effect sizes" (estimates) for the interaction at each level of Species, while anova() does not. For the ANOVA table it is difficult to calculate reliable effect sizes such as eta-squared in mixed models, but if you standardize your data before model fitting, the summary gives you comparable estimates. This is also partly a matter of terminology: what people call an "effect size" differs between fields and disciplines.
I think the problem is the variance decomposition for more complex models like mixed models, in particular for other link functions/families. That is why parts of the variance components in mixed models are not "exact" but rather approximated (see Nakagawa, S., Johnson, P. C. D., & Schielzeth, H. (2017), doi: 10.1098/rsif.2017.0213); this approach is implemented in the r2() function from the performance package.
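For example (a sketch assuming the lme4 and performance packages are installed; again using sleepstudy as placeholder data):

```r
library(lme4)
library(performance)
data(sleepstudy)

m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Nakagawa et al. (2017) R2 for mixed models:
# marginal R2 = variance explained by fixed effects alone,
# conditional R2 = variance explained by fixed + random effects
r2(m)
```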
Note that when reporting AIC, in particular for model comparison, models need to be refitted with ML, not REML. lme4's anova() method, however, does this refit automatically for you when comparing models.
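A sketch of both routes (placeholder models on the sleepstudy data):

```r
library(lme4)
data(sleepstudy)

m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
m2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Explicit route: refit with ML before extracting AIC
AIC(update(m1, REML = FALSE), update(m2, REML = FALSE))

# anova() refits with ML for you and prints a note saying so
anova(m1, m2)
```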

Best
Daniel

data(iris)
m <- lm(Sepal.Length ~ Species * Sepal.Width, data = iris)
summary(m)
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Species * Sepal.Width, data = iris)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.26067 -0.25861 -0.03305  0.18929  1.44917 
#> 
#> Coefficients:
#>                               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                     2.6390     0.5715   4.618 8.53e-06 ***
#> Speciesversicolor               0.9007     0.7988   1.128    0.261    
#> Speciesvirginica                1.2678     0.8162   1.553    0.123    
#> Sepal.Width                     0.6905     0.1657   4.166 5.31e-05 ***
#> Speciesversicolor:Sepal.Width   0.1746     0.2599   0.672    0.503    
#> Speciesvirginica:Sepal.Width    0.2110     0.2558   0.825    0.411    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.4397 on 144 degrees of freedom
#> Multiple R-squared:  0.7274, Adjusted R-squared:  0.718 
#> F-statistic: 76.87 on 5 and 144 DF,  p-value: < 2.2e-16
anova(m)
#> Analysis of Variance Table
#> 
#> Response: Sepal.Length
#>                      Df Sum Sq Mean Sq  F value    Pr(>F)    
#> Species               2 63.212 31.6061 163.4417 < 2.2e-16 ***
#> Sepal.Width           1 10.953 10.9525  56.6379 5.221e-12 ***
#> Species:Sepal.Width   2  0.157  0.0786   0.4064    0.6668    
#> Residuals           144 27.846  0.1934                       
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Ades, James
Sent: Monday, 17 February 2020 05:32
To: Maarten Jung <Maarten.Jung at mailbox.tu-dresden.de>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Most principled reporting of mixed-effect model regression coefficients

Thanks, Maarten. So I was planning on reporting R^2 (along with AIC) for the overall model fit, not for each predictor, since the regression coefficients themselves give a good indication of the relationships (though I wasn't aware that R^2 is "riddled with complications"). Is Henrik saying this only with regard to LMMs and GLMMs?

When you say "there is no agreed-upon way to calculate effect sizes," I'm a little confused. I read through your Stack Exchange post, but Henrik's answer refers to standardized effect sizes. You write, further down, "Whenever possible, we report unstandardized effect sizes, which is in line with general recommendations of how to report effect sizes."

I'm also working on a systematic review where there's disagreement over whether effect sizes should be standardized, but it does seem that to yield any kind of meaningful comparison, effect sizes would have to be standardized. I don't usually report standardized effect sizes; however, there are times when I z-score IVs to put them on the same scale, and I suppose the output of that would be a standardized effect size. I wasn't aware of pushback against that practice. What issues would arise from it?

I learned that mixed models are used predominantly for overall predictions rather than for individual coefficients, but I was still under the impression that one could derive effect sizes from predictor variables, and that doing so was largely sound. Am I incorrect?

In this particular study, there are four timepoints with 1286 students, though at each timepoint, there are roughly 1000 students. All students complete the same executive function tasks, so in that regard, there isn't really a formal factorial design at play, though there are multiple independent variables.

Best,

James