How do you report lmer results?
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 07/27/2011 04:01 AM, Luke Duncan wrote:
Dear R-Gurus I am a PhD student from South Africa working on chimpanzee behaviour. I am looking at patterns of shade utilization and am using generalized linear mixed models to examine the effects of various factors on whether chimpanzees choose to spend time in the sun or shade.
OK. Are these binary (sun vs shade) outcomes? I
realise that the lme4 package and the outputs of the lmer functions have been discussed ad nauseum but I have been reading through many of them and am finding it all extremely confusing. I have used programs like Statistica to run glm's with no random factors but now that I have to include random effects, this is no longer an option. Thus I have turned to R (and hence I am a complete R virgin).
This is fine; you should mention that you previously tried on r-help (this is a more appropriate list anyway).
What I would like to know is the following. What is the accepted general consensus on how to report the outputs of a lmer model? What is the currently accepted method for determining whether fixed effect parameters are significant in predicting the outcomes of the model (LHR, AIC, Wald X^2...?)? While I recognise that the "Pr(>|z|)" value is not a definitive p-value (rather an approximation), can one treat it loosely as an 'estimated' p-value?
Hard to say. You could try searching google scholar for references to "lmer" or "lme4", for a start, although there aren't a lot of hits there. I would recommend Bolker et al _Trends in Ecology and Evolution_ 2008, too (of course); it might also be useful to look in Zuur's book on mixed models.
My model comprises 2 categorical predictor variables (Time of day: 'Time'; Available amount of shade, coded as a three-way classification: 'Tertile'), two continuous predictor variables (maximum temperature: 'Max'; minimum temperature: 'Min') and three random effects (Which experimental dataset the data were derived from: 'Exp'; Which individual chimpanzee was observed: 'Indiv'; Which area/zone of the enclosure they occupied at the time of observation: 'Zone'). These are the outputs that I have generated thus far using LHR testing. How should I be interpretting and reporting these outputs? Generalized linear mixed model fit by the Laplace approximation Formula: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) + Max + Min Data: sdata AIC BIC logLik deviance 215.5 259 -95.77 191.5 Random effects: Groups Name Variance Std.Dev. Zone (Intercept) 2.6596e-01 5.1571e-01 Indiv (Intercept) 0.0000e+00 0.0000e+00 Exp (Intercept) 2.9021e-11 5.3871e-06
One thing that's clear here is that your model is a bit overfitted -- you are getting zero or near-zero variances for both 'indiv' and 'exp'. That doesn't necessarily mean anything is wrong, but you might want to test whether the model performs differently if you drop these random effects, or whether it gives significantly different results with different combinations of random effects (again, unlikely but worth checking). It's also not surprising that you get zero variance for Exp, with only two levels. The numbers of random-effects levels here are very small -- 7 or 8 is probably the *minimum* you could expect to work. Opinions differ (see <http://glmm.wikidot.com/faq>), but I would suggest you make Exp a fixed factor ...
Number of obs: 276, groups: Zone, 8; Indiv, 7; Exp, 2
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.15725 1.58304 -1.363 0.17297
Time11h00 0.96362 0.40956 2.353 0.01863 *
Time12h00 1.57906 0.49033 3.220 0.00128 **
Time13h00 1.58951 0.40705 3.905 9.43e-05 ***
Time14h00 1.07939 0.53876 2.003 0.04513 *
TertileLOW -1.40906 0.53761 -2.621 0.00877 **
TertileMEDIUM -1.24862 0.57396 -2.175 0.02960 *
Max 0.10122 0.08611 1.175 0.23985
Min 0.13439 0.10292 1.306 0.19162
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Correlation of Fixed Effects:
(Intr) Tm1100 Tm1200 Tm1300 Tm1400 TrtLOW TMEDIU Max
Time11h00 0.056
Time12h00 0.258 0.447
Time13h00 0.115 0.510 0.486
Time14h00 -0.049 0.318 0.276 0.370
TertileLOW -0.146 -0.119 -0.215 -0.236 -0.096
TertlMEDIUM -0.128 0.024 -0.145 -0.155 -0.224 0.707
Max -0.914 -0.162 -0.277 -0.198 -0.084 -0.025 -0.022
Min 0.178 0.074 -0.023 0.105 0.244 -0.101 -0.077 -0.463
anova(m1,m2)
Data: sdata Models: m2: prop ~ Time + (1 | Exp) + (1 | Indiv) + (1 | Zone) + Max + Min m1: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) + m1: Max + Min Df AIC BIC logLik Chisq Chi Df Pr(>Chisq) m2 10 216.72 252.92 -98.359 m1 12 215.55 258.99 -95.773 5.1721 2 0.07532 . --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
anova(m1,m3)
Data: sdata Models: m3: prop ~ Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) + Max + m3: Min m1: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) + m1: Max + Min Df AIC BIC logLik Chisq Chi Df Pr(>Chisq) m3 8 226.11 255.08 -105.057 m1 12 215.55 258.99 -95.773 18.567 4 0.0009556 *** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
anova(m1,m4)
Data: sdata Models: m4: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) + m4: Min m1: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) + m1: Max + Min Df AIC BIC logLik Chisq Chi Df Pr(>Chisq) m4 11 214.81 254.64 -96.407 m1 12 215.55 258.99 -95.773 1.2672 1 0.2603
anova(m1,m5)
Data: sdata Models: m5: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) + m5: Max m1: prop ~ Time + Tertile + (1 | Exp) + (1 | Indiv) + (1 | Zone) + m1: Max + Min Df AIC BIC logLik Chisq Chi Df Pr(>Chisq) m5 11 215.22 255.05 -96.613 m1 12 215.55 258.99 -95.773 1.6792 1 0.195 As I understand this output, the only significant predictor in the model appears to be time of day. But, I don't really know how this should be reported. Can you point me to some papers or examples where lmer outputs have been reported formally? Any help that you could offer would be MOST appreciated.
I would try drop1() [works with version ...-40 of lme4, I believe] for more convenient output. What you are doing is reasonable, although likelihood ratio test statistics are probably going to be somewhat liberal (anticonservative) for "small" sample sizes. *If* all the predictors were represented equally in all the random-effect levels (i.e. individuals, zones, experimental blocks all had similar levels of Tertile, Min/Max temp, time) then this would be close to a randomized-block design, and in the classical sense you would have a fairly large residual N (as opposed to a nested design, where at least some of the treatments don't differ within blocks). LRT is probably appropriate with a caveat. If you are paranoid, use parametric bootstrapping (see the examples for ?"simulate-mer").
Sincerely (in desperation) Luke Duncan PhD Candidate School of Animal, Plant and Environmental Sciences University of the Witwatersrand Johannesburg, South Africa +27 11 717 6452
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk4xZjIACgkQc5UpGjwzenM5QACfSfnnneERl3Q4NN936N8i2/cl TM8AniEQeCArTkuSf18p8qoTepFdEVQC =Xc5e -----END PGP SIGNATURE-----