An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20110413/58ba74b1/attachment.pl>
dramatic difference in REML and ML random effect estimates
3 messages · Simon Chamaillé-Jammes, Ben Bolker, Douglas Bates
1 day later
Since no-one's answered this yet:
On 04/13/2011 04:21 AM, Simon Chamaill?-Jammes wrote:
Hello all, I'm comparing the duration (duration) of two different behaviour (behaviourtype) for 10 individuals (indid) in two different years (year). One of the behaviour was not observed for some individuals in the first year, so the indid and year are only partially crossed (see end of mail for data distribution). I'm fitting the model: lmer(duration~behaviourtype+(1|indid)+(1|year),REML=T,data=mydata) I have only two levels for year so modeling it as a random effect may not be that important (suitable?).
Short answer: modeling a factor with only two levels as a random effect is questionable/unsuitable -- this also explains why the REML and ML estimates of the variances are so different. In general, the differences between REML and ML estimates are on the order of (n/(n-1)). I'm mildly surprised that the individual-level variance drops all the way to (effectively) zero (the naive expectation would be that would "only" be cut in half), but in the grand scheme of things this is not an unusual result (I think). My guess would be that if you re-fit with year as a fixed effect, you will get results closer to the REML fit. On the other hand, it doesn't look like there's that much going on with behaviour type anyway ... (the estimates are quite different 0.71 vs 1.59, but the differences are well within the standard errors of either estimate).
The model summary is this:
Linear mixed model fit by REML
Formula: duration ~ behaviourtype + (1 | year) + (1 | indid)
Data: mydata
AIC BIC logLik deviance REMLdev
1708 1724 -848.8 1706 1698
Random effects:
Groups Name Variance Std.Dev.
indid (Intercept) 13.543 3.6800
year (Intercept) 17.683 4.2051
Residual 410.779 20.2677
Number of obs: 192, groups: indid, 10; year, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 32.8539 3.9592 8.298
behaviourtypeForaging 0.7121 3.0569 0.233
Correlation of Fixed Effects:
(Intr)
bhvrtypFrgn -0.449
I was quite happy with this and wanted to perform a LR test with the
model without the fixed effect (behaviourtype) - I gathered that one
should rather use the likelihood estimated via ML (and that is what is
done by the anova() function anyway), so for my 'personal interest' I
fitted the model again with REML=F:
Linear mixed model fit by maximum likelihood
Formula: duration ~ behaviourtype + (1 | year) + (1 | indid)
Data: mydata
AIC BIC logLik deviance REMLdev
1716 1733 -853.2 1706 1699
Random effects:
Groups Name Variance Std.Dev.
indid (Intercept) 7.4845e-12 2.7358e-06
year (Intercept) 5.5096e+00 2.3473e+00
Residual 4.2026e+02 2.0500e+01
Number of obs: 192, groups: indid, 10; year, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 32.228 2.814 11.453
behaviourtypeForaging 1.590 3.005 0.529
Correlation of Fixed Effects:
(Intr)
bhvrtypFrgn -0.604
The variance estimates of the random effects are VERY different from the
REML estimates.
Admittedly my understanding of the fitting process of lmer is rather
small, and doing some google work or reading Zuur et al. or Gelman &
Hill books did not improve my understanding of what is happening here (I
may have to learn to read better). In particular, my questions for you guys:
1) is my model implementation right ?
2) are differences in ML and REML estimates affecting the LR test of the
model against a 'null' model (no fixed effect) ?
Many thanks for sharing some of your most valuable resources (brain, time).
simon
PS/ This is the data distribution across behaviour, years, and the 10
individuals:
, , = year1
1 2 3 4 5 6 7 8 9 10
behaviour1 2 1 3 2 4 0 3 1 0 7
behaviour2 4 5 7 4 5 10 0 1 7 5
, , = year2
1 2 3 4 5 6 7 8 9 10
behaviour1 13 1 5 6 1 13 3 9 2 10
behaviour2 4 4 8 7 6 3 9 6 8 3
I'm happy to make mine the fortune:
Getting flamed for asking dumb questions on a public mailing list is all part of growing up and being a man/woman.
-- Michael Watson (in a discussion on whether answers on R-help should be more polite)
R-help (December 2004)
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
On Thu, Apr 14, 2011 at 8:17 AM, Ben Bolker <bbolker at gmail.com> wrote:
?Since no-one's answered this yet: On 04/13/2011 04:21 AM, Simon Chamaill?-Jammes wrote:
Hello all, I'm comparing the duration (duration) of two different behaviour (behaviourtype) for 10 individuals (indid) in two different years (year). One of the behaviour was not observed for some individuals in the first year, so the indid and year are only partially crossed (see end of mail for data distribution). I'm fitting the model: lmer(duration~behaviourtype+(1|indid)+(1|year),REML=T,data=mydata) I have only two levels for year so modeling it as a random effect may not be that important (suitable?).
?Short answer: modeling a factor with only two levels as a random effect is questionable/unsuitable -- this also explains why the REML and ML estimates of the variances are so different. ?In general, the differences between REML and ML estimates are on the order of (n/(n-1)). ?I'm mildly surprised that the individual-level variance drops all the way to (effectively) zero (the naive expectation would be that would "only" be cut in half), but in the grand scheme of things this is not an unusual result (I think).
Note that the difference in deviance between the two sets of estimates is very small, indicating the the estimate of the standard deviation of the random effects is very poorly determined.
?My guess would be that if you re-fit with year as a fixed effect, you will get results closer to the REML fit. ?On the other hand, it doesn't look like there's that much going on with behaviour type anyway ... (the estimates are quite different 0.71 vs 1.59, but the differences are well within the standard errors of either estimate).
The model summary is this: Linear mixed model fit by REML Formula: duration ~ behaviourtype + (1 | year) + (1 | indid) ? ? Data: mydata ? ?AIC ?BIC logLik deviance REMLdev ? 1708 1724 -848.8 ? ? 1706 ? ?1698 Random effects: ? Groups ? Name ? ? ? ?Variance Std.Dev. ? indid ? ?(Intercept) ?13.543 ? 3.6800 ? year ? ? (Intercept) ?17.683 ? 4.2051 ? Residual ? ? ? ? ? ? 410.779 ?20.2677 Number of obs: 192, groups: indid, 10; year, 2 Fixed effects: ? ? ? ? ? ? ? ? ? ? ? ?Estimate Std. Error t value (Intercept) ? ? ? ? ? ?32.8539 ? ? 3.9592 ? 8.298 behaviourtypeForaging ? 0.7121 ? ? 3.0569 ? 0.233 Correlation of Fixed Effects: ? ? ? ? ? ? ?(Intr) bhvrtypFrgn -0.449 I was quite happy with this and wanted to perform a LR test with the model without the fixed effect (behaviourtype) - I gathered that one should rather use the likelihood estimated via ML (and that is what is done by the anova() function anyway), so for my 'personal interest' I fitted the model again with REML=F: Linear mixed model fit by maximum likelihood Formula: duration ~ behaviourtype + (1 | year) + (1 | indid) ? ? Data: mydata ? ?AIC ?BIC logLik deviance REMLdev ? 1716 1733 -853.2 ? ? 1706 ? ?1699 Random effects: ? Groups ? Name ? ? ? ?Variance ? Std.Dev. ? indid ? ?(Intercept) 7.4845e-12 2.7358e-06 ? year ? ? (Intercept) 5.5096e+00 2.3473e+00 ? Residual ? ? ? ? ? ? 4.2026e+02 2.0500e+01 Number of obs: 192, groups: indid, 10; year, 2 Fixed effects: ? ? ? ? ? ? ? ? ? ? ? ?Estimate Std. Error t value (Intercept) ? ? ? ? ? ? 32.228 ? ? ?2.814 ?11.453 behaviourtypeForaging ? ?1.590 ? ? ?3.005 ? 0.529 Correlation of Fixed Effects: ? ? ? ? ? ? ?(Intr) bhvrtypFrgn -0.604 The variance estimates of the random effects are VERY different from the REML estimates. Admittedly my understanding of the fitting process of lmer is rather small, and doing some google work or reading Zuur et al. or Gelman & Hill books did not improve my understanding of what is happening here (I may have to learn to read better). In particular, my questions for you guys: 1) is my model implementation right ? 2) are differences in ML and REML estimates affecting the LR test of the model against a 'null' model (no fixed effect) ? Many thanks for sharing some of your most valuable resources (brain, time). simon PS/ This is the data distribution across behaviour, years, and the 10 individuals: , , ?= year1 ? ? ? ? ? ? ? ?1 2 3 4 5 6 7 8 9 10 ? ?behaviour1 ?2 1 3 2 4 0 3 1 0 7 ? ?behaviour2 ?4 ?5 ?7 ?4 ?5 10 ?0 ?1 ?7 ?5 , , ?= year2 ? ? ? ? ? ? ? ?1 2 3 4 5 6 7 8 9 10 ? ?behaviour1 13 ?1 ?5 ?6 ?1 13 ?3 ?9 ?2 10 ? ?behaviour2 ?4 ?4 ?8 ?7 ?6 ?3 ?9 ?6 ?8 ?3 I'm happy to make mine the fortune: Getting flamed for asking dumb questions on a public mailing list is all part of growing up and being a man/woman. ? ? -- Michael Watson (in a discussion on whether answers on R-help should be more polite) ? ? ? ?R-help (December 2004) ? ? ? [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models