Interpretation of lmer output in R

Douglas Bates · 2011-02-21T20:06:48Z

On Mon, Feb 21, 2011 at 8:26 AM, Julia Sommerfeld wrote:
> Dear Douglas and list members,
>
> Thank you heaps for your answers. The interpretation of the summary output
> (lmer) is becoming much clearer now. I have to admit I had a slightly (not
> to say HUGE) different idea of the summary output.
>
> But a few questions still remain...
>
> I have tried out the suggestions with the following results:
>
> 1. I tested if "Sex" is an important factor in the model:


So I would conclude that even though sex was recorded it turns out
that it is not a significant predictor of the probability of site
fidelity and I would omit that term from the model.  As Ben described,
he would not be in favor of this approach because it verges on "data
snooping".  One can argue either way and, in this case, it wouldn't
make much difference to the final conclusion whether or not the Sex
term is included.

So if I needed to quote a p-value for the BreedSuc factor this is what
I would quote.

The alternative is to fit a model of the form

fm2a <- glmer(SameSite ~ 1 + Sex + (1|Bird), family="binomial")

and compare it to the original model, fm, using

anova(fm2a, fm)

The general idea of testing whether BreedSuc makes a significant
contribution to predicting the probability of site fidelity is to fit
a model with the term and then fit the model without the term and
compare the quality of the fits.  To me the most sensible way to
compare the quality of the fits is to consider the likelihood ratio.
The model with the term will always do better than the one without the
term - the question is, "Is it significantly better?".  One way to
answer that question is to convert the likelihood ratio test (LRT)
statistic to a probability or p-value using the result that, under the
null hypothesis (that the term does not make a significant
contribution), the LRT statistic has approximately a chi-squared
distribution with 1 degree of freedom.  One can set up other criteria;
for example, AIC penalizes each parameter as 2 units on the deviance
scale (negative twice the log-likelihood).  BIC is a bit more
complicated in that the number of units of penalty per parameter on
the deviance scale, log(n), depends on the number of observations, n,
in the data set.
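As a rough check on these definitions, the base-R sketch below recomputes the AIC and BIC printed in summary(fm) from its deviance, and converts the LRT statistic of 4.0991 quoted later in this thread (fm2 vs fm1) into a p-value. It assumes 4 parameters for fm (three fixed effects plus the Bird variance) and the 54 observations shown in the summary output. The variable names are illustrative only.

```r
# Numbers taken from the summary(fm) output quoted in this thread.
dev <- 69.38   # deviance of fm (= -2 * logLik = -2 * -34.69)
k   <- 4       # (Intercept), BreedSuc11, SexM and the Bird variance
n   <- 54      # number of observations

aic <- dev + 2 * k        # AIC: 2 units of penalty per parameter
bic <- dev + log(n) * k   # BIC: log(n) units per parameter, about 3.99 here
round(c(AIC = aic, BIC = bic), 2)

# LRT statistic for dropping BreedSuc1 (fm2 vs fm1), referred to a
# chi-squared distribution with 1 degree of freedom.
lrt <- 4.0991
p_lrt <- pchisq(lrt, df = 1, lower.tail = FALSE)
p_lrt
```

The rounded AIC and BIC reproduce the 77.38 and 85.34 in the summary table, which confirms the parameter count assumed above.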

I would claim that the LRT statistic is always a good way of
evaluating the difference in the quality of fit for two models - it is
how you convert it to a p-value that is not clear when you have small
sample sizes.

The difference between what I would advise and what Ben would advise
regarding the LRT is what the null and alternative models are.  I
would remove the Sex term from both.  He would retain the Sex term in
both.  This will result in slightly different conclusions.

This, by the way, emphasizes the point that a test statistic and its
corresponding p-value are not a property of the BreedSuc term.  It
results from comparing the quality of fits of two models - one with
the term and one without.  When we quote t- or z-statistics, and
p-values, in a coefficients table we are providing a summary of many
different types of tests simultaneously.  Unfortunately the conclusion
that is often drawn from the table is that the p-value is a property
of the term itself, which is wrong.

Perhaps I misunderstood your original posting.  I thought that
SameSite=0 meant that the bird did not return to the same nest site.
That is, site fidelity corresponds to SameSite = 1.

In any case the probabilities constructed as you have done are the
probabilities for SameSite = 1.

If, instead, SameSite = 1 means that the bird changed its nest site,
then the 0's and 1's would be reversed for that response variable and
your probabilities would be the complement (i.e. 1-p instead of p) of
those calculated above.

4. If I don't drop the term "Sex":

summary(fm)

Generalized linear mixed model fit by the Laplace approximation
Formula: SameSite ~ BreedSuc1 + Sex + (1 | Bird)
   AIC   BIC logLik deviance
 77.38 85.34 -34.69    69.38
Random effects:
 Groups Name        Variance Std.Dev.
 Bird   (Intercept) 0.14080  0.37524
Number of obs: 54, groups: Bird, 46

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.3294     0.4890  -0.674   0.5006
BreedSuc11    1.1988     0.5957   2.012   0.0442 *
SexM          0.2215     0.5877   0.377   0.7062
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
           (Intr) BrdS11
BreedSuc11 -0.536
SexM       -0.628  0.065


In model fm, the log-odds of site fidelity for a female bird (i.e. SexM
is 0) who was unsuccessful in breeding the previous season (i.e.
BreedSuc1 is 0) is -0.3294, corresponding to a probability of about 42%

plogis(-0.3294)

[1] 0.4183866

The log-odds of site fidelity for a female bird who was successful in
breeding is -0.3294 + 1.1988, corresponding to a probability of about 70%

plogis(-0.3294 + 1.1988)

[1] 0.7046208


5. The z-value: Sorry, I still have some trouble with this value....

In model fm1 (without "Sex") the z-value of BreedSuc1 is 1.998.

In model fm ("Sex" included) the z-value is 2.012.

Nearly the same value in both models. But what can one conclude from
"p < 0.05, z = 1.998"? This is what many people write in their results
section (and I was told to do the same...).

"You should see an LRT statistic close to, but not exactly the same as,
the square of the z value when you compare the models with and without
that term. This is the sense in which the z-value is an approximation."

I don't really understand how the z-value can be seen as an
approximation. Am I missing some background knowledge here?

I may have been too terse in my explanations.  As mentioned above, I
would claim that the LRT statistic is a reasonable way to compare the
fits of two models, because it is based upon fitting the model with
and without the term of interest.  In the case of a linear model
without random effects it is not necessary to fit the model without
the term just to discover what the LRT statistic would be.  You can
tell from the model fit with the term what the maximum value for the
likelihood of any sub-model will be.
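For an ordinary linear model this can be checked directly. The sketch below uses the built-in mtcars data purely as a stand-in (the seabird data are not available) and computes the LRT statistic for dropping hp from the t-statistic of the full fit alone, using the identity RSS0 = RSS1 * (1 + t^2 / (n - p)). It then compares that value with the one obtained by actually refitting the smaller model.

```r
# Full model and the sub-model without hp (mtcars is only a stand-in data set).
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
fit0 <- lm(mpg ~ wt, data = mtcars)

n    <- nobs(fit1)
rdf  <- df.residual(fit1)                      # n - p for the full model
t_hp <- coef(summary(fit1))["hp", "t value"]   # t-statistic for hp

# LRT statistic predicted from the full fit only:
# RSS0 / RSS1 = 1 + t^2 / (n - p), and -2 log LR = n * log(RSS0 / RSS1)
lrt_from_full <- n * log(1 + t_hp^2 / rdf)

# LRT statistic from actually fitting both models by maximum likelihood
lrt_refit <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))

all.equal(lrt_from_full, lrt_refit)
```

For a linear or generalized linear mixed model no such exact identity is available, which is why the sub-model has to be refitted.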

In the case of a linear mixed model or a generalized linear mixed
model you can't decide on the basis of the one model fit what the
likelihood for the other will be.  You can approximate it but you
don't get an exact value.  When it took a very long time to do a model
fit we just used the approximation.  Now that these fits can be done
much more quickly, it makes sense to fit both with and without.

The nature of the approximation is to take the parameter estimate for
BreedSuc and divide it by its approximate standard error.  We call
this the z-statistic because, when everything is working properly,
this should have a distribution close to a standard normal, which we
often write as Z.  The LRT statistic is the difference in the deviance
of the model without and the model with the term.  To me, that is the
quantity of interest and the fact that it should be approximately the
square of the z-statistic is helpful in making rough decisions but I
still want to calculate the difference in the deviance before making a
final decision.
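The arithmetic behind the z value and its p-value in the coefficient table is short enough to reproduce by hand. The base-R sketch below uses the rounded estimate and standard error for BreedSuc11 from summary(fm), so the results agree with the printed table only to the displayed precision.

```r
# Estimate and standard error for BreedSuc11 from the summary(fm) table
est <- 1.1988
se  <- 0.5957

z <- est / se                  # the "z value" column (about 2.012)
p_wald <- 2 * pnorm(-abs(z))   # two-sided normal p-value, the "Pr(>|z|)" column
c(z = z, p = p_wald)
```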

In the summary of fm1 the z-statistic is 1.998 whereas the LRT
statistic comparing fm2 to fm1 is 4.0991.  The square of the
z-statistic will be close to, but not exactly the same as, the LRT
statistic.
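Plugging in the two numbers quoted above shows how close the approximation is. This base-R sketch uses only the values 1.998 and 4.0991 from the message. It compares z^2 with the LRT statistic, and the two-sided normal p-value with the chi-squared (1 df) p-value.

```r
z_fm1   <- 1.998    # z value for BreedSuc11 in the summary of fm1
lrt_fm1 <- 4.0991   # LRT statistic from anova(fm2, fm1)

# The square of z approximates the LRT statistic ...
c(z_squared = z_fm1^2, LRT = lrt_fm1)

# ... so the two p-values are similar but not identical.
p_wald <- 2 * pnorm(-abs(z_fm1))
p_lrt  <- pchisq(lrt_fm1, df = 1, lower.tail = FALSE)
c(wald = p_wald, LRT = p_lrt)
```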

You're welcome.  Thanks for the question.

2011/2/19 Douglas Bates <bates at stat.wisc.edu>

Thank you for your questions and for transferring the discussion to
the R-SIG-Mixed-Models mailing list, as we had discussed.  I have also
copied the mailing list for a class on mixed-effects models that I am
teaching.

I particularly appreciate your desire to learn about the model instead
of just quoting a p-value.  I often lament to my classes that
statisticians have been far too successful in propagating the idea of
p-values, to the extent that some researchers believe that is all that
is needed to learn about an analysis.

On Sat, Feb 19, 2011 at 3:05 AM, Julia Sommerfeld
<Julia.Sommerfeld at utas.edu.au> wrote:

Dear Douglas and list members,

Apologies in advance if you consider my questions too simple to be
asking the godfather of lme4 for an answer... thus, please feel free to
ignore my email or to forward it to someone else.

I'm a PhD student (Australia/Germany) working on tropical seabirds. Like
many of my PhD colleagues, I'm having some difficulties with the analysis
of my data using lmer (family=binomial). While some say: "What do you
care about all the other values as long as you've got a p-value..." I do
believe that it is essential to understand WHAT I'm doing here and WHAT
all these numbers/values mean.

I've read the chapters (lme4 book chapters) and publications about the
use of lmer and searched the forums, but I haven't found a satisfying
answer. And I have the feeling that 1. the statistics lecture at my
university was a joke (sad to say this) and 2. I need a huge
statistical/mathematical background to fully understand GLMMs.


One of the questions I would like to answer is:
Does previous breeding success influence nest-site fidelity?

I have binomial data:
SameSite=1 means birds use the same site
SameSite=0 means birds change nest site

BreedSuc1=1 Birds were successful in the previous breeding season
BreedSuc1=0 Birds were not successful in the previous breeding season

Sex= male, female
Bird= Bird ID

This is my model:

fm <- glmer(SameSite~BreedSuc1+Sex+(1|Bird), family="binomial")

where Bird is my random factor (same birds were sampled more than once)

One thing to note is that there are 46 different birds in the 54
observations.  Most birds will have just one observation so a random
effect for bird may not be necessary.

summary(fm)

Generalized linear mixed model fit by the Laplace approximation
Formula: SameSite ~ BreedSuc1 + Sex + (1 | Bird)
   AIC   BIC logLik deviance
 77.38 85.34 -34.69    69.38
Random effects:
 Groups Name        Variance Std.Dev.
 Bird   (Intercept) 0.14080  0.37524
Number of obs: 54, groups: Bird, 46

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.3294     0.4890  -0.674   0.5006
BreedSuc11    1.1988     0.5957   2.012   0.0442 *
SexM          0.2215     0.5877   0.377   0.7062

This suggests that sex is not an important factor in the model.  The
(Intercept) term is close to zero, relative to its standard error, but
we would retain it in the model as explained below.

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
           (Intr) BrdS11
BreedSuc11 -0.536
SexM       -0.628  0.065


From this summary output I do understand that the Breeding Success has a
significant effect on nest-site fidelity (p<0.05).

Yes, but ... this p-value should be used as a guide only.  As
described below, a p-value must be viewed in context.  It is not a
property of the Breeding Success factor; it comes from a comparison of
two models and we should bear in mind what these models are before
interpreting this number.

The interpretation of a p-value for a particular coefficient is that
it is an approximation to the p-value we would get from comparing the
model that has been fit to the model fit without this particular
coefficient.  In this case the coefficient corresponds to one of the
terms in the model and I would advocate performing a likelihood ratio
test comparing the two models

fm <- glmer(SameSite~BreedSuc1+Sex+(1|Bird), family="binomial")
fm0 <- glmer(SameSite~Sex+(1|Bird), family="binomial")  # the null hypothesis model
anova(fm0, fm)

Even though the function is called anova it will, in this case,
perform a likelihood ratio test (LRT).  It also prints the values of
AIC and BIC if you prefer to compare models according to one of those
criteria but I prefer using the likelihood ratio for nested models.

However, before doing that comparison you should ask yourself whether
you want to compare models that have the apparently unnecessary term
for Sex in them.  The way I would approach the model building is first
to reduce the model to

fm1 <- glmer(SameSite~BreedSuc1+(1|Bird), family="binomial")

You could then compare

anova(fm1, fm)

which I presume will give a large p-value for the LRT, so we prefer
the simpler model, fm1.  After that, I would compare

fm2 <- glmer(SameSite ~ 1 + (1|Bird), family="binomial")
anova(fm2, fm1)

to see if the BreedSuc1 factor is an important predictor in its own right.

Note that we don't drop the implicit "(Intercept)" term, even though
it has a high p-value in the coefficient table.  The reason is that
the interpretation of the (Intercept) coefficient depends on the
coding of BreedSuc1.

In model fm, the log-odds of site fidelity for a female bird (i.e.
SexM is 0) who was unsuccessful in breeding the previous season (i.e.
BreedSuc1 is 0) is -0.3294, corresponding to a probability of about
42%

plogis(-0.3294)

[1] 0.4183866

The log-odds of site fidelity for a female bird who was successful in
breeding is -0.3294 + 1.1988, corresponding to a probability of about
70%

plogis(-0.3294 + 1.1988)

[1] 0.7046208

If you had reversed the meaning of BreedSuc to BreedFail, where 0
indicates no failure at breeding and 1 indicates failure, then the
coefficient would change sign (i.e. the coefficient for BreedFail
would be -1.1988) and the intercept would change to

-0.3294 + 1.1988

[1] 0.8694

because the reference level would now be a female bird who was
successful in breeding.

Because the interpretation of the intercept depends upon the coding of
other factors, we retain it in the model whenever other terms are
retained.
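The claim that recoding only moves the reference level, without changing the fitted probabilities, can be checked with the two coefficients from fm. BreedFail here is the hypothetical recoding described above, not a variable in the data, and the sketch leaves out Sex, i.e. it refers to female birds.

```r
b0 <- -0.3294   # intercept with BreedSuc1 coding (reference: unsuccessful female)
b1 <-  1.1988   # BreedSuc11 coefficient

# Hypothetical BreedFail coding: the reference is now a successful female
b0_fail <- b0 + b1   # 0.8694
b1_fail <- -b1       # -1.1988

# Fitted probabilities are unchanged; only the meaning of the coefficients moves.
p_unsuccessful <- c(orig = plogis(b0),      fail = plogis(b0_fail + b1_fail))
p_successful   <- c(orig = plogis(b0 + b1), fail = plogis(b0_fail))
rbind(p_unsuccessful, p_successful)
```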

But what else can I conclude from this model?

Questions:

1. Random effects: What does the Random Effect table - the Variance, Std.
Dev. and Intercept - tell me? Is there a random effect that my model has
to account for?

First I would remove the apparently unnecessary Sex term then,
ideally, I would check by comparing the fit of the reduced model to
that of a GLM without the random effect for Bird.  Unfortunately, I
don't think the definition of deviance for a glm fit is compatible
with that for a model fit by glmer.  This is something we will need to
fix.  For the time being I would instead examine the "caterpillar
plot" obtained with

dotplot(ranef(fm1, condVar=TRUE))

which represents the 95% prediction intervals for each of the birds.
If these all overlap zero comfortably I would conclude that the random
effect is not needed and fit a glm without a random effect for bird.

Random effects:
 Groups Name        Variance Std.Dev.
 Bird   (Intercept) 0.14080  0.37524
Number of obs: 54, groups: Bird, 46

That estimated standard deviation is fairly large. We would expect a
range of contributions on the log-odds scale of about +/- 2 sd which,
at this point of the logistic curve corresponds to considerable
variability in predicted probabilities for birds with the same
characteristics.
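To put that into numbers, the base-R sketch below shifts the log-odds for the reference bird (a female that was unsuccessful the previous season, taken from fm) by plus and minus two estimated standard deviations of the Bird effect, and maps the result back to probabilities with plogis. The reference bird is chosen only for illustration; other covariate combinations give different ranges.

```r
eta_ref <- -0.3294    # log-odds for the reference bird (female, unsuccessful), from fm
sd_bird <- 0.37524    # estimated Std.Dev. of the Bird random effect

# Birds with the same covariates, shifted by -2 and +2 standard deviations
eta_range  <- eta_ref + c(-2, 2) * sd_bird
prob_range <- plogis(eta_range)
round(prob_range, 3)
```

With these estimates the range runs from roughly 25% to 60%, which is the "considerable variability" referred to above.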

2. Fixed Effects: Again the Intercept? Not sure if I understand the
meaning of it (sorry, explanation in Chapter I also doesn't help much)

Actually in this model it is a bit different from the models described
in chapter 1.  I hope the explanation above makes sense.  Think of it
as the log-odds of site fidelity for a bird in the "reference group",
where reference group means that all the other variables are at the
zero level.

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.3294     0.4890  -0.674   0.5006
BreedSuc11    1.1988     0.5957   2.012   0.0442 *
SexM          0.2215     0.5877   0.377   0.7062

3. Meaning of the z-value? Why should I mention it in the results section?

I would regard the z-value as an approximation.  The quantity of
interest is the likelihood ratio test statistic which has, approximately,
a chi-squared distribution under the null hypothesis (i.e. the term can
be deleted from the model without getting a significantly worse fit).
It happens that this would be a chi-squared distribution with 1 degree
of freedom, which corresponds to the square of a standard normal
distribution.  You should see an LRT statistic close to, but not
exactly the same as, the square of the z value when you compare the
models with and without that term.  This is the sense in which the
z-value is an approximation.  To me the LRT statistic is more reliable
because it is based upon actually refitting the model.
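The link between the two distributions mentioned here can be checked numerically in base R: the two-sided normal p-value for a z value is exactly the upper-tail chi-squared (1 df) probability of z squared. The z values below are simply ones that appear in this thread.

```r
# For any z, the two-sided normal tail probability equals the upper tail of a
# chi-squared distribution with 1 df evaluated at z^2.
z <- c(0.377, 1.998, 2.012)   # z values appearing in this thread
p_normal <- 2 * pnorm(-abs(z))
p_chisq  <- pchisq(z^2, df = 1, lower.tail = FALSE)
all.equal(p_normal, p_chisq)
```

So the z-based p-value amounts to pretending that the LRT statistic equals z^2; the difference between the two p-values comes entirely from how far the refitted LRT statistic is from z^2.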

4. Estimate and Std. Error of the fixed effects? How can I tell from
these values WHAT kind of effect (positive, negative?) these parameters
have on nest-site fidelity? Do birds that were successful during the
previous breeding season show a higher nest-site fidelity? Remember, I
have binomial data...

That is described above.  If you want the estimate of the site
fidelity for a bird with certain characteristics you evaluate the
corresponding combination of coefficients and apply plogis to the
result.
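For example, the base-R sketch below applies that recipe to all four combinations of breeding success and sex, using the fixed-effect estimates from summary(fm). It ignores the Bird random effect, so the probabilities refer to a bird whose random effect is zero. It also keeps the Sex term that was judged unnecessary above, purely for illustration.

```r
# Fixed-effect estimates from summary(fm)
beta <- c("(Intercept)" = -0.3294, BreedSuc11 = 1.1988, SexM = 0.2215)

# All four combinations of breeding success (0/1) and sex (0 = female, 1 = male)
grid <- expand.grid(BreedSuc1 = c(0, 1), SexM = c(0, 1))

# Linear predictor (log-odds) = intercept + coefficients times indicators
X <- cbind(1, grid$BreedSuc1, grid$SexM)
grid$log_odds <- drop(X %*% beta)
grid$prob <- plogis(grid$log_odds)   # estimated probability that SameSite = 1
grid
```

The first two rows reproduce the 0.4183866 and 0.7046208 computed earlier for female birds; a positive coefficient raises the probability of SameSite = 1 and a negative one lowers it.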

I would highly appreciate your feedback and/or suggestions of
papers/chapters I could read for a better understanding of the output.

Best regards,

Julia

I hope this helps.
