Interpretation of lmer output in R
On Thu, Feb 24, 2011 at 7:23 AM, Thilo Kellermann
<thilo.kellermann at rwth-aachen.de> wrote:
Dear Thierry, Ben, Douglas, Julia and list :-)
sorry for mixing in with another (maybe stupid) question, but Julia's result of the likelihood-ratio-test (LRT) reveals a confusion I came across with my own data: The test states that the two models differ significantly from each other, and in Pinheiro & Bates (2000) I read that I should (or at least can) prefer the model with the lower AIC or BIC value.
In Julia's model comparison it happens that these values seem to be inconsistent, since the AIC "prefers" model fd whereas the BIC "prefers" model fd1 (see below). As mentioned above I observed similar results with my own data (using lme) which rendered the LRT inconclusive to me...?
Not an uncommon situation. Generally AIC is more liberal than BIC in allowing for more terms in the model. The AIC criterion penalizes each additional parameter by two units on the deviance scale. That is, an additional parameter must reduce the deviance by at least two units to be able to reduce the AIC. BIC is more conservative in that it requires each additional parameter to reduce the deviance by log(n) units, where n is the number of observations. If I have nested models, as these are, I prefer the likelihood ratio test rather than AIC or BIC comparisons. I regard these "2 units per parameter" or "log(n) units per parameter" rules as somewhat arbitrary. Of course, the likelihood ratio test on a single parameter gives a p-value less than 5% when the change in the deviance is greater than
qchisq(0.95, df=1)
[1] 3.841459

so it is rather arbitrary too. Nonetheless I prefer to put the change in the deviance on a probability scale where I think I can interpret it more meaningfully.
Best regards, Thilo
Data:
Models:
fd1: Foraging ~ 1 + (1 | Bird)
fd: Foraging ~ Depth + (1 | Bird)
    Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
fd1  2 1909.0 1920.5 -952.52
fd   3 1904.8 1922.0 -949.40 6.2385      1     0.0125 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
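The AIC/BIC disagreement in this table has a simple arithmetic explanation. With n = 2249 observations (the number reported by summary(fm) later in the thread), BIC charges log(2249) ≈ 7.72 deviance units per extra parameter while AIC charges 2, and the deviance drop for Depth (about 6.24) falls between the two thresholds. A sketch of that bookkeeping in Python (the values are copied from the table and summary output above, not recomputed from the data):

```python
import math

# Values from the anova() table (n taken from the summary(fm) output: 2249 obs).
loglik_fd1, loglik_fd = -952.52, -949.40
k_fd1, k_fd = 2, 3          # number of estimated parameters in each model
n = 2249

dev_fd1 = -2 * loglik_fd1   # deviance of the intercept-only model
dev_fd = -2 * loglik_fd     # deviance of the model with Depth
drop = dev_fd1 - dev_fd     # about 6.24: what the extra Depth parameter buys

aic_penalty = 2 * (k_fd - k_fd1)            # AIC charges 2 per extra parameter
bic_penalty = math.log(n) * (k_fd - k_fd1)  # BIC charges log(n), about 7.72

print(drop > aic_penalty)   # True  -> AIC prefers fd
print(drop > bic_penalty)   # False -> BIC prefers fd1
```

So the two criteria are not "inconsistent"; they simply price the extra parameter differently, and this deviance drop lands between the two prices.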
On Thu, 2011-02-24 at 12:56 +0000, ONKELINX, Thierry wrote:
Dear Julia,

Plotting the mean probability of foraging versus depth will make things clearer.

Depth <- seq(0, 50, length = 101)
plot(Depth, plogis(-1.6898789 + 0.0007075 * Depth), type = "l")

Note that the difference in probability between Depth = 0 and Depth = 50 is quite small and thus not relevant. So you have a variable (depth) in the model which improves the model but has no biological relevance. Such things can happen when you have a lot of data. I would keep depth in the model and conclude that it has a negligible effect. This is a case where rejecting the STATISTICAL null hypothesis allows us to accept the BIOLOGICAL null hypothesis, which is stronger than not being able to reject the BIOLOGICAL null hypothesis.

Best regards,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
tel. +32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
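Thierry's arithmetic can be checked without R: plogis(x) is just the standard logistic CDF, 1/(1 + exp(-x)). A minimal sketch in Python, using the estimates quoted from Julia's summary(fm) output:

```python
import math

def plogis(x):
    """Standard logistic CDF, the inverse of the logit link (R's plogis)."""
    return 1.0 / (1.0 + math.exp(-x))

intercept, slope = -1.6898789, 0.0007075  # estimates from summary(fm)

p0 = plogis(intercept)                # fitted foraging probability at Depth = 0
p50 = plogis(intercept + slope * 50)  # fitted foraging probability at Depth = 50

print(round(p0, 3))        # 0.156
print(round(p50 - p0, 3))  # 0.005: a negligible change over 50 m of depth
```

This is exactly Thierry's point: the effect is statistically detectable but amounts to about half a percentage point across the whole depth range.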
-----Original message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Julia Sommerfeld
Sent: Thursday, 24 February 2011 13:34
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Fwd: Interpretation of lmer output in R

---------- Forwarded message ----------
From: Julia Sommerfeld <Julia.Sommerfeld at utas.edu.au>
Date: 2011/2/24
Subject: Re: [R-sig-ME] Interpretation of lmer output in R
To: Douglas Bates <bates at stat.wisc.edu>

... it's illuminating the confusion in my brain every day a little bit more! Yes, the confusion comes from my coding 0,1 for both parameters. I hope you don't mind that I keep asking....

1. Maybe it helps if I have a look at a different data set where only the response variable has a coding of 0,1. I want to test if foraging (0,1, where 0 means "NO foraging" and 1 means "foraging") is influenced by water depth (Depth) (in meters), i.e. does my model including "Depth" fit better than without? My model:
fm <- lmer(Foraging ~ Depth + (1 | Bird), family = binomial)
fm1 <- lmer(Foraging ~ 1 + (1 | Bird), family = binomial)
anova(fm, fm1)
Data:
Models:
fd1: Foraging ~ 1 + (1 | Bird)
fd: Foraging ~ Depth + (1 | Bird)
    Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
fd1  2 1909.0 1920.5 -952.52
fd   3 1904.8 1922.0 -949.40 6.2385      1     0.0125 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fm)
Generalized linear mixed model fit by the Laplace approximation
Formula: Foraging ~ Depth + (1 | Bird)
   AIC  BIC logLik deviance
  1905 1922 -949.4     1899
Random effects:
 Groups Name        Variance Std.Dev.
 Bird   (Intercept) 0.29505  0.54319
Number of obs: 2249, groups: Bird, 23

Fixed effects:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.6898789  0.1463908 -11.544   <2e-16 ***
Depth        0.0007075  0.0002884   2.453   0.0142 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
      (Intr)
Depth 0.244
From this output I conclude that the model including "Depth" (fm) explains the observed data significantly better than the model without Depth (fm1).

2. But what if I want to know HOW Depth is influencing/related to the foraging event (1)? I.e., do birds preferably forage in shallow or deep waters? How can I get this information out of the summary? Would plogis() still be the right approach in this case?

plogis(1.6898789)
plogis(1.6898789 + 0.0007075 * second_column?)

The above plogis is based on foraging = 1, but what about "Depth"? I think I'm still getting something wrong here...

Cheers,
Julia

2011/2/23 Douglas Bates <bates at stat.wisc.edu>
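For what it's worth, the direction of the effect can be read off the sign of the Depth coefficient: plogis is strictly increasing, so a positive coefficient means the fitted foraging probability rises with depth. A Python sketch of that arithmetic (using the estimates from the summary(fm) output above; note the intercept estimate is negative, -1.69):

```python
import math

def plogis(x):
    # standard logistic CDF (R's plogis); strictly increasing in x
    return 1.0 / (1.0 + math.exp(-x))

intercept, slope = -1.6898789, 0.0007075  # from summary(fm); slope > 0

# Because plogis is increasing and the Depth coefficient is positive,
# the fitted foraging probability can only go up with depth:
probs = [plogis(intercept + slope * d) for d in (0, 10, 25, 50)]
print(probs == sorted(probs))  # True: deeper water -> (slightly) higher probability
```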
On Wed, Feb 23, 2011 at 6:17 AM, Julia Sommerfeld <Julia.Sommerfeld at utas.edu.au> wrote:
Dear Douglas and list,

Again, thanks for the thorough answers. Regarding my last email: I'm mixing up my own data! Sorry for that.

1. plogis(x)

SameSite=1 means that birds did use the SAME nest site, hence: they show site-fidelity. SameSite=0 means that birds did CHANGE nest site, hence: no site-fidelity. This means that site-fidelity of a bird who failed breeding corresponds to a probability of 44% and that a bird who was successful corresponds to a probability of 72% (see below).
Yes. When you code the response as 0 and 1 then the model provides the probability that the response will be a 1.
Thus, with "plogis" of the Intercept and/or Intercept+Fixed Effect
Estimate
I can calculate the probability of my observed event
(R-help: plogis
gives
the distribution function; being the inverse logit-function)?
Yes. It happens that the logit function (see http://en.wikipedia.org/wiki/Logit) is the quantile function for the standard logistic distribution (the R function qlogis) and its inverse is the cumulative probability function for the standard logistic (the R function plogis). Because I know that the qlogis and plogis functions are carefully written to handle difficult cases gracefully, I use them instead of writing out the expressions explicitly.
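Douglas's point that qlogis and plogis are a quantile-function/CDF pair can be illustrated outside R too. A minimal Python sketch of the same pair of functions (R users would simply call qlogis and plogis directly):

```python
import math

def plogis(x):
    """Logistic CDF: maps log-odds to probability (R's plogis)."""
    return 1.0 / (1.0 + math.exp(-x))

def qlogis(p):
    """Logit / quantile function: maps probability to log-odds (R's qlogis)."""
    return math.log(p / (1.0 - p))

# The two functions are inverses of each other:
print(round(qlogis(0.72), 3))                    # 0.944: log-odds for probability 0.72
print(abs(plogis(qlogis(0.72)) - 0.72) < 1e-12)  # True: plogis undoes qlogis
```

(The naive formulas above lose accuracy for p very near 0 or 1, which is exactly why Douglas recommends the carefully written R implementations over writing the expressions out by hand.)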
That is: plogis of the intercept estimate is based upon BreedSuc1=0, SameSite=1 and plogis of the intercept + BreedSuc1 estimate is based upon BreedSuc1=1, SameSite=1?
Yes. Another way of looking at it would be to extract the model matrix for your fitted model, using the function model.matrix. You will see that the first column is all 1's and the second column will be 0's and 1's according to that bird's breeding success. Technically the fixed-effects part of the predicted log-odds for a particular bird is (Intercept) * first_column + BreedSuc1 * second_column but, because every row is either 1 0 or 1 1, the only possible values of the predicted log-odds are (Intercept) and (Intercept) + BreedSuc1.
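The model-matrix view can be sketched with made-up numbers (the coefficients and the 0/1 breeding-success data below are hypothetical, not Julia's estimates):

```python
# Hypothetical coefficients and covariate, for illustration only.
intercept_est, breedsuc1_est = 1.2, -0.5
breed_success = [0, 1, 1, 0, 1]  # 0/1 coding of breeding success per bird

# The model matrix: an intercept column of 1's plus the 0/1 covariate column,
# so every row is either (1, 0) or (1, 1).
model_matrix = [(1, b) for b in breed_success]

# Fixed-effects log-odds: (Intercept)*first_column + BreedSuc1*second_column.
log_odds = [intercept_est * c1 + breedsuc1_est * c2 for c1, c2 in model_matrix]

# Only two distinct predictions are possible:
# (Intercept) and (Intercept) + BreedSuc1.
print(sorted(set(log_odds)))
```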
And the "result" will always depend upon my coding (BreedSuc1=1 means successful, SameSite=1 means site-fidelity). This
is still a
bit confusing...
I would consider the two factors separately. Regarding the SameSite encoding, just think of it as you are eventually modeling the probability of an "event" for which there are two possible outcomes. You can call them Failure/Success or Off/On or No/Yes or whatever you want, but when you get to a numerical coding as 0/1 the model will be for the probability of getting a 1.

As for the breeding success factor, you again need to characterize it according to some numerical encoding to be able to create the model matrix. You happened to choose a 0/1 encoding so in the model the coefficient for that term is added to the intercept when there is a 1 for that factor and not added when there is a 0.
My question was: What if both codings change? I.e. BreedSuc1=0 means successful and SameSite=0 means site-fidelity? Would it then still be 1-p instead of p?
If you change the encoding of the response, then your coefficients will all flip signs. What was previously a log-odds of site fidelity of, say, 1.2 will now become a log-odds of site switching of -1.2. You can check that plogis(-x) = 1 - plogis(x).

If you change the encoding of BreedSuc then the (Intercept) coefficient will be the log-odds of a '1' outcome (whether that means site fidelity or site switching) for a bird who was successful and (Intercept) + BreedFail1 will be the log-odds of a 1 outcome for a bird who was not successful.

I think part of the confusion for you is that you are trying to combine the encoding of site-fidelity and breeding success. It is best to keep them distinct because site-fidelity is the response and breeding success is a covariate.
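The identity plogis(-x) = 1 - plogis(x) that Douglas mentions is easy to verify numerically; a Python sketch using the example log-odds of 1.2 from his reply:

```python
import math

def plogis(x):
    # logistic CDF (R's plogis)
    return 1.0 / (1.0 + math.exp(-x))

# Flipping the 0/1 coding of the response flips the sign of the log-odds,
# and the two probabilities are complementary: plogis(-x) == 1 - plogis(x).
x = 1.2  # example log-odds of site fidelity from the text
print(abs(plogis(-x) - (1 - plogis(x))) < 1e-12)  # True
print(round(plogis(x), 3), round(plogis(-x), 3))  # 0.769 0.231, summing to 1
```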
2. Z-value: I assume that z-value is the same as z-score? I've found the following link ( l_statistics_toolbox/what_is_a_z_score_what_is_a_p_value.htm ). There, the z-score is defined as: "The Z score is a test of statistical significance that helps you decide whether or not to reject the null hypothesis". "Z scores are measures of standard deviation. For example, if a tool returns a Z score of +2.5 it is interpreted as "+2.5 standard deviations away from the mean". P-values are probabilities. Both statistics are associated with the standard normal distribution. This distribution relates standard deviations with probabilities and allows significance and confidence to be attached to Z scores and p-values".

Is this correct?
Well, not really. The logic behind statistical hypothesis testing is subtle and frequently misunderstood. When you try to simplify the explanation, as is done above, it often ends up not quite accurate.

The way I present it to students is that we have two competing claims, which for us will boil down to a simpler model being adequate (called the "null hypothesis" as in "nothing going on here, folks") or the simpler model is not adequate and we need to extend it. The second claim is called the alternative hypothesis. In your case the null hypothesis is that breeding success doesn't influence the probability of site fidelity and the alternative hypothesis is that it does. You want to establish the alternative hypothesis but, because of the variability in the data, you can't "prove" it directly. You have to use an indirect method, which is to assume that it is not the case (i.e. the null hypothesis is true) and demonstrate that it is unlikely to obtain the data that you did, or something even more unusual, if the null hypothesis were true.

So that is the subtlety. You can't prove your point directly so you set up the "straw man" argument (the null hypothesis) and try to refute it. But nothing is certain. You can't prove that it is impossible to get the data that you did if breeding success did not influence site fidelity. The best you can do is say it is extremely unlikely.

Now we need to formalize this argument by settling on a "test statistic", which is a quantity that can be calculated from the observed data. Along with the test statistic we need a reference distribution, which is the distribution of the test statistic when the null hypothesis is true. Formally, we are supposed to set up all the rules about the test statistic and the reference distribution before we see the data - sort of like the "pistols at dawn, march ten paces then turn and fire" rules of a duel. In practice we sometimes look at the data before we decide exactly what hypotheses we will test. This is the "data snooping" that Ben wrote about.

Anyway, when we have set up the rules and we have the data, we evaluate the test statistic and then determine the probability of seeing that value or something even more extreme when the null hypothesis is true. This is what we call the p-value. If that probability is very small then either we encountered a rare case or the null hypothesis is false. That's why you, as the researcher, want the p-value to be small, when you are trying to establish the alternative.

(As an aside, this is where most people lose track of the argument. They think the p-value is related to the probability of one of the hypotheses being true. That is not the case. The p-value is the probability of seeing the data that we did, or something more unusual, if the null hypothesis - the "no change" assumption - were true.)

Now we get to the point of how do we calculate the p-value.
We need to come up with a way of characterizing "unusual" data. What I prefer to do is to see how good the fit is without allowing for differences due to breeding success and how good it is when allowing for differences due to breeding success. The measure of "how good is the fit" is called the deviance. So we fit the model with the BreedSuc1 term and without it and compare the deviance values. The reference distribution for this difference is generally taken to be a chi-squared distribution. Technically all we can say is that when the sample sizes get very large, the difference in the deviance values tends to a chi-squared distribution. But I would claim that it is a reasonable way of judging the difference in quality of fit for finite samples too.

An alternative approach to judging if a coefficient is "significantly different" from zero is to take the value of the coefficient and divide by its standard error. If everything is behaving well, this ratio would have a standard Gaussian (or "normal") distribution when the null hypothesis is true. We write a standard normal random variable as Z, so this ratio is called a z-statistic.

For me the problem with using a z-statistic is that you are comparing two models (with and without the term of interest) by fitting only one of them and then extrapolating to decide what the other fit should be like. This was a necessary short-cut when fitting a model might involve weeks of hand calculation. But if fitting a model involves only a few seconds of computer time, why not fit both and do a comparison of the actual difference in the quality of the fits?

So, why should we even bother printing the z-statistics? I consider them to be a guide, but if I actually want to do the hypothesis test involving a simple model versus a more complex model, I fit both models.

I hope this is more illuminating than confusing.
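For Julia's Depth model quoted earlier in the thread, both routes Douglas describes can be reproduced from the printed output. For a chi-squared variable with 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x/2)), so Python's standard library suffices for a sketch (all numeric inputs are copied from the anova() and summary(fm) output above):

```python
import math

def chisq1_pvalue(x):
    """Upper-tail probability for a chi-squared variable with 1 df
    (equivalent to R's pchisq(x, df = 1, lower.tail = FALSE))."""
    return math.erfc(math.sqrt(x / 2.0))

# Likelihood-ratio approach: fit both models and compare deviances.
# The Chisq value is taken from the anova(fm, fm1) table.
p_lrt = chisq1_pvalue(6.2385)
print(p_lrt)  # about 0.0125, the Pr(>Chisq) in the anova() table

# z-statistic approach: fit only the larger model and divide the
# estimate by its standard error (values from summary(fm)).
estimate, std_error = 0.0007075, 0.0002884
z = estimate / std_error                  # about 2.453, as printed by summary(fm)
p_z = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Gaussian p-value
print(p_z)  # about 0.0142, the Pr(>|z|) in summary(fm)
```

The two p-values differ slightly (0.0125 vs 0.0142), which illustrates Douglas's point: the z-statistic extrapolates from a single fit, while the likelihood-ratio test actually compares both fits.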
--
Julia Sommerfeld - PhD Candidate
Institute for Marine and Antarctic Studies
University of Tasmania
Private Bag 129, Hobart TAS 7001
Phone: +61 458 247 348
Email: julia.somma at gmx.de
Julia.Sommerfeld at utas.edu.au
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Thilo Kellermann
RWTH Aachen University
Department of Psychiatry, Psychotherapy and Psychosomatics
JARA Translational Brain Medicine
Pauwelsstr. 30, 52074 Aachen, Germany
Tel.: +49 (0)241 / 8089977
Fax.: +49 (0)241 / 8082401
E-Mail: thilo.kellermann at rwth-aachen.de