Interpretation of lmer output in R
On Thu, Feb 24, 2011 at 7:23 AM, Thilo Kellermann
<thilo.kellermann at rwth-aachen.de> wrote:
Dear Thierry, Ben, Douglas, Julia and list :-)
sorry for mixing in with another (maybe stupid) question, but Julia's result of the likelihood-ratio-test (LRT) reveals a confusion I came across with my own data: The test states that the two models differ significantly from each other, and in Pinheiro & Bates (2000) I read that I should (or at least can) prefer the model with the lower AIC or BIC value.
In Julia's model comparison it happens that these values seem to be inconsistent, since the AIC "prefers" model fd whereas the BIC "prefers" model fd1 (see below). As mentioned above I observed similar results with my own data (using lme) which rendered the LRT inconclusive to me...?
Not an uncommon situation. Generally AIC is more liberal than BIC in allowing for more terms in the model. The AIC criterion penalizes each additional parameter by two units on the deviance scale. That is, an additional parameter must reduce the deviance by at least two units to be able to reduce the AIC. BIC is more conservative in that it requires each additional parameter to reduce the deviance by log(n) units, where n is the number of observations. If I have nested models, as these are, I prefer the likelihood ratio test rather than AIC or BIC comparisons. I regard these "2 units per parameter" or "log(n) units per parameter" rules as somewhat arbitrary. Of course, the likelihood ratio test on a single parameter gives a p-value less than 5% when the change in the deviance is greater than
qchisq(0.95, df=1)
[1] 3.841459

so it is rather arbitrary too. Nonetheless I prefer to put the change in the deviance on a probability scale where I think I can interpret it more meaningfully.
Best regards, Thilo
Data:
Models:
fd1: Foraging ~ 1 + (1 | Bird)
fd: Foraging ~ Depth + (1 | Bird)
    Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
fd1  2 1909.0 1920.5 -952.52
fd   3 1904.8 1922.0 -949.40 6.2385      1     0.0125 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
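The AIC/BIC disagreement in this table has a simple arithmetic explanation. With n = 2249 observations (the number reported by summary(fm) later in the thread), BIC charges log(2249) ≈ 7.72 deviance units per extra parameter while AIC charges 2, and the deviance drop for Depth (about 6.24) falls between the two thresholds. A sketch of that bookkeeping in Python (the values are copied from the table and summary output above, not recomputed from the data):

```python
import math

# Values from the anova() table (n taken from the summary(fm) output: 2249 obs).
loglik_fd1, loglik_fd = -952.52, -949.40
k_fd1, k_fd = 2, 3          # number of estimated parameters in each model
n = 2249

dev_fd1 = -2 * loglik_fd1   # deviance of the intercept-only model
dev_fd = -2 * loglik_fd     # deviance of the model with Depth
drop = dev_fd1 - dev_fd     # about 6.24: what the extra Depth parameter buys

aic_penalty = 2 * (k_fd - k_fd1)            # AIC charges 2 per extra parameter
bic_penalty = math.log(n) * (k_fd - k_fd1)  # BIC charges log(n), about 7.72

print(drop > aic_penalty)   # True  -> AIC prefers fd
print(drop > bic_penalty)   # False -> BIC prefers fd1
```

So the two criteria are not "inconsistent"; they simply price the extra parameter differently, and this deviance drop lands between the two prices.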
On Thu, 2011-02-24 at 12:56 +0000, ONKELINX, Thierry wrote:
Dear Julia,

Plotting the mean probability of foraging versus depth will make things clearer.

Depth <- seq(0, 50, length = 101)
plot(Depth, plogis(-1.6898789 + 0.0007075 * Depth), type = "l")

Note that the difference in probability between Depth = 0 and Depth = 50 is quite small and thus not relevant. So you have a variable (depth) in the model which improves the model but has no biological relevance. Such things can happen when you have a lot of data. I would keep depth in the model and conclude that it has a negligible effect. This is a case where rejecting the STATISTICAL null hypothesis allows us to accept the BIOLOGICAL null hypothesis, which is stronger than not being able to reject the BIOLOGICAL null hypothesis.

Best regards,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
tel. +32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
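Thierry's arithmetic can be checked without R: plogis(x) is just the standard logistic CDF, 1/(1 + exp(-x)). A minimal sketch in Python, using the estimates quoted from Julia's summary(fm) output:

```python
import math

def plogis(x):
    """Standard logistic CDF, the inverse of the logit link (R's plogis)."""
    return 1.0 / (1.0 + math.exp(-x))

intercept, slope = -1.6898789, 0.0007075  # estimates from summary(fm)

p0 = plogis(intercept)                # fitted foraging probability at Depth = 0
p50 = plogis(intercept + slope * 50)  # fitted foraging probability at Depth = 50

print(round(p0, 3))        # 0.156
print(round(p50 - p0, 3))  # 0.005: a negligible change over 50 m of depth
```

This is exactly Thierry's point: the effect is statistically detectable but amounts to about half a percentage point across the whole depth range.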
-----Original message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Julia Sommerfeld
Sent: Thursday, 24 February 2011 13:34
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Fwd: Interpretation of lmer output in R

---------- Forwarded message ----------
From: Julia Sommerfeld <Julia.Sommerfeld at utas.edu.au>
Date: 2011/2/24
Subject: Re: [R-sig-ME] Interpretation of lmer output in R
To: Douglas Bates <bates at stat.wisc.edu>

... it's illuminating the confusion in my brain every day a little bit more! Yes, the confusion comes from my coding 0,1 for both parameters. I hope you don't mind that I keep asking....

1. Maybe it helps if I have a look at a different data set where only the response variable has a coding of 0,1. I want to test if foraging (0,1, where 0 means "NO foraging" and 1 means "foraging") is influenced by water depth (Depth) (in meters), i.e. does my model including "Depth" fit better than without? My model:
fm <- lmer(Foraging ~ Depth + (1 | Bird), family = binomial)
fm1 <- lmer(Foraging ~ 1 + (1 | Bird), family = binomial)
anova(fm, fm1)
Data:
Models:
fd1: Foraging ~ 1 + (1 | Bird)
fd: Foraging ~ Depth + (1 | Bird)
    Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
fd1  2 1909.0 1920.5 -952.52
fd   3 1904.8 1922.0 -949.40 6.2385      1     0.0125 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fm)
Generalized linear mixed model fit by the Laplace approximation
Formula: Foraging ~ Depth + (1 | Bird)
   AIC  BIC logLik deviance
  1905 1922 -949.4     1899
Random effects:
 Groups Name        Variance Std.Dev.
 Bird   (Intercept) 0.29505  0.54319
Number of obs: 2249, groups: Bird, 23

Fixed effects:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.6898789  0.1463908 -11.544   <2e-16 ***
Depth        0.0007075  0.0002884   2.453   0.0142 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
      (Intr)
Depth 0.244
From this output I conclude that the model including "Depth" (fm) explains the observed data significantly better than the model without Depth (fm1).

2. But what if I want to know HOW Depth is influencing/related to the foraging event (1)? I.e., do birds preferably forage in shallow or deep waters? How can I get this information out of the summary? Would plogis() still be the right approach in this case?

plogis(1.6898789)
plogis(1.6898789 + 0.0007075 * second_column?)

The above plogis is based on foraging = 1, but what about "Depth"? I think I'm still getting something wrong here...

Cheers,
Julia

2011/2/23 Douglas Bates <bates at stat.wisc.edu>
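For what it's worth, the direction of the effect can be read off the sign of the Depth coefficient: plogis is strictly increasing, so a positive coefficient means the fitted foraging probability rises with depth. A Python sketch of that arithmetic (using the estimates from the summary(fm) output above; note the intercept estimate is negative, -1.69):

```python
import math

def plogis(x):
    # standard logistic CDF (R's plogis); strictly increasing in x
    return 1.0 / (1.0 + math.exp(-x))

intercept, slope = -1.6898789, 0.0007075  # from summary(fm); slope > 0

# Because plogis is increasing and the Depth coefficient is positive,
# the fitted foraging probability can only go up with depth:
probs = [plogis(intercept + slope * d) for d in (0, 10, 25, 50)]
print(probs == sorted(probs))  # True: deeper water -> (slightly) higher probability
```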
On Wed, Feb 23, 2011 at 6:17 AM, Julia Sommerfeld <Julia.Sommerfeld at utas.edu.au> wrote:
Dear Douglas and list,

Again, thanks for the thorough answers. Regarding my last email: I'm mixing up my own data! Sorry for that.

1. plogis(x)

SameSite=1 means that birds did use the SAME nest site, hence: they show site-fidelity. SameSite=0 means that birds did CHANGE nest site, hence: no site-fidelity. This means that site-fidelity of a bird who failed breeding corresponds to a probability of 44% and that a bird who was successful corresponds to a probability of 72% (see below).
Yes. When you code the response as 0 and 1 then the model provides the probability that the response will be a 1.
Thus, with "plogis" of the Intercept and/or Intercept+Fixed Effect
Estimate
I can calculate the probability of my observed event
(R-help: plogis
gives
the distribution function; being the inverse logit-function)?
Yes. It happens that the logit function (see http://en.wikipedia.org/wiki/Logit) is the quantile function for the standard logistic distribution (the R function qlogis) and its inverse is the cumulative probability function for the standard logistic (the R function plogis). Because I know that the qlogis and plogis functions are carefully written to handle difficult cases gracefully, I use them instead of writing out the expressions explicitly.
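Douglas's point that qlogis and plogis are a quantile-function/CDF pair can be illustrated outside R too. A minimal Python sketch of the same pair of functions (R users would simply call qlogis and plogis directly):

```python
import math

def plogis(x):
    """Logistic CDF: maps log-odds to probability (R's plogis)."""
    return 1.0 / (1.0 + math.exp(-x))

def qlogis(p):
    """Logit / quantile function: maps probability to log-odds (R's qlogis)."""
    return math.log(p / (1.0 - p))

# The two functions are inverses of each other:
print(round(qlogis(0.72), 3))                    # 0.944: log-odds for probability 0.72
print(abs(plogis(qlogis(0.72)) - 0.72) < 1e-12)  # True: plogis undoes qlogis
```

(The naive formulas above lose accuracy for p very near 0 or 1, which is exactly why Douglas recommends the carefully written R implementations over writing the expressions out by hand.)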
That is: plogis of the intercept estimate is based upon BreedSuc1=0, SameSite=1 and plogis of the intercept + BreedSuc1 estimate is based upon BreedSuc1=1, SameSite=1?
Yes. Another way of looking at it would be to extract the model matrix for your fitted model, using the function model.matrix. You will see that the first column is all 1's and the second column will be 0's and 1's according to that bird's breeding success. Technically the fixed-effects part of the predicted log-odds for a particular bird is (Intercept) * first_column + BreedSuc1 * second_column but, because every row is either 1 0 or 1 1, the only possible values of the predicted log-odds are (Intercept) and (Intercept) + BreedSuc1.
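The model-matrix view can be sketched with made-up numbers (the coefficients and the 0/1 breeding-success data below are hypothetical, not Julia's estimates):

```python
# Hypothetical coefficients and covariate, for illustration only.
intercept_est, breedsuc1_est = 1.2, -0.5
breed_success = [0, 1, 1, 0, 1]  # 0/1 coding of breeding success per bird

# The model matrix: an intercept column of 1's plus the 0/1 covariate column,
# so every row is either (1, 0) or (1, 1).
model_matrix = [(1, b) for b in breed_success]

# Fixed-effects log-odds: (Intercept)*first_column + BreedSuc1*second_column.
log_odds = [intercept_est * c1 + breedsuc1_est * c2 for c1, c2 in model_matrix]

# Only two distinct predictions are possible:
# (Intercept) and (Intercept) + BreedSuc1.
print(sorted(set(log_odds)))
```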
And the "result" will always depend upon my coding (BreedSuc1=1 means successful, SameSite=1 means site-fidelity). This
is still a
bit confusing...
I would consider the two factors separately. Regarding the SameSite encoding, just think of it as you are eventually modeling the probability of an "event" for which there are two possible outcomes. You can call them Failure/Success or Off/On or No/Yes or whatever you want, but when you get to a numerical coding as 0/1 the model will be for the probability of getting a 1.

As for the breeding success factor, you again need to characterize it according to some numerical encoding to be able to create the model matrix. You happened to choose a 0/1 encoding so in the model the coefficient for that term is added to the intercept when there is a 1 for that factor and not added when there is a 0.
My question was: What if both codings change? I.e. BreedSuc1=0 means successful and SameSite=0 means site-fidelity? Would it then still be 1-p instead of p?
If you change the encoding of the response, then your coefficients will all flip signs. What was previously a log-odds of site fidelity of, say, 1.2 will now become a log-odds of site switching of -1.2. You can check that plogis(-x) = 1 - plogis(x).

If you change the encoding of BreedSuc then the (Intercept) coefficient will be the log-odds of a '1' outcome (whether that means site fidelity or site switching) for a bird who was successful and (Intercept) + BreedFail1 will be the log-odds of a 1 outcome for a bird who was not successful.

I think part of the confusion for you is that you are trying to combine the encoding of site-fidelity and breeding success. It is best to keep them distinct because site-fidelity is the response and breeding success is a covariate.
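The identity plogis(-x) = 1 - plogis(x) that Douglas mentions is easy to verify numerically; a Python sketch using the example log-odds of 1.2 from his reply:

```python
import math

def plogis(x):
    # logistic CDF (R's plogis)
    return 1.0 / (1.0 + math.exp(-x))

# Flipping the 0/1 coding of the response flips the sign of the log-odds,
# and the two probabilities are complementary: plogis(-x) == 1 - plogis(x).
x = 1.2  # example log-odds of site fidelity from the text
print(abs(plogis(-x) - (1 - plogis(x))) < 1e-12)  # True
print(round(plogis(x), 3), round(plogis(-x), 3))  # 0.769 0.231, summing to 1
```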
2. Z-value: I assume that z-value is the same as z-score? I've found the following link ( l_statistics_toolbox/what_is_a_z_score_what_is_a_p_value.htm ). There, the z-score is defined as: "The Z score is a test of statistical significance that helps you decide whether or not to reject the null hypothesis". "Z scores are measures of standard deviation. For example, if a tool returns a Z score of +2.5 it is interpreted as "+2.5 standard deviations away from the mean". P-values are probabilities. Both statistics are associated with the standard normal distribution. This distribution relates standard deviations with probabilities and allows significance and confidence to be attached to Z scores and p-values".

Is this correct?
Well, not really. The logic behind statistical hypothesis testing is subtle and frequently misunderstood. When you try to simplify the explanation, as is done above, it often ends up not quite accurate.

The way I present it to students is that we have two competing claims, which for us will boil down to a simpler model being adequate (called the "null hypothesis" as in "nothing going on here, folks") or the simpler model is not adequate and we need to extend it. The second claim is called the alternative hypothesis. In your case the null hypothesis is that breeding success doesn't influence the probability of site fidelity and the alternative hypothesis is that it does. You want to establish the alternative hypothesis but, because of the variability in the data, you can't "prove" it directly. You have to use an indirect method, which is to assume that it is not the case (i.e. the null hypothesis is true) and demonstrate that it is unlikely to obtain the data that you did, or something even more unusual, if the null hypothesis were true.

So that is the subtlety. You can't prove your point directly so you set up the "straw man" argument (the null hypothesis) and try to refute it. But nothing is certain. You can't prove that it is impossible to get the data that you did if breeding success did not influence site fidelity. The best you can do is say it is extremely unlikely.

Now we need to formalize this argument by settling on a "test statistic", which is a quantity that can be calculated from the observed data. Along with the test statistic we need a reference distribution, which is the distribution of the test statistic when the null hypothesis is true. Formally, we are supposed to set up all the rules about the test statistic and the reference distribution before we see the data - sort of like the "pistols at dawn, march ten paces then turn and fire" rules of a duel. In practice we sometimes look at the data before we decide exactly what hypotheses we will test. This is the "data snooping" that Ben wrote about.

Anyway, when we have set up the rules and we have the data, we evaluate the test statistic and then determine the probability of seeing that value or something even more extreme when the null hypothesis is true. This is what we call the p-value. If that probability is very small then either we encountered a rare case or the null hypothesis is false. That's why you, as the researcher, want the p-value to be small, when you are trying to establish the alternative.

(As an aside, this is where most people lose track of the argument. They think the p-value is related to the probability of one of the hypotheses being true. That is not the case. The p-value is the probability of seeing the data that we did, or something more unusual, if the null hypothesis - the "no change" assumption - were true.)

Now we get to the point of how do we calculate the p-value.
We need to come up with a way of characterizing "unusual" data. What I prefer to do is to see how good the fit is without allowing for differences due to breeding success and how good it is when allowing for differences due to breeding success. The measure of "how good is the fit" is called the deviance. So we fit the model with the BreedSuc1 term and without it and compare the deviance values. The reference distribution for this difference is generally taken to be a chi-squared distribution. Technically all we can say is that when the sample sizes get very large, the difference in the deviance values tends to a chi-squared distribution. But I would claim that it is a reasonable way of judging the difference in quality of fit for finite samples too.

An alternative approach to judging if a coefficient is "significantly different" from zero is to take the value of the coefficient and divide by its standard error. If everything is behaving well, this ratio would have a standard Gaussian (or "normal") distribution when the null hypothesis is true. We write a standard normal random variable as Z, so this ratio is called a z-statistic.

For me the problem with using a z-statistic is that you are comparing two models (with and without the term of interest) by fitting only one of them and then extrapolating to decide what the other fit should be like. This was a necessary short-cut when fitting a model might involve weeks of hand calculation. But if fitting a model involves only a few seconds of computer time, why not fit both and do a comparison of the actual difference in the quality of the fits?

So, why should we even bother printing the z-statistics? I consider them to be a guide, but if I actually want to do the hypothesis test involving a simple model versus a more complex model, I fit both models.

I hope this is more illuminating than confusing.
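For Julia's Depth model quoted earlier in the thread, both routes Douglas describes can be reproduced from the printed output. For a chi-squared variable with 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x/2)), so Python's standard library suffices for a sketch (all numeric inputs are copied from the anova() and summary(fm) output above):

```python
import math

def chisq1_pvalue(x):
    """Upper-tail probability for a chi-squared variable with 1 df
    (equivalent to R's pchisq(x, df = 1, lower.tail = FALSE))."""
    return math.erfc(math.sqrt(x / 2.0))

# Likelihood-ratio approach: fit both models and compare deviances.
# The Chisq value is taken from the anova(fm, fm1) table.
p_lrt = chisq1_pvalue(6.2385)
print(p_lrt)  # about 0.0125, the Pr(>Chisq) in the anova() table

# z-statistic approach: fit only the larger model and divide the
# estimate by its standard error (values from summary(fm)).
estimate, std_error = 0.0007075, 0.0002884
z = estimate / std_error                  # about 2.453, as printed by summary(fm)
p_z = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Gaussian p-value
print(p_z)  # about 0.0142, the Pr(>|z|) in summary(fm)
```

The two p-values differ slightly (0.0125 vs 0.0142), which illustrates Douglas's point: the z-statistic extrapolates from a single fit, while the likelihood-ratio test actually compares both fits.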
--
Julia Sommerfeld - PhD Candidate
Institute for Marine and Antarctic Studies
University of Tasmania
Private Bag 129, Hobart TAS 7001
Phone: +61 458 247 348
Email: julia.somma at gmx.de
Julia.Sommerfeld at utas.edu.au
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Thilo Kellermann
RWTH Aachen University
Department of Psychiatry, Psychotherapy and Psychosomatics
JARA Translational Brain Medicine
Pauwelsstr. 30, 52074 Aachen, Germany
Tel.: +49 (0)241 / 8089977
Fax.: +49 (0)241 / 8082401
E-Mail: thilo.kellermann at rwth-aachen.de