Likelihood ratio test between glm and glmer fits

2008/7/16 Dimitris Rizopoulos <Dimitris.Rizopoulos at med.kuleuven.be>:
Well, for computing the p-value you need to use pchisq(), not dchisq() (check
?dchisq for more info). For model fits with a logLik method you can directly
use the following simple function:

lrt <- function (obj1, obj2) {
   L0 <- logLik(obj1)
   L1 <- logLik(obj2)
   L01 <- as.vector(- 2 * (L0 - L1))
   df <- attr(L1, "df") - attr(L0, "df")
   list(L01 = L01, df = df,
       "p-value" = pchisq(L01, df, lower.tail = FALSE))
}
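
As a small illustration of why the choice of function matters, the base-R sketch below contrasts the two. dchisq() returns the height of the density at the statistic, whereas pchisq(..., lower.tail = FALSE) returns the upper-tail probability that serves as the p-value. The statistic and degrees of freedom are simply the numbers quoted in the original post further down, used for illustration only; the variable names are made up.

```r
# Values taken from the original post, used only for illustration
stat <- 79.60454
df <- 14

# Height of the chi-squared density at the statistic: not a probability
density_height <- dchisq(stat, df)

# Upper-tail probability P(X >= stat): this is the p-value
p_value <- pchisq(stat, df, lower.tail = FALSE)

# The upper tail is the area under the density to the right of the statistic.
# Checked at a moderate value, where numerical integration is reliable.
area <- integrate(dchisq, lower = 5, upper = Inf, df = 3)$value
tail_prob <- pchisq(5, df = 3, lower.tail = FALSE)
all.equal(area, tail_prob, tolerance = 1e-4)
```

The two quantities happen to be small together for a large statistic, but only the tail probability can be interpreted as a p-value.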

library(lme4)
gm0 <- glm(cbind(incidence, size - incidence) ~ period,
             family = binomial, data = cbpp)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             family = binomial, data = cbpp)

lrt(gm0, gm1)
Yes, that seems quite natural, but then try to compare with the deviance:

logLik(gm0)
logLik(gm1)

(d0 <- deviance(gm0))
(d1 <- deviance(gm1))
(LR <- d0 - d1)
pchisq(LR, 1, lower.tail = FALSE)

Obviously the deviance in glm is *not* twice the negative
log-likelihood as it is in glmer. The question remains which of these
two quantities is appropriate for comparison. I am not sure exactly
how the deviance and/or log-likelihood are calculated in glmer, but my
feeling is that one should trust the deviance rather than the
log-likelihoods for these purposes. This is supported by the following
comparison: Add an arbitrary random effect with a close-to-zero
variance and note the deviance:

tmp <- rep(1:4, each = nrow(cbpp)/4)
gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | tmp),
            family = binomial, data = cbpp)
(d2 <- deviance(gm2))

This deviance is very close to that obtained from the glm model.
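
On the glm side at least, the gap between the two quantities can be made concrete with base R alone. For a binomial glm, deviance() reports twice the difference between the saturated and the fitted log-likelihood, while logLik() reports the fitted log-likelihood itself, so the two differ by a term that depends only on the data. The sketch below uses a small made-up data set (not cbpp) with hypothetical variable names, and it says nothing about how glmer computes its deviance.

```r
# Small made-up grouped binomial data set, for illustration only
dat <- data.frame(
  x         = c(0, 0, 0, 1, 1, 1),
  successes = c(2, 5, 1, 7, 9, 4),
  trials    = c(10, 12, 8, 11, 14, 9)
)

fit0 <- glm(cbind(successes, trials - successes) ~ 1,
            family = binomial, data = dat)
fit1 <- glm(cbind(successes, trials - successes) ~ x,
            family = binomial, data = dat)

# Log-likelihood of the saturated model (fitted proportion = observed one)
ll_sat <- sum(dbinom(dat$successes, dat$trials,
                     dat$successes / dat$trials, log = TRUE))

# glm deviance = 2 * (saturated log-likelihood - fitted log-likelihood)
dev_from_ll <- 2 * (ll_sat - as.numeric(logLik(fit1)))
all.equal(dev_from_ll, deviance(fit1))

# The saturated term cancels when two nested glm fits are compared
lr_dev <- deviance(fit0) - deviance(fit1)
lr_ll  <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))
all.equal(lr_dev, lr_ll)
```

Between two glm fits the choice therefore makes no difference. Across glm and glmer, the comparison is only meaningful if both functions keep or drop the same constant terms.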

I have included the mixed-models mailing list in the hope that someone
could explain how the deviance is computed in glmer and why deviances,
but not likelihoods, are comparable to glm fits.

In that example I think the problem may be that I have not yet written
the code to adjust the deviance of the glmer fit for the null
deviance.

However, there are some issues regarding this likelihood ratio test.

1) The null hypothesis is on the boundary of the parameter space, i.e., you
test whether the variance of the random effect is zero. In this case the
assumed chi-squared distribution for the LRT may *not* be entirely
appropriate and may produce conservative p-values. Theory for this
setting shows that the reference distribution for the LRT is a mixture
of a chi-squared(df = 0) and a chi-squared(df = 1). Another option is a
simulation-based approach, in which you approximate the reference
distribution of the LRT under the null by simulation. See below for an
illustration of this procedure (untested):

X <- model.matrix(gm0)
coefs <- coef(gm0)
pr <- plogis(c(X %*% coefs))
n <- length(pr)
new.dat <- cbpp
Tobs <- lrt(gm0, gm1)$L01
B <- 200
out.T <- numeric(B)
for (b in 1:B) {
   y <- rbinom(n, cbpp$size, pr)
   new.dat$incidence <- y
   fit0 <- glm(formula(gm0), family = binomial, data = new.dat)
   fit1 <- glmer(formula(gm1), family = binomial, data = new.dat)
   out.T[b] <- lrt(fit0, fit1)$L01
}
# estimate p-value
(sum(out.T >= Tobs) + 1) / (B + 1)

2) For the glmer fit you have to note that you work with an approximation to
the log-likelihood (obtained using numerical integration) and not the actual
log-likelihood.
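
The mixture reference distribution mentioned in point 1 can be sketched in a few lines of base R. The function name boundary_lrt_pvalue is hypothetical, and the equal 0.5/0.5 weights are an assumption: they are the usual asymptotic result when a single variance component is tested, and other settings need different weights.

```r
# p-value for an LRT statistic under an equal-weight mixture of
# chi-squared(df = 0), i.e. a point mass at zero, and chi-squared(df = 1).
# The 0.5/0.5 weights are an assumption (single variance component).
boundary_lrt_pvalue <- function(lr_stat) {
  # Small negative values can arise numerically; treat them as zero
  lr_stat <- pmax(lr_stat, 0)
  # Tail probability of the point mass at zero: 1 at zero, 0 above it
  p_df0 <- as.numeric(lr_stat == 0)
  p_df1 <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
  0.5 * p_df0 + 0.5 * p_df1
}

# A statistic sitting exactly at the naive 5% critical value
lr_stat <- qchisq(0.95, df = 1)
naive_p <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
mixture_p <- boundary_lrt_pvalue(lr_stat)
c(naive = naive_p, mixture = mixture_p)
```

For any positive statistic the mixture p-value is half the naive chi-squared(df = 1) p-value, which is the sense in which the naive test is conservative.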

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
    http://www.student.kuleuven.be/~m0390867/dimitris.htm

Quoting COREY SPARKS <corey.sparks at UTSA.EDU>:

Dear list,
I am fitting a logistic multi-level regression model and need to test the
difference between the ordinary logistic regression from a glm() fit and
the mixed effects fit from glmer(); basically, I want to do a likelihood
ratio test between the two fits.

The data are like this:
My outcome is a (1,0) for health status; I have several (1,0) dummy
variables RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG, male,
divorced, SINGLE, chronic, vigor_d and moderat_d, and AGE is continuous (20
to 100).
My higher level is called munid and has 581 levels.
The data have 45243 observations.

Here are my program statements:

# GLM fit
ph.fit.2 <- glm(poorhealth ~ RURAL + SMOKE + DRINK + EMPLOYED + highereduc +
                  INDIG + AGE + male + divorced + SINGLE + chronic +
                  vigor_d + moderat_d,
                family = binomial(), data = mx.merge)

# GLMER fit
ph.fit.3 <- glmer(poorhealth ~ RURAL + SMOKE + DRINK + EMPLOYED + INSURANCE +
                    highereduc + INDIG + AGE + male + divorced + SINGLE +
                    chronic + vigor_d + moderat_d + (1 | munid),
                  family = binomial(), data = mx.merge)

I cannot find a method in R that will do the LR test between a glm and a
glmer fit, so I try to do it using the likelihoods from both models

# form the likelihood ratio test between the glm and glmer fits
x2 <- -2 * (logLik(ph.fit.2) - logLik(ph.fit.3))

   ML
79.60454
attr(,"nobs")
   n
45243
attr(,"nall")
   n
45243
attr(,"df")
[1] 14
attr(,"REML")
[1] FALSE
attr(,"class")
[1] "logLik"

#Get the associated p-value
dchisq(x2,14)
        ML
5.94849e-15
Which looks like an improvement in model fit to me. Am I seeing this
correctly, or are the two models even able to be compared? They are both
estimated via maximum likelihood, so they should be, I think.
Any help would be appreciated.

Corey

Corey S. Sparks, Ph.D.

Assistant Professor
Department of Demography and Organization Studies
University of Texas San Antonio
One UTSA Circle
San Antonio, TX 78249
email:corey.sparks at utsa.edu
web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Rune Haubo Bojesen Christensen

Master Student, M.Sc. Eng.
Phone: (+45) 30 26 45 54
Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com

DTU Informatics, Section for Statistics
Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby, Denmark
