
Log likelihood of a glmer() binomial model .

2 messages · Dimitris Rizopoulos, Ed Merkle

Dear Juho,

The log-likelihood function in GLMMs does not have a closed form. For example, see slides 345-347 in my notes: when we combine the Binomial probability mass function (pmf) for the response given the random effects with the normal probability density function for the random effects, we do not get back a Binomial pmf.

Hence, AFAIK you cannot use dbinom() to calculate the marginal deviance in a mixed-effects logistic regression.

You could calculate the "deviance" from the first term in slide 345, but this doesn't correspond to a log-likelihood, in my view (i.e., the log-likelihood is typically calculated based on observed data alone).
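To make this concrete, here is a toy one-cluster illustration (all numbers made up, not from the pbc2 fit): the marginal likelihood of a cluster is the integral of a product of Bernoulli pmfs against a normal density, which has to be evaluated numerically:

```r
# Toy example: marginal likelihood of one cluster with a random intercept.
# All values below are invented for illustration.
y_i   <- c(1, 0, 1)          # three binary responses from one cluster
eta_i <- c(-0.2, 0.4, 0.1)   # fixed-effects linear predictors (made up)
sd_b  <- 0.8                 # random-intercept SD (made up)

# Integrand: product of Bernoulli pmfs given b, times the N(0, sd_b^2) density
integrand <- function(b) {
  sapply(b, function(bi)
    prod(dbinom(y_i, 1, plogis(eta_i + bi))) * dnorm(bi, sd = sd_b))
}

# Numeric integration over the random effect; the result is not a Binomial pmf
L_i <- integrate(integrand, -Inf, Inf)$value
log(L_i)   # contribution of this cluster to the marginal log-likelihood
```

Summing such per-cluster log-contributions over all clusters gives the marginal log-likelihood that software like glmer() approximates with adaptive quadrature.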

Best,
Dimitris


From: Juho Kristian Ruohonen <juho.kristian.ruohonen at gmail.com>
Sent: Tuesday, February 2, 2021 8:13 AM
To: D. Rizopoulos <d.rizopoulos at erasmusmc.nl>
Cc: Ben Bolker <bbolker at gmail.com>; R-mixed models mailing list <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Log likelihood of a glmer() binomial model .

Dear Dimitris/List,

Slide 360 in Dimitris' teaching materials (http://www.drizopoulos.com/courses/EMC/CE08.pdf) appears to detail how to compute the marginal LL, yet my attempts to manually reproduce the marginal LL/deviance reported by any GLMM software continue to fail.

The code below uses Dimitris' dataset with Ben's software (lme4). I chose it because with lme4 I can at least replicate the conditional deviance, even though I cannot replicate the marginal deviance, as shown below:


library("lme4")

data("pbc2", package = "JM")



pbc2$serCholD <- as.numeric(pbc2$serChol > 210)



lme4mod <- glmer(serCholD ~ (1|id) + year*drug + I(age-50)*sex, family = binomial, data = pbc2, nAGQ = 15)

deviance(lme4mod)

[1] 351.5973

-2*sum(dbinom(lme4mod@resp$y, size = 1, prob = predict(lme4mod, re.form = NULL, type = "response"), log = TRUE))

[1] 351.5973

But I cannot replicate the marginal deviance, which is evidently not the same as the "mean-subject" deviance based only on the fixed effects:


-2*logLik(lme4mod)

'log Lik.' 702.6521 (df=8)

-2*sum(dbinom(lme4mod@resp$y, size = 1, prob = predict(lme4mod, re.form = NA, type = "response"), log = TRUE))

[1] 1233.415

So, let's try calculating the marginal deviance from marginal probabilities created using the procedure detailed by Dimitris' Slide 360:


marginalFitted <- plogis( # On the probability scale,

  sapply(rownames(lme4mod@frame), function(i) # for each row of the dataset,

    # the "marginal" prediction is the average of 10,000 "mean-subject" predictions, each augmented by a draw from the estimated N(0, lme4mod@theta^2) random-effects distribution:

    mean(

      predict(lme4mod, newdata = pbc2[i,], re.form = NA) + rnorm(10000, sd = lme4mod@theta)

      )))

marginalDeviance <- -2*sum(dbinom(lme4mod@resp$y, size = 1, prob = marginalFitted, log = TRUE))

But the resulting quantity spectacularly fails to match the marginal deviance reported by Ben's software:


-2*logLik(lme4mod)

'log Lik.' 702.6521 (df=8)

marginalDeviance

[1] 1232.663

Hence, I remain at my wits' end.

Best,

J

On Sun, 31 Jan 2021 at 11:57, D. Rizopoulos (d.rizopoulos at erasmusmc.nl) wrote:
If I may add to this, GLMMadaptive does produce marginal predictions, i.e., predictions based on the marginalized coefficients; for example, see below. These predictions have the same interpretation as the ones you would get from a GEE approach. You can find more information on this in my slides (Chapter 5): http://www.drizopoulos.com/courses/EMC/CE08.pdf

Also, a small comment on Ben's code: GLMMadaptive does not use Monte Carlo simulation to calculate the marginalized coefficients - those are available in closed form. The Monte Carlo simulation is used to calculate the standard errors of the marginalized coefficients.
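For the logit link, the classic back-of-the-envelope attenuation approximation of Zeger, Liang & Albert (1988) conveys the flavor of how conditional coefficients relate to marginal ones, though note this is only an approximation and not the exact closed-form marginalization that GLMMadaptive uses:

```r
# Sketch of the well-known logit attenuation approximation
# (Zeger, Liang & Albert 1988): marginal coefficients are roughly the
# conditional ones shrunk by 1/sqrt(1 + c^2 * sd_b^2), c = 16*sqrt(3)/(15*pi).
attenuate <- function(beta, sd_b) {
  c_const <- 16 * sqrt(3) / (15 * pi)
  beta / sqrt(1 + c_const^2 * sd_b^2)
}

attenuate(1.5, 0.8)  # a made-up conditional beta and random-intercept SD
attenuate(1.5, 0)    # with no random-effect variance, marginal == conditional
```

The larger the random-intercept SD, the more the population-averaged (marginal) coefficient is attenuated toward zero relative to the subject-specific (conditional) one.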

Best,
Dimitris


library("GLMMadaptive")
library("lattice")
data("pbc2", package = "JM")

pbc2$serCholD <- as.numeric(pbc2$serChol > 210)

fm_s52_pbc <- mixed_model(fixed = serCholD ~ year * drug + I(age - 50) * sex,
                          random = ~ 1 | id,
                          family = binomial(), data = pbc2, nAGQ = 15)

summary(fm_s52_pbc)

# the data frame that contains the combination of values to
# create the plot
newDF <- with(pbc2, expand.grid(
    year = seq(0, 12, length.out = 15),
    age = 49,
    drug = levels(drug),
    sex = levels(sex)
))

# log odds for average/median subject
# note: you need to use the effectPlotData() function
# from the GLMMadaptive package (i.e., in case of problems use
# 'GLMMadaptive::effectPlotData')
xyplot(pred + low + upp ~ year | sex * drug,
       data = effectPlotData(fm_s52_pbc, newDF),
       type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
       xlab = "Follow-up time (years)", ylab = "Conditional Log Odds")

# marginal probabilities and conditional probabilities corresponding to
# the average/median individual (i.e., the one with random effects value equal to zero)
plot_data_marg <- effectPlotData(fm_s52_pbc, newDF, marginal = TRUE)
plot_data_marg$pred0 <- effectPlotData(fm_s52_pbc, newDF)$pred

key <- simpleKey(c("marginal probabilities", "probabilities average patient"),
                 points = FALSE, lines = TRUE)
key$lines$col <- c("red", "blue")
key$lines$lwd <- c(2, 2)
key$lines$lty <- c(1, 1)
expit <- function (x) exp(x) / (1 + exp(x))
xyplot(expit(pred) + expit(pred0) + expit(low) + expit(upp) ~ year | sex * drug,
       data = plot_data_marg, key = key,
       type = "l", lty = c(1, 1, 2, 2), lwd = 2,
       col = c("red", "blue", "black", "black"),
       xlab = "Follow-up time (years)", ylab = "Probabilities")




-----Original Message-----
From: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> On Behalf Of Juho Kristian Ruohonen
Sent: Sunday, January 31, 2021 9:11 AM
To: Ben Bolker <bbolker at gmail.com>
Cc: R-mixed models mailing list <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Log likelihood of a glmer() binomial model .

Dear list/Ben,

> GLMMadaptive doesn't do marginal _predictions_ but it does do [...]

Thanks for clearing that up!

With *newdata* set to what? I assume I'd have to use the original dataset while somehow emulating the estimated RE distribution. However, I can't seem to make anything work. A simple example with a simple dataset:
What I'm trying to learn is how to calculate or, failing that, simulate this "marginal" deviance. My first impulse is to simply hand-calculate a deviance from fitted values with all RE's set to their mean of 0. Given that the assumed RE distribution is normal and hence symmetric, my intuition is that the result should equal the result from integrating over the estimated RE distribution. But that fails to happen:

-2*sum(dbinom(y, size = 1, prob = predict(glmm, re.form = NA, type = "response"), log = TRUE))
So my next avenue is to do as Ben seems to suggest, i.e. use the means of simulated "marginal" responses from the model as fitted values. Indeed,
*simulate.merMod()* with *use.u = FALSE* should be exactly what we want, given that its documentation says the following:

use.u
But alas:

*-2*sum(dbinom(y, 1, prob = rowMeans(simulate(glmm, nsim = 1e4, seed =
The resulting deviance is off no matter what random seed I set. As a side note, it seems to equal the deviance obtained by setting all random effects to 0, just as my unreliable intuition suggested. But it's the wrong result nonetheless.
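A toy check, with made-up numbers, of why no per-observation marginal probabilities can reproduce the marginal likelihood: observations that share a random effect are dependent, so the joint probability of a cluster is not the product of the observations' marginal probabilities:

```r
# Two binary observations in one cluster, both with fixed-effects
# linear predictor 0 and a shared random intercept with SD 1.5 (made up).
set.seed(1)
sd_b <- 1.5
b <- rnorm(1e5, sd = sd_b)       # draws from the random-effects distribution
p <- plogis(0 + b)               # conditional success probability given b

joint   <- mean(p * p)           # E[ P(y1 = 1, y2 = 1 | b) ]: integrate the product
product <- mean(p)^2             # product of the two marginal probabilities

c(joint = joint, product = product)  # joint exceeds product (Jensen's inequality)
```

The marginal likelihood integrates the *product* of the cluster's conditional pmfs over the random effect, which is not the same as taking products of individually marginalized probabilities.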

Hence I'm at my wits' end. How does one calculate or simulate the marginal deviance of a binary GLMM?

Best,

J

On Sat, 30 Jan 2021 at 22:55, Ben Bolker (bbolker at gmail.com) wrote:
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Juho,

Sorry for the late reply, but we have made some progress here with the package merDeriv; see, for example,

?merDeriv::llcont.glmerMod

We have a related paper that focuses on derivatives, but the same ideas apply to the likelihood/deviance:

https://arxiv.org/abs/2011.10414

You need a way to average over each cluster's posterior distribution of random effects; we use a quadrature method in merDeriv.
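For intuition, a brute-force Monte Carlo version of that per-cluster averaging - assuming the random-intercept fit `lme4mod` from earlier in this thread - would look like the sketch below; quadrature, as in merDeriv, evaluates the same integrals more accurately:

```r
# Monte Carlo sketch of the marginal deviance: for each cluster, average
# the product of its conditional Bernoulli pmfs over draws of the random
# intercept, then sum the log contributions. Assumes `lme4mod` (glmer fit
# with a single random intercept by id) as fitted earlier in the thread.
set.seed(1)
y   <- lme4mod@resp$y
eta <- predict(lme4mod, re.form = NA)       # fixed-effects linear predictor
id  <- droplevels(lme4mod@frame$id)
b   <- rnorm(5000, sd = lme4mod@theta)      # draws from the RE distribution

ll_cluster <- sapply(split(seq_along(y), id), function(idx) {
  # log joint probability of this cluster's responses, per draw of b
  per_draw <- sapply(b, function(bi)
    sum(dbinom(y[idx], 1, plogis(eta[idx] + bi), log = TRUE)))
  # log-mean-exp for numerical stability
  m <- max(per_draw)
  m + log(mean(exp(per_draw - m)))
})

-2 * sum(ll_cluster)   # approximates -2*logLik(lme4mod), up to MC error
```

The crucial difference from the per-observation attempts above is that the averaging over the random effect happens inside each cluster's joint probability, not observation by observation.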

Best,
Ed