GLMM estimation readings?

Hi Paul,

I recently added fitting of GLMMs to the MixedModels package for Julia (
https://github.com/dmbates/MixedModels.jl).  It is by no means finished but
it does fit a couple of examples.

I mention this because the Julia code is, I feel, easier to read than is
the C++ code that implements GLMM fitting in the lme4 package, although
being easier to read than that C++ code is a low bar.

If you click through to src/PIRLS.jl you will see that the Laplace
approximation to the deviance is defined as
- the sum of the (squared) deviance residuals at the current values of beta
and u
- plus the squared length of u, where b = Lambda * u
- plus the logarithm of the determinant of Lambda'Z'Z * Lambda + I

The model provides the conditional distribution of Y given B = b (or,
equivalently, U = u) and the unconditional distribution of B (or U).  From
those we get the joint density of Y and B (or Y and U). From the joint
density we obtain the conditional density of U given Y = y up to a scaling
factor.  I call this the unscaled conditional density.  The likelihood for
beta and theta (the covariance parameter vector that determines Lambda) is
the integral of this unscaled conditional density.

Because the unconditional density of U is multivariate normal, the unscaled
conditional density ends up being pretty close to a scaled multivariate
normal.  We approximate the unscaled conditional density by a scaled
multivariate normal that matches the value at the peak (i.e. the
conditional modes of the random effects) and the Hessian at that point.
This is the Laplace approximation. Adaptive Gauss-Hermite quadrature
refines this integral by evaluating the unscaled conditional density at
other points near the peak.

The conditional modes are determined via penalized iteratively reweighted
least squares (PIRLS).  See the function pirls! in the PIRLS.jl file. (The
function name ending in ! is an indication that it is a mutating function.
That is, it modifies the values of one or more of its arguments.)

Linear mixed models (LMMs) are simpler to fit because there is a
closed-form expression for the conditional modes of the random effects and
the conditional estimates of the fixed effects given theta.  Also, the
conditional distribution of U given Y = y is a multivariate normal so the
Laplace "approximation" is actually the exact value of the log-likelihood.
See section 3 of  https://www.jstatsoft.org/article/view/v067i01

I'm not sure if this explanation is illuminating or confusing.  To recap,
evaluating the deviance of a GLMM at given values of beta and theta requires
- evaluating Lambda from theta
- setting X*beta at the offset in the model
- determine the conditional modes of the random effects, U, via PIRLS
- evaluate an approximation to the integral of the conditional density

I agree that the penalty is an important justification for random effects
versus fixed effects.  John Tukey put a positive spin on the situation by
saying that we are "borrowing strength" from the other individuals in the
sample when we use random effects.

Consider fitting an item response model to a dichotomous response (say
correct/incorrect) where each response is associated with a subject and an
item (e.g. a question on an exam).  Often such models are fit using
fixed-effects for the subject and random effects for the item, in which
case there is no finite estimate of the fixed-effect for a subject who gets
all the items correct.  It we model both subjects and items with random
effects, the penalty or shrinkage brings the extremes of the estimates for
the subjects in closer to the population mean.

I hope this helps.

GLMM estimation readings?

Thread (5 messages)