GLMM estimation readings?
Hi Paul,

I recently added fitting of GLMMs to the MixedModels package for Julia (https://github.com/dmbates/MixedModels.jl). It is by no means finished but it does fit a couple of examples. I mention this because the Julia code is, I feel, easier to read than the C++ code that implements GLMM fitting in the lme4 package, although being easier to read than that C++ code is a low bar.

If you click through to src/PIRLS.jl you will see that the Laplace approximation to the deviance is defined as

- the sum of the (squared) deviance residuals at the current values of beta and u,
- plus the squared length of u, where b = Lambda * u,
- plus the logarithm of the determinant of Lambda'Z'Z*Lambda + I.

The model provides the conditional distribution of Y given B = b (or, equivalently, U = u) and the unconditional distribution of B (or U). From those we get the joint density of Y and B (or Y and U), and from the joint density we obtain the conditional density of U given Y = y up to a scaling factor. I call this the unscaled conditional density. The likelihood for beta and theta (the covariance parameter vector that determines Lambda) is the integral of this unscaled conditional density.

Because the unconditional density of U is multivariate normal, the unscaled conditional density ends up being pretty close to a scaled multivariate normal. We approximate the unscaled conditional density by a scaled multivariate normal that matches the value at the peak (i.e. the conditional modes of the random effects) and the Hessian at that point. This is the Laplace approximation. Adaptive Gauss-Hermite quadrature refines this integral by evaluating the unscaled conditional density at other points near the peak.

The conditional modes are determined via penalized iteratively reweighted least squares (PIRLS). See the function pirls! in the PIRLS.jl file. (The function name ending in ! is an indication that it is a mutating function.
That is, it modifies the values of one or more of its arguments.)

Linear mixed models (LMMs) are simpler to fit because there is a closed-form expression for the conditional modes of the random effects and for the conditional estimates of the fixed effects given theta. Also, the conditional distribution of U given Y = y is multivariate normal, so the Laplace "approximation" is actually the exact value of the log-likelihood. See section 3 of https://www.jstatsoft.org/article/view/v067i01

I'm not sure if this explanation is illuminating or confusing. To recap, evaluating the deviance of a GLMM at given values of beta and theta requires

- evaluating Lambda from theta,
- setting X*beta as the offset in the model,
- determining the conditional modes of the random effects, U, via PIRLS,
- evaluating an approximation to the integral of the unscaled conditional density.

I agree that the penalty is an important justification for random effects versus fixed effects. John Tukey put a positive spin on the situation by saying that we are "borrowing strength" from the other individuals in the sample when we use random effects.

Consider fitting an item response model to a dichotomous response (say correct/incorrect) where each response is associated with a subject and an item (e.g. a question on an exam). Often such models are fit using fixed effects for the subjects and random effects for the items, in which case there is no finite estimate of the fixed effect for a subject who gets all the items correct. If we model both subjects and items with random effects, the penalty, or shrinkage, pulls the extreme estimates for such subjects in toward the population mean.

I hope this helps.
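To make the recap concrete, here is a minimal numeric sketch of PIRLS and the Laplace deviance for a toy random-intercept Bernoulli GLMM with a scalar theta = lambda. This is Python with NumPy, not the actual Julia code in PIRLS.jl; the function name pirls_laplace is my own, and I include the PIRLS working weights in the log-determinant term, so treat it as a sketch under those assumptions rather than the package's implementation:

```python
import numpy as np

def pirls_laplace(y, group, beta0, lam, n_iter=50, tol=1e-10):
    """Toy random-intercept logistic GLMM: eta = beta0 + lam * u[group].

    Runs PIRLS (Newton steps on the penalized deviance) to find the
    conditional modes u, then returns (u, Laplace deviance), where the
    Laplace deviance is the sum of squared deviance residuals, plus the
    squared length of u, plus log det(lam^2 * Z'WZ + I)."""
    q = group.max() + 1
    u = np.zeros(q)
    for _ in range(n_iter):
        eta = beta0 + lam * u[group]
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                    # IRLS working weights
        # Z'WZ is diagonal for a single random-intercept grouping factor,
        # so the penalized Newton system solves elementwise:
        # (lam^2 * Z'WZ + I) du = lam * Z'(y - mu) - u
        ztwz = np.bincount(group, weights=w, minlength=q)
        grad = lam * np.bincount(group, weights=y - mu, minlength=q) - u
        du = grad / (lam**2 * ztwz + 1.0)
        u = u + du
        if np.max(np.abs(du)) < tol:
            break
    eta = beta0 + lam * u[group]
    mu = 1.0 / (1.0 + np.exp(-eta))
    # Squared deviance residuals for Bernoulli responses
    dev = -2.0 * (y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))
    w = mu * (1.0 - mu)
    ztwz = np.bincount(group, weights=w, minlength=q)
    logdet = np.sum(np.log(lam**2 * ztwz + 1.0))
    return u, dev.sum() + u @ u + logdet

# Example: three groups of four Bernoulli responses each
# y = np.array([1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0], dtype=float)
# u, d = pirls_laplace(y, np.repeat(np.arange(3), 4), beta0=0.0, lam=1.0)
```

An outer optimizer would then minimize the returned deviance over beta and theta; the sketch only covers the inner PIRLS-plus-Laplace evaluation for fixed beta0 and lam.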
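The adaptive Gauss-Hermite refinement can likewise be sketched in one dimension (again Python with illustrative names, not the package code): center the quadrature nodes at the conditional mode and scale them by the Hessian there, so that k = 1 node reproduces the Laplace approximation and larger k evaluates the unscaled conditional density at other points near the peak:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def unscaled_density(u, y, lam=1.0, beta0=0.0):
    """exp(-(sum of squared Bernoulli deviance residuals + u^2) / 2)
    for a single group with scalar random effect u."""
    eta = beta0 + lam * u
    mu = 1.0 / (1.0 + np.exp(-eta))
    dev = -2.0 * np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))
    return np.exp(-0.5 * (dev + u * u))

def agq_integral(y, k=9, lam=1.0, beta0=0.0):
    """Integrate the unscaled conditional density by adaptive
    Gauss-Hermite quadrature with k nodes; k = 1 is Laplace."""
    n = y.size
    u, hess = 0.0, 1.0
    for _ in range(50):                  # Newton iteration for the mode
        mu = 1.0 / (1.0 + np.exp(-(beta0 + lam * u)))
        grad = lam * (y.sum() - n * mu) - u
        hess = lam**2 * n * mu * (1.0 - mu) + 1.0
        u += grad / hess
    s = 1.0 / np.sqrt(hess)              # scale from the Hessian at the mode
    x, w = hermgauss(k)                  # Gauss-Hermite nodes and weights
    vals = np.array([unscaled_density(u + np.sqrt(2.0) * s * xi, y, lam, beta0)
                     for xi in x])
    # Change of variables u = mode + sqrt(2) * s * x in the GH rule
    return np.sqrt(2.0) * s * np.sum(w * np.exp(x**2) * vals)
```

Comparing agq_integral(y, k=1) with, say, k=25 shows how the extra evaluations near the peak refine the Laplace value.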
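The "no finite estimate" point about an all-correct subject can also be seen numerically. A hedged sketch (Python; subject_mode is an illustrative name, and a Gaussian penalty u^2/tau2 on a single subject's intercept stands in for the random-effects term): the unpenalized Newton iterates climb without bound, while the penalized mode is finite and shrunk toward zero.

```python
import numpy as np

def subject_mode(n_correct, n_items, tau2=1.0, penalized=True, n_iter=25):
    """Newton iteration for one subject's intercept in a logistic model.
    With an all-correct subject the unpenalized MLE is +infinity, so the
    iterates keep climbing; the penalty u^2 / tau2 keeps the mode finite."""
    u = 0.0
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-u))
        grad = n_correct - n_items * mu       # logistic score for the intercept
        hess = n_items * mu * (1.0 - mu)
        if penalized:
            grad -= u / tau2                  # Gaussian (random-effects) penalty
            hess += 1.0 / tau2
        u += grad / hess
    return u

# subject_mode(10, 10, penalized=False)  -> keeps growing, no finite optimum
# subject_mode(10, 10, penalized=True)   -> finite, shrunk toward zero
```

For a subject with a mixed record (say 7 of 10 correct) both versions give finite answers, but the penalized mode is pulled in toward the population value, which is the shrinkage at work.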
On Mon, Apr 4, 2016 at 6:22 AM Paul Johnson <pauljohn32 at gmail.com> wrote:
I'm trying to explain GLMM estimation and defend random effects to an audience of econometricians. I am focused on logit models, mostly with random intercepts. The literature is difficult. I can understand applications and overviews like the paper by Bolker et al in Trends in Ecology and Evolution, but I can't understand much about glmer that is deeper than that. Can you point me at some books/articles that explain the GLMM estimation process in an understandable way? Is there a PLS derivation for GLMM? I want to better understand adaptive quadrature, and the Laplace approximation.

You might be able to advise me better if I tell you why I need to know. I was surprised to learn that economists hate random effects. It is almost visceral. For the economists, the fixed vs random effects debate is not philosophical, but rather practical. In an LMM, the Hausman test seems to bluntly reject almost all random effects models (see William Greene's Econometrics book). Unmeasured group-level predictors always exist, it seems, so random effects estimates are biased/inconsistent. Even if you believe intercept differences are random, LMM estimates are biased/inconsistent, so you should treat them as fixed. I'm a little surprised there is so little discussion of Hausman's test in the random effects literature outside economics.

One argument used by random effects advocates, that group-level predictors can be included in an LMM, holds no weight with them at all; they see it as just more evidence of bias in the LMM, because if the group-level intercept estimates were correct, then the group-level predictors would not be identifiable.

My argument with them so far is based on the characterization of LMM as a PLS exercise, which I learned on this email list. That makes one point obvious: the fixed vs random models differ because PLS penalizes the b's, but fixed estimators do not. The issue is not "random" against "fixed". It is penalized against unpenalized.
The parallel between LMM and ridge regression and LASSO helps. If the number of observations within groups grows, then the posterior modes and the fixed effect estimates converge. Yes? The small sample debate hinges on the mean square error of the b's. The PLS view makes it plain that Empirical Bayes gives shrinkage not as an afterthought (as it seems in the GLS narrative), but as a primary element (Henderson's estimator).

Ironically, the shrinkage effect of LMM, widely praised in stats and hierarchical modeling applications, raises suspicion of bias. One might prefer a biased but lower variance estimate of b, and that's shrinkage. That's my theme, anyway; we'll see if I can sell it.

In that context, I come to the chore of comparing a glmer estimate with a fixed effect method known as conditional logit or Chamberlain's panel logit model.

pj

Paul Johnson
http://pj.freefaculty.org
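On the ridge parallel and the convergence question: in the balanced Gaussian toy model y_ij = b_j + e_ij, the PLS estimate of each group intercept is the group sum divided by n + lambda, so the shrinkage factor is n/(n + lambda) and the penalized and unpenalized estimates do converge as the within-group n grows. A sketch under those assumptions (Python; the function name is illustrative):

```python
import numpy as np

def intercept_estimates(y_by_group, lam=1.0):
    """Group intercepts in y_ij = b_j + e_ij, estimated with and without
    an L2 penalty lam * b_j^2 on the group effects (the ridge / PLS view)."""
    fixed, penalized = [], []
    for y in y_by_group:
        n = len(y)
        fixed.append(np.mean(y))                 # unpenalized LS estimate
        penalized.append(np.sum(y) / (n + lam))  # PLS solution, shrunk to 0
    return np.array(fixed), np.array(penalized)
```

A small group is shrunk hard while a large group is barely shrunk at all, which is the "borrowing strength" trade of a little bias for lower variance.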
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models