prediction intervals and lmer
3 messages · Muldoon, Ariel, Ben Bolker
1 day later
Muldoon, Ariel <Ariel.Muldoon at ...> writes:
Today I was thinking about prediction intervals, and happened to be looking through the code for calculating prediction intervals for lmer objects from the always handy http://glmm.wikidot.com/faq. The way the variance is calculated doesn't make sense to me, so I'm wondering if I'm missing something.

The code for calculating the variances from the wikidot site:

    fm1 <- lmer(formula = distance ~ age*Sex + (age|Subject), data = Orthodont)
    newdat <- expand.grid(age = c(8,10,12,14), Sex = c("Male","Female"), distance = 0)
    mm <- model.matrix(terms(fm1), newdat)
    newdat$distance <- mm %*% fixef(fm1)
    pvar1 <- diag(mm %*% tcrossprod(vcov(fm1), mm))
    tvar1 <- pvar1 + VarCorr(fm1)$Subject[1]

It's this piece that I'm having trouble with: VarCorr(fm1)$Subject[1]. This is the variance for the random intercept term. If I had been doing this on my own, I would have used the residual variance (conditional variance of the response) in building prediction intervals. I can pull the residual variance out with getME(fm1, "sigma")^2 or (uglier) attr(VarCorr(fm1), "sc")^2.

Am I missing something fundamental about prediction intervals and mixed models?
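For readers following along, the two extractors mentioned in the question can be checked against each other. This is a minimal sketch, assuming the lme4 and nlme packages are installed (Orthodont ships with nlme); the variable names are illustrative and are not from the original post. The sigma() accessor is a third route that current lme4 versions provide.

```r
library(lme4)
data(Orthodont, package = "nlme")

# Same model as in the FAQ code quoted above
fm1 <- lmer(distance ~ age * Sex + (age | Subject), data = Orthodont)

# Three ways to get the residual variance (illustrative names)
s2_getME <- getME(fm1, "sigma")^2
s2_attr  <- attr(VarCorr(fm1), "sc")^2
s2_sigma <- sigma(fm1)^2

# All three should agree up to floating-point error
all.equal(s2_getME, s2_attr)
all.equal(s2_getME, s2_sigma)

# By contrast, this is the random-intercept variance used in the FAQ code
VarCorr(fm1)$Subject[1]
```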
If you look carefully at the FAQ code, you'll see that the lme and glmmADMB code do include residual variance terms. The page itself is a little bit vague about which computed values are confidence intervals and which are prediction intervals; the values given for lme4 are confidence intervals, incorporating either (1) only the uncertainty on the fixed-effect parameters (beta) or (2) uncertainty on beta plus variation due to random effects [this would be a sort-of confidence interval for a population-level prediction, i.e. the expected variation (conditional on the random-effect parameter effects) of the mean of a large number of samples from a _single_ previously unobserved block].

The problem (or at least, my problem) with setting up generic confidence/prediction intervals for mixed models is thinking clearly about which sources of variation one wants to (1) ignore, (2) condition on, (3) marginalize over ... (Also, note that none of these approaches allows for the uncertainty of the random-effects parameters.)
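To make the distinction concrete, here is a hedged sketch of the variance choices side by side: fixed-effect uncertainty only, fixed-effect uncertainty plus the random-intercept variance (as in the FAQ code for lme4), and that sum plus the residual variance (what the question proposed). It assumes lme4 and nlme are installed; the names var_fixed, var_subject, var_total and the normal-approximation multiplier 1.96 are illustrative assumptions, not part of the FAQ code. Like the FAQ code, it ignores the random slope on age and the uncertainty in the random-effects parameters.

```r
library(lme4)
data(Orthodont, package = "nlme")

fm1 <- lmer(distance ~ age * Sex + (age | Subject), data = Orthodont)

# Prediction grid; Sex uses the same factor levels as the fitted data
newdat <- expand.grid(
  age = c(8, 10, 12, 14),
  Sex = factor(c("Male", "Female"), levels = levels(Orthodont$Sex))
)

# Fixed-effects design matrix (one-sided formula, so no response column is needed)
mm <- model.matrix(~ age * Sex, newdat)
pred <- as.vector(mm %*% fixef(fm1))

# (1) uncertainty in the fixed-effect estimates only
var_fixed <- diag(mm %*% tcrossprod(as.matrix(vcov(fm1)), mm))

# (2) plus between-subject variation (random intercept only, as in the FAQ)
var_subject <- var_fixed + VarCorr(fm1)$Subject[1]

# (3) plus residual variance, for a single new observation
var_total <- var_subject + sigma(fm1)^2

# Approximate 95% intervals under each choice (normal approximation)
intervals <- data.frame(
  newdat,
  fit         = pred,
  lwr_fixed   = pred - 1.96 * sqrt(var_fixed),
  upr_fixed   = pred + 1.96 * sqrt(var_fixed),
  lwr_subject = pred - 1.96 * sqrt(var_subject),
  upr_subject = pred + 1.96 * sqrt(var_subject),
  lwr_total   = pred - 1.96 * sqrt(var_total),
  upr_total   = pred + 1.96 * sqrt(var_total)
)
intervals
```

Which of the three is appropriate depends on which sources of variation one wants to ignore, condition on, or marginalize over, as discussed above.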
Ah, I think I see.
> confidence intervals, incorporating either (1) only the uncertainty on the fixed-effect parameters (beta) or (2) uncertainty on beta plus variation due to random effects [this would be a sort-of confidence interval for a population-level prediction, i.e. the expected variation (conditional on the random-effect parameter effects) of the mean of a large number of samples from a _single_ previously unobserved block].
Rather than making a confidence interval for (1) inference about the "typical" or "average" subject (if subject is your random effect), where we only account for the uncertainty around the estimated mean, we might want to (2) make inferences about some unobserved subject. We don't know where that new subject falls in the distribution of all subjects, so we account both for the uncertainty in the estimated mean and for the additional uncertainty of the new subject when building the confidence interval.

Thanks for the clarification!

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models