prediction intervals and lmer

Muldoon, Ariel <Ariel.Muldoon at ...> writes:
Today I was thinking about prediction intervals, and 
happened to be looking through the code for calculating
prediction intervals for lmer objects from the 
always handy http://glmm.wikidot.com/faq.  The way the
variance is calculated doesn't make sense to me, so 
I'm wondering if I'm missing something.

The code for calculating the variances from the wikidot site:
fm1 <- lmer(
    formula = distance ~ age*Sex + (age|Subject)
    , data = Orthodont
)
newdat <- expand.grid(
    age=c(8,10,12,14)
    , Sex=c("Male","Female")
    , distance = 0
)
mm <- model.matrix(terms(fm1),newdat)
newdat$distance <- mm %*% fixef(fm1)
pvar1 <- diag(mm %*% tcrossprod(vcov(fm1),mm))
tvar1 <- pvar1+VarCorr(fm1)$Subject[1]

It's this piece that I'm having trouble with:

VarCorr(fm1)$Subject[1].

This is the variance for the random intercept term.  
If I had been doing this on my own, I would have used the
residual variance (conditional variance of the response) 
in building prediction intervals.  I can pull
the residual variance out with

getME(fm1, "sigma")^2

or (uglier)

attr(VarCorr(fm1), "sc")^2.

Am I missing something fundamental about prediction 
intervals and mixed models?

If you look carefully at the FAQ code, you'll see that the 
lme and glmmADMB code do include residual variance terms.  The page
itself is a little bit vague about which computed values are
confidence intervals and which are prediction intervals;
the values given for lme4 are confidence intervals,
incorporating either (1) only the uncertainty on the fixed-effect
parameters (beta) or (2) uncertainty on beta plus variation due
to random effects [this would be a sort-of confidence interval for 
a population-level prediction, i.e. the expected variation
(conditional on the random-effect parameter effects) of the mean
of a large number of samples from a _single_ previously unobserved
block].   The problem (or at least, my problem) with setting up
generic confidence/prediction intervals for mixed models is thinking
clearly about which sources of variation one wants to (1) ignore,
(2) condition on, (3) marginalize over ...  (Also, note that none
of these approaches allows for the uncertainty of the random-effects
parameters.)
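
To make the distinction concrete, here is a minimal sketch of the three variances side by side, assuming a current lme4 and the Orthodont data from nlme. The names var_fixed, var_block and var_resid are just labels chosen for this illustration, and the intervals use a plain normal approximation. One deliberate difference from the FAQ snippet: because the model has a random slope for age, the sketch uses the full z' Sigma z for a new subject rather than only the intercept variance VarCorr(fm1)$Subject[1].

```r
library(lme4)
data(Orthodont, package = "nlme")

fm1 <- lmer(distance ~ age * Sex + (age | Subject), data = Orthodont)

newdat <- expand.grid(age = c(8, 10, 12, 14), Sex = c("Male", "Female"))
# One-sided fixed-effects formula, so no dummy response column is needed
mm <- model.matrix(~ age * Sex, newdat)
fit <- drop(mm %*% fixef(fm1))

# (1) uncertainty in the fixed-effect estimates (beta) only
V <- as.matrix(vcov(fm1))
var_fixed <- diag(mm %*% V %*% t(mm))

# (2) variation among subjects: z' Sigma z with z = (1, age),
#     since both the intercept and the age slope vary by Subject
Sigma <- VarCorr(fm1)$Subject
Z <- cbind(1, newdat$age)
var_block <- rowSums((Z %*% Sigma) * Z)

# (3) residual variance, needed for a single new observation
var_resid <- sigma(fm1)^2

se_fixed <- sqrt(var_fixed)                          # "typical" subject
se_block <- sqrt(var_fixed + var_block)              # mean of a new subject
se_obs   <- sqrt(var_fixed + var_block + var_resid)  # one new observation

# Normal-approximation limits; none of these account for uncertainty
# in the estimated random-effects parameters themselves
q <- qnorm(0.975)
intervals <- data.frame(
  newdat, fit,
  ci_lwr    = fit - q * se_fixed, ci_upr    = fit + q * se_fixed,
  block_lwr = fit - q * se_block, block_upr = fit + q * se_block,
  pred_lwr  = fit - q * se_obs,   pred_upr  = fit + q * se_obs
)
print(intervals)
```

Which of the three to report depends on what is being ignored, conditioned on, or marginalized over: the first answers "where is the population mean?", the second "where is the mean of a not-yet-seen subject?", and the third "where will one new measurement on a not-yet-seen subject fall?".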

Ah, I think I see.

> confidence intervals, incorporating either (1) only the uncertainty on
> the fixed-effect parameters (beta) or (2) uncertainty on beta plus
> variation due to random effects [this would be a sort-of confidence
> interval for a population-level prediction, i.e. the expected variation
> (conditional on the random-effect parameter effects) of the mean of a
> large number of samples from a _single_ previously unobserved block].

Rather than making a confidence interval for (1) inference about the "typical" or "average" subject (if subject is your random effect), where we only account for the uncertainty around the estimated mean, we might want to (2) make inference about some unobserved subject.  We don't know where that new subject falls in the distribution of all subjects, so we account for both the uncertainty in the estimated mean and the additional uncertainty of the new subject when building the confidence interval.

Thanks for the clarification! 

_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models