Today I was thinking about prediction intervals, and
happened to be look through the code for calculating
prediction intervals for lmer objects from the
always handy http://glmm.wikidot.com/faq. The way the
variance is calculated doesn't make sense to me, so
I'm wondering if I'm missing something.
The code for calculating the variances from the wikidot site:
fm1 <- lmer(
formula = distance ~ age*Sex + (age|Subject)
, data = Orthodont
)
newdat <- expand.grid(
age=c(8,10,12,14)
, Sex=c("Male","Female")
, distance = 0
)
mm <- model.matrix(terms(fm1),newdat)
newdat$distance <- mm %*% fixef(fm1)
pvar1 <- diag(mm %*% tcrossprod(vcov(fm1),mm))
tvar1 <- pvar1+VarCorr(fm1)$Subject[1]
It's this piece that I'm having trouble with:
VarCorr(fm1)$Subject[1].
This is the variance for the random intercept term.
If I had been doing this on my own, I would have used the
residual variance (conditional variance of the response)
in building prediction intervals. I can pull
the residual variance out with
getME(fm1, "sigma")^2
or (uglier)
attr(VarCorr(fm1), "sc")^2.
Am I missing something fundamental about prediction
intervals and mixed models?