Tommaso Jucker <tommasojucker at ...> writes:
Dear all,

I have the following mixed-effects model with two crossed random effects, which I am using to model tree growth:

Model <- lmer(log(Growth) ~ log(Size) + log(Competition) +
              (1 + log(Size) | Treatment) + (1 | Plot))

I am using this model to predict new data, and was glad to see that the development version of lme4 now comes with a predict() function that allows both fixed and random effects to be used to generate predictions. However, I also need to be able to estimate the uncertainty in the predictions I am making, which is a problem, since predict() in lme4 doesn't generate SEs for predictions.
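To make the setup concrete, here is a minimal, self-contained sketch of the model above. The tree data are not shown in the thread, so the data frame below (variable ranges, effect sizes, numbers of treatments and plots) is entirely a made-up stand-in; only the model formula comes from the original message.

```r
library(lme4)

## Simulated stand-in data -- all names and values here are illustrative
## assumptions; the real tree data set is not available in the thread.
set.seed(101)
dat <- data.frame(
  Size        = runif(400, 1, 10),
  Competition = runif(400, 1, 5),
  Treatment   = factor(rep(1:4, each = 100)),
  Plot        = factor(rep(1:20, times = 20))
)
dat$Growth <- exp(1 + 0.5 * log(dat$Size) - 0.3 * log(dat$Competition) +
                  rnorm(4,  sd = 0.2)[dat$Treatment] +
                  rnorm(20, sd = 0.2)[dat$Plot] +
                  rnorm(400, sd = 0.3))

## The model from the thread: crossed random effects, with a random
## slope for log(Size) across treatments.
Model <- lmer(log(Growth) ~ log(Size) + log(Competition) +
                (1 + log(Size) | Treatment) + (1 | Plot),
              data = dat)

## Point predictions: conditional on the estimated random effects
## (default), or population-level with re.form = NA, which sets the
## random effects to zero.
pred_cond <- predict(Model)
pred_pop  <- predict(Model, re.form = NA)
```

With only four treatment levels the random-slope term may trigger a singular-fit warning on simulated data like this; the mechanics of predict() are unaffected.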
The reason this feature is missing is that it's difficult, without doing full-on MCMC or parametric bootstrapping, to generate standard errors that appropriately incorporate all sources of error (in particular the uncertainty in the random-effects parameters). Furthermore, which components of the error are included depends in a complicated way on which components of the model are conditioned on in generating the prediction. For example, if predictions are generated at the population level, setting random effects to zero, then (maybe?) the random effects variance should be incorporated in the prediction error. Getting all of this right is quite tricky.
I have tried two alternative approaches. The first was to use simulate() to generate a distribution of predicted values, which I could then summarize into uncertainty estimates. However, I found that the output of simulate() is clearly different from that of predict(), regardless of how I set the use.u argument controlling the random effects. When I take the mean predicted value across 1000 or more simulations and compare it to the output of predict(), it is clear that the two methods are producing different results.
Congratulations, you found a big bug in use.u=TRUE. If you reinstall from Github and use use.u=TRUE, you can get results whose mean agrees with predict(). If you use use.u=FALSE and take the standard deviations across simulations, you can get the uncertainty of population-level predictions due to random effects and residual error; this approach also generalizes to GLMMs. Note, though, that it gives you predictions that incorporate uncertainty due *only* to stochasticity in the random effects and residuals, not to uncertainty in the parameter estimates, which is probably not all of what you want. Another approach is given in http://glmm.wikidot.com/faq ... this gets variation due to the fixed-effect uncertainty only -- although for LMMs you can add residual error back in fairly easily.
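The two approaches described above can be sketched as follows. Since the tree data are not available, this uses lme4's built-in sleepstudy data as a stand-in model; the mechanics carry over to the growth model unchanged.

```r
library(lme4)

## Stand-in model (sleepstudy ships with lme4); the tree data are not
## available, so this only illustrates the mechanics.
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

## Approach 1: with use.u = FALSE each simulation draws new random
## effects and residuals, so the per-observation SD across simulations
## reflects random-effect + residual variability around the
## population-level prediction (not parameter uncertainty).
sims   <- simulate(fm, nsim = 1000, use.u = FALSE, seed = 1)
sim_sd <- apply(as.matrix(sims), 1, sd)

## Approach 2 (the FAQ recipe): variance of the predictions due to
## fixed-effect uncertainty only, via X %*% vcov(fm) %*% t(X).
X        <- model.matrix(fm)
pvar_fix <- rowSums((X %*% as.matrix(vcov(fm))) * X)  # diagonal only
se_fix   <- sqrt(pvar_fix)

## For an LMM, residual error can be added back in easily:
se_pred <- sqrt(pvar_fix + sigma(fm)^2)
```

The rowSums() trick computes just the diagonal of X V X' without forming the full n-by-n matrix.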
The second approach was to use the bootMer function, as recommended in the help file for predict(). From this I was able to obtain SEs for the parameter estimates. However, I'm not quite sure how to then translate these into uncertainty in the predictions (i.e., how do I obtain SEs for the predicted values?). Am I missing something obvious?
You need to vary the FUN argument. If you pass FUN=predict, you should get results that incorporate uncertainty in all model components. If you pass FUN=simulate, you should get prediction errors ...

  Ben Bolker
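A minimal sketch of the bootMer approach, again using the built-in sleepstudy data as a stand-in for the tree model. Bootstrapping the predictions themselves (rather than the parameter estimates) yields per-observation SEs and intervals directly.

```r
library(lme4)

## Stand-in model -- illustrative only; substitute the fitted growth model.
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

## Parametric bootstrap of the predictions: each replicate refits the
## model to data simulated from the fitted model and returns predict()
## on that refit, so the spread across replicates reflects uncertainty
## in all estimated model components.
boo <- bootMer(fm, FUN = function(fit) predict(fit, re.form = NA),
               nsim = 100, seed = 1)

## boo$t is an nsim-by-n matrix: one row per bootstrap replicate,
## one column per observation. Summarize columns to get prediction SEs
## and percentile confidence intervals.
pred_se <- apply(boo$t, 2, sd)
pred_ci <- t(apply(boo$t, 2, quantile, probs = c(0.025, 0.975)))
```

nsim = 100 keeps the example quick; in practice something like 1000 replicates gives more stable interval endpoints.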