Dear all,

I'll try to keep this as brief as possible. Some time ago I asked about jointly modelling two correlated outcomes with linear mixed-effects models of the form:

    m1 <- lmer(Y1 ~ age + X + (age | id), data = dat)
    m2 <- lmer(Y2 ~ age + X + (age | id), data = dat)

Thierry Onkelinx very helpfully suggested the following:

    library(tidyr)
    long <- gather(dat, key = "trait", value = "Y", Y1, Y2)
    lmer(Y ~ 0 + trait + trait * (age + X) + (0 + trait:age | id), data = long)

which was perfect and indeed worked as expected. Now I am trying to draw predictions (plus prediction intervals) from this model. I have read, among other things, the instructions at glmm.wikidot.com/faq, and also this past message:

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020809.html

This leaves me with two questions:

(1) If my interest is in predicting the response of a single unobserved individual, accounting for all the random effects (= marginalizing over the random effects, I think), which variances should I add together to construct my prediction interval? My understanding is that I would have to add (a) the variance due to the fixed coefficients (the betas), (b) the variance due to the random effects, AND (c) the residual variance. Is this correct? I have become rather confused because, among the examples on the glmm.wikidot.com/faq page, the lme and glmmADMB examples add (a) + (c) and ignore (b), while the lme4 example adds (a) + (b) and ignores (c). Some extra detail here would help a lot.

(2) Assuming that I do indeed need (a) + (b) + (c) to calculate a prediction interval for one outcome (trait) from the model above: what if I want a prediction interval for the **difference** between my two modelled outcomes, for a single unobserved individual? Which variances and covariances should I account for? Given that

    Var(a*A + b*B) = a^2*Var(A) + b^2*Var(B) + 2*a*b*Cov(A,B)

I can calculate (b), the variance due to the random effects, using the correlations between the random effects. I also think I need to add **twice** the residual variance (c), since the difference involves two residuals; is this correct? What I am more in doubt about is (a), the variance due to the fixed effects. Don't I somehow have to account for the correlations between my fixed-effects terms, and if so, how should I go about doing that?

An extension of this last question: say I move away from lme4 and fit this particular model in JAGS (in fact I already did this as an experiment), calculating (b) and (c), and thus the limits of my 95% PI for the difference between the two outcomes, over the whole MCMC chain. Since integrating over the MCMC chain automatically propagates the uncertainty into any derived parameter (in this case, the difference between the outcomes), shouldn't this account for any correlation between my fixed-effects terms as well? Thus problem solved?

Any help is greatly appreciated! Thanks in advance,

Theodore Lytras
Epidemiologist, PhD student
Hellenic Centre for Disease Control and Prevention
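To make question (1) concrete, this is the calculation I have in mind for (a) + (b) + (c) for one trait. Every number below is a hypothetical placeholder, not an estimate from my data; in practice (a) would come from vcov() of the fitted model and (b)/(c) from VarCorr():

```r
## Sketch: (a) + (b) + (c) prediction interval for a single new individual,
## one trait (Y1). All numbers are hypothetical placeholders.
age <- 50; X <- 1
x    <- c(1, age, X)               # model-matrix row: intercept, age, X (Y1 part)
beta <- c(2.0, 0.01, 0.3)          # hypothetical fixed-effect estimates
V    <- diag(c(0.20, 0.002, 0.05)) # hypothetical vcov() of those coefficients

var_a <- drop(t(x) %*% V %*% x)    # (a) uncertainty in the fixed coefficients
var_b <- age^2 * 0.04^2            # (b) random slope traitY1:age, hypothetical SD 0.04
var_c <- 0.8^2                     # (c) hypothetical residual SD 0.8

pred <- sum(x * beta)
se   <- sqrt(var_a + var_b + var_c)
c(lower = pred - 1.96 * se, upper = pred + 1.96 * se)
```

Is this the right set of terms to sum, or should one of them be dropped as in the FAQ examples?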
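And for question (2), this is how I currently picture assembling the variance of the difference, writing the fixed part as a contrast d = x1 - x2 so that d' V d automatically uses the covariances between the fixed effects. Again, the model-matrix layout and all numbers are hypothetical placeholders standing in for my actual estimates:

```r
## Sketch: prediction variance for the difference Y1 - Y2 of a new individual.
## All numbers below are hypothetical placeholders, not estimates from my data.
age <- 50; X <- 1

## Fixed part: model-matrix rows for the two traits, with columns
## traitY1, traitY2, traitY1:age, traitY2:age, traitY1:X, traitY2:X
x1 <- c(1, 0, age, 0, X, 0)
x2 <- c(0, 1, 0, age, 0, X)
d  <- x1 - x2                       # contrast giving E[Y1] - E[Y2]

## (a) variance from the fixed coefficients: d' V d, where V = vcov(fit)
## (hypothetical covariance matrix here, with correlated intercepts)
V <- diag(c(0.20, 0.20, 0.002, 0.002, 0.05, 0.05))
V[1, 2] <- V[2, 1] <- 0.10
var_a <- drop(t(d) %*% V %*% d)

## (b) random effects: the difference involves age * (b1 - b2), so
## Var = age^2 * (s1^2 + s2^2 - 2 * rho * s1 * s2)
s1 <- 0.04; s2 <- 0.05; rho <- 0.6  # hypothetical slope SDs and correlation
var_b <- age^2 * (s1^2 + s2^2 - 2 * rho * s1 * s2)

## (c) residual variance enters twice, assuming independent residuals
sigma <- 0.8
var_c <- 2 * sigma^2

## 95% prediction interval for the difference (hypothetical betas)
beta      <- c(2.0, 1.5, 0.01, 0.02, 0.3, 0.1)
pred_diff <- sum(d * beta)
se        <- sqrt(var_a + var_b + var_c)
c(lower = pred_diff - 1.96 * se, upper = pred_diff + 1.96 * se)
```

If the contrast form d' V d is indeed the right way to handle (a), then my question about the fixed-effects correlations reduces to simply using the full vcov() matrix rather than only its diagonal. Is that correct?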
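To make the JAGS part of my question concrete, this is roughly the calculation I have in mind: for each MCMC draw, form the fixed-effect difference, simulate the new individual's random effects and residuals from that draw's variance components, and take quantiles at the end. Here simulated draws stand in for the actual posterior samples, and all distributions and parameter names are made up for illustration:

```r
## Sketch: prediction interval for the difference from MCMC output.
## 'draws' stands in for posterior samples from JAGS; here I simulate
## hypothetical draws just to show the mechanics.
set.seed(1)
n <- 4000
draws <- data.frame(
  mu_diff = rnorm(n, 0.2, 0.5),     # d' beta per draw (fixed-effect difference)
  s1      = abs(rnorm(n, 0.04, 0.005)),  # SD of traitY1:age random slope
  s2      = abs(rnorm(n, 0.05, 0.005)),  # SD of traitY2:age random slope
  rho     = runif(n, 0.4, 0.8),     # correlation between the two slopes
  sigma   = abs(rnorm(n, 0.8, 0.05))     # residual SD
)
age <- 50

## Per draw: simulate the new individual's random-effect and residual
## contributions to the difference, then add the fixed-effect difference.
diff_rep <- with(draws, {
  var_re <- age^2 * (s1^2 + s2^2 - 2 * rho * s1 * s2)
  b_diff <- rnorm(n, 0, sqrt(var_re))     # random-effect contribution
  e_diff <- rnorm(n, 0, sqrt(2) * sigma)  # two independent residuals
  mu_diff + b_diff + e_diff
})
quantile(diff_rep, c(0.025, 0.975))       # 95% PI for the difference
```

My reasoning is that because mu_diff is computed within each draw from the joint posterior, any correlation between the fixed effects is carried along automatically. Does that reasoning hold?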
Prediction interval for the difference of a pair of outcomes