From: rafter sass ferguson <liberationecology at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] MCMCglmm predictions with new data for
multi-response model
Dear list,
I am hoping to generate predictions (with CIs) based on new data for a
model with
3 Gaussian response variables and 1 blocking variable.
A very similar and recent question hasn't received a substantive reply yet,
here: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q2/023616.html
.
Jarrod mentioned on this list some months ago that a version of MCMCglmm
that can predict with new data is imminent, but I haven't seen any signs of
it -
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q1/023329.html .
I believe I understand the process as discussed here -
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q1/020065.html
http://stackoverflow.com/questions/21057548/how-can-one-simulate-quantities-of-interest-from-the-posterior-density-in-mcmcgl
,
http://glmm.wikidot.com/faq
Here is my working understanding of the process:
(1) Generate new data / prediction frame -> pred_frame
(2) Generate fixed effects design matrix -> fix_dm
(3) Extract chains of coefficients from Sol -> beta
(4) Matrix-multiply fix_dm * beta -> temp
(5) Extract variance-covariance matrix as rowsums of mod$VCV -> V
(6) Divide temp by sqrt(V) to get a matrix of predictions [i, j] with i =
MCMC sample and j= new data point -> pred_mat
(7) Generate CIs for predictions as HPDs of columns of pred_mat
Here are my questions:
a. Most broadly - am I on the right track, or have I totally misunderstood
something critical?
b. For (4) above - the structure of model$Sol in the multi-response model
is complex and doesn't conform to the design matrix. I figure I need to
melt
the design matrix but I'm unsure how to ensure I'm doing that in just the
same way as MCMCglmm.
c. Some of the same unanswered questions discussed here:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q2/023616.html ,
- I suspect based on some of the discussions linked to above that the
procedure I outlined produces results that are marginalized over random
effects, but I'm not at all sure.
- What's the most appropriate way to generate CIs for predictions? If I'm
not wrong about (7), what's the best way to calculate those HPDs?
Here is the model I'm working with:
prior2 <- list(R=list(V=diag(3),nu=3), G=list(G1=list(V=diag(3),
nu=3.02, alpha.mu=rep(0,3),
alpha.V=1000*diag(3)), G2=list(V=diag(3), nu=3.02, alpha.mu=rep(0,3),
alpha.V=1000*diag(3) ) ) )
m2b <- MCMCglmm(cbind(professional, relational, practice) ~
-1 + trait +
trait:income + trait:age + trait:ethnicity +
trait:gender + trait:residence + trait:education + trait:HDI +
trait:Ineq + trait:Enviro + trait:Enviro:gender + trait:Ineq:gender +
trait:Ineq:ethnicity + trait:Ineq:income,
random= ~us(trait):block + us(trait):ID, rcov=
~us(trait):units,
family=rep("gaussian",3),
prior=prior2,
nitt <- 80000, thin <- 25, burnin <- 50000,
data=df_sel,
verbose=TRUE)
Thank you immensely much for any help!
Warmly,
Rafter