MCMCglmm predictions with new data for multi-response model - R-SIG-mixed-models

Mon, Jul 6, 2015 2:54 PM #
Dear list,

I am hoping to generate predictions (with CIs) based on new data for a
model with
3 Gaussian response variables and 1 blocking variable.

A very similar and recent question hasn't received a substantive reply yet,
here: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q2/023616.html .
Jarrod mentioned on this list some months ago that a version of MCMCglmm
that can predict with new data is imminent, but I haven't seen any signs of
it -
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q1/023329.html .

I believe I understand the process as discussed here -
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q1/020065.html
http://stackoverflow.com/questions/21057548/how-can-one-simulate-quantities-of-interest-from-the-posterior-density-in-mcmcgl
,
http://glmm.wikidot.com/faq

Here is my working understanding of the process:
(1) Generate new data / prediction frame -> pred_frame
(2) Generate fixed effects design matrix -> fix_dm
(3) Extract chains of coefficients from Sol -> beta
(4) Matrix-multiply  fix_dm * beta -> temp
(5) Extract variance-covariance matrix as rowsums of mod$VCV -> V
(6) Divide temp by sqrt(V) to get a matrix of predictions [i, j] with i =
MCMC sample and j= new data point -> pred_mat
(7) Generate CIs for predictions as HPDs of columns of pred_mat

Here are my questions:
a. Most broadly - am I on the right track, or have I totally misunderstood
something critical?

b. For (4) above - the structure of model$Sol in the multi-response model
is complex and doesn't conform to the design matrix. I figure I need to melt
the design matrix but I'm unsure how to ensure I'm doing that in just the
same way as MCMCglmm.

c. Some of the same unanswered questions discussed here:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q2/023616.html  ,
- I suspect based on some of the discussions linked to above that the
procedure I outlined produces results that are marginalized over random
effects, but I'm not at all sure.
- What's the most appropriate way to generate CIs for predictions? If I'm
not wrong about (7), what's the best way to calculate those HPDs?

Here is the model I'm working with:

prior2 <- list(R=list(V=diag(3),nu=3), G=list(G1=list(V=diag(3),
nu=3.02, alpha.mu=rep(0,3),
alpha.V=1000*diag(3)), G2=list(V=diag(3), nu=3.02, alpha.mu=rep(0,3),
alpha.V=1000*diag(3) ) ) )

m2b <- MCMCglmm(cbind(professional, relational, practice) ~
                 -1 + trait +
                 trait:income + trait:age + trait:ethnicity +
trait:gender + trait:residence + trait:education + trait:HDI +
trait:Ineq + trait:Enviro + trait:Enviro:gender + trait:Ineq:gender +
trait:Ineq:ethnicity + trait:Ineq:income,
               random= ~us(trait):block + us(trait):ID, rcov= ~us(trait):units,
               family=rep("gaussian",3),
               prior=prior2,
               nitt <- 80000, thin <- 25, burnin <- 50000,
               data=df_sel,
               verbose=TRUE)

Thank you immensely much for any help!

Warmly,

Rafter




















Rafter Sass Ferguson, MS
PhD Candidate | Crop Sciences Department
University of Illinois in Urbana-Champaign
liberationecology.org
518 567 7407