MCMCglmm: predictions for posterior predictive checks

4 messages · Jarrod Hadfield, Ramon Diaz-Uriarte

#
Dear List,

I want to perform a simple posterior predictive check on some Poisson
models I've fitted using MCMCglmm. I immediately thought of using
predict.MCMCglmm, as in

predict(mymodel, type = "response", marginal = 0, interval = "prediction")

However, predict returns the posterior expectation (in fact, one of its
very last lines is pred <- matrix(colMeans(post.pred), dim(post.pred)[2], 1)).


I can hack predict.MCMCglmm to directly return the "post.pred" object
which, IIUC, would give me the y^{rep} (in the notation of Gelman et
al., 1996). But having to do this makes me wonder whether I am
understanding this correctly.

Is directly using the "post.pred" object the right way of getting the
y^{rep} with MCMCglmm?


Best,


R.




P.S. I am using "marginal = 0" as I want what, e.g., Green et al., 2009
("Use of posterior predictive assessments to evaluate model fit in
multilevel logistic regression", Vet. Res., 40) call "full": "The predictive
distribution was conditional on all fixed effect and random effect
parameters estimated in the final model and a replicate observation
y_{ijk}^{full} generated from the conditional distribution (...)".
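
P.P.S. For concreteness, this is the kind of thing I have in mind (a
minimal sketch, not the package's code; it assumes a log-link Poisson
model fitted with pr = TRUE, so that mymodel$Sol also stores the
random-effect solutions, and a single "units" residual variance):

library(MCMCglmm)
## generate y^{rep} conditioning on the fitted random effects
## ("full" replication, in Green et al.'s sense)
W     <- cbind(mymodel$X, mymodel$Z)   # combined design matrix
niter <- nrow(mymodel$Sol)
nobs  <- nrow(W)
yrep  <- matrix(NA_real_, niter, nobs)
for (i in seq_len(niter)) {
  eta <- as.vector(W %*% mymodel$Sol[i, ])              # latent predictor
  e   <- rnorm(nobs, 0, sqrt(mymodel$VCV[i, "units"])) # over-dispersion
  yrep[i, ] <- rpois(nobs, exp(eta + e))                # replicate data
}
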
#
Hi,

post.pred is equivalent to y^{rep}, but you have to be careful about
whether you want to marginalise the random effects or not. If you
don't marginalise the random effects then post.pred is usually the
joint posterior predictive distribution of y (where I take y to be a
vector, and joint refers to the fact that it is the joint distribution
of y^{rep}_{1}, y^{rep}_{2}, ... y^{rep}_{n}). However, post.pred
always marginalises the residuals (or it wouldn't be a predictive
distribution!). If, for example, the model was bivariate with
observations x and y, and an R-structure of the form us(trait):units,
then post.pred does not give the joint posterior predictive
distribution of x and y: it will give the joint posterior predictive
distribution of x marginal to the residuals of y, and vice versa.
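
(For reference, a hypothetical bivariate fit of that form; here "dat"
is an assumed data frame holding the responses x and y:)

m <- MCMCglmm(cbind(x, y) ~ trait - 1,
              rcov   = ~ us(trait):units,
              family = c("gaussian", "gaussian"),
              data   = dat)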

If the random effects are marginalised then post.pred returns the
marginal posterior predictive distribution for each observation. This
might not always be what you want. Imagine you make two observations
on the same individuals at each of two time points, and you have the
random effect structure us(time):individual. This fits different
between-individual variances for observations at different times, and
also fits a between-individual across-time covariance. If you
marginalise the individual effects (for example, if you want to get a
posterior predictive distribution for an observation made on a random
individual) then post.pred will not preserve the covariation structure
of the four observations. For example, write Y_ijk for the kth
observation at time j on individual i, so we have Y_i11, Y_i12, Y_i21,
Y_i22. Imagine us(time):individual gives us a covariance matrix V with
1 and 2 along the diagonal, and sqrt(1/2) on the off-diagonals (i.e. a
correlation of 0.5). If you marginalise the individual effects then
they are assumed to be drawn from a distribution with diagonal
variance structure Diag{1,1,2,2}, whereas perhaps you would prefer
them to be drawn from kronecker(V, diag(2)).

This is a bit hard to explain! If it doesn't make sense let me know,  
and I'll try again.

Cheers,

Jarrod

Quoting Ramon Diaz-Uriarte <rdiaz02 at gmail.com> on Sat, 01 Mar 2014 12:44:04 +0100:

#
Hi,

Sorry, kronecker(V, diag(2)) should have been kronecker(V, matrix(1,2,2)).
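
For concreteness (a quick numeric sketch, with V as in my previous
message):

V <- matrix(c(1, sqrt(1/2), sqrt(1/2), 2), 2, 2)  # us(time):individual
diag(c(1, 1, 2, 2))      # what marginalising the individual effects assumes
kronecker(V, matrix(1, 2, 2))  # preserves the structure of Y_i11, Y_i12, Y_i21, Y_i22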

Cheers,

Jarrod



Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Sat, 01 Mar 2014 18:32:36 +0000:

#
Hi Jarrod,

Thanks for the detailed answer. But let me make sure I follow
(apologies in advance if the questions seem dumb); this is basically
double-checking.

On Sat, 01-03-2014, at 19:32, j.hadfield at ed.ac.uk wrote:

So, in y^{rep}_{1}, y^{rep}_{2}, ... y^{rep}_{n}, "n" is the number of
observations (NOT the number of MCMC iterations). In terms of the code
for predict.MCMCglmm, n is nrow(pred), i.e., ncol(post.pred). Is this
correct?
Before we get to the bivariate case, let me make sure I follow in two
simpler cases. First, in a univariate Gaussian, post.pred marginalises
over the "e". This seems intuitively obvious, as otherwise y^{rep}_{i}
would be identical to the observed y_{i} at every iteration.


Now, in a univariate Poisson (my case, actually), you also use a residual
"e" (p. 35 of the course notes or p. 2 of the paper). So post.pred also
marginalises over "e" here. This fits my understanding of what the code
does (since the possible latent variables are never used, and marginal =
0 does work even if the model was fitted with "pl = FALSE").

However, the posterior predictions DO make use of the variance of "e" in
the Poisson case too. The variance of "e" (the units variance) needs to
be used so that the posterior predictions incorporate the possible
overdispersion. I think I can see this in the code, e.g., when postvar
is created as

## for each MCMC iteration (rows of VCV), pick out the residual
## variance that applies to each observation's error term
postvar <- t(apply(object$VCV, 1, function(x) {
    x[pos.error[object$error.term]]
}))
Aha, thanks. Would it even be possible to think about doing it
otherwise using us(trait):units? (Not that I am asking for it; I am
just trying to understand whether it could be thought of.)
I had to pay a lot of attention, and I am not sure I got it fully right
(even after your second email, with the kronecker correction), but I think
I see the possible problem.

If I may, let me now go back to the posterior predictive problem. For
simplicity, suppose a single response and a single random effect (e.g.,
plot), so some of the complicating issues above do not apply.

To me it made most sense to obtain the discrepancy without marginalising
the random effects (as in Green et al., 2009). But we might also obtain
the discrepancy after marginalising. I've checked Gelman and Hill (Data
Analysis Using Regression ...) and Gelman et al. (Bayesian Data
Analysis), and it seems to me (emphasis on the "seems to me") that they
do the equivalent of marginalising over the random effects (note that
the random/fixed effects terminology is not one Gelman et al. like);
that is what I think I read on p. 524 and ff. (sect. 24.2) of Gelman and
Hill, and on p. 159 (sect. 6.5) and pp. 148 and ff. of Gelman et al.
(2013 edition). Any comments/suggestions?
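
(For reference, the kind of discrepancy comparison I have in mind, as a
sketch: "yrep" is the iterations x observations matrix of replicates
discussed above, and "y" the observed responses.)

## compare an observed statistic (here a simple dispersion measure
## for the Poisson case) with its distribution over the replicates
T.obs <- var(y) / mean(y)
T.rep <- apply(yrep, 1, function(z) var(z) / mean(z))
mean(T.rep >= T.obs)   # posterior predictive p-value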



Thanks again for your explanations.


Best,


R.