Skip to content

trouble getting data scale predictions from a binomial MCMCglmm model

4 messages · Kari Ruohonen, Jarrod Hadfield

#
Hi,
I have fitted a binomial model (logit link) with MCMCglmm by fixing the
unit variance to 1 as discussed in the CourseNotes of MCMCglmm.

Now, I would like to get predictions on the data scale. I know there is
a predict function that does this but I need to be able to predict
outside R. With the usual X%*%b where X is the model matrix and b the
vector of coefficients I get predictions on the link scale. The
predictions I get match those from the predict.MCMCglmm method with the
type="terms" option. However, I cannot figure out how to get the
predictions on the data scale. CourseNotes discuss and give an example
of scaling the coefficients suitably with c2 <- (16*sqrt(3)/(15*pi))^2
and then dividing with (sqrt(1+c2*residual var)) to obtain estimates
under residual variance =0 but I cannot figure out how to apply this to
the case of predictions.

Here is a small example based on one in the CourseNotes:

treatment<-gl(2,25)
y<-rbinom(50,1,c(.5,.001)[treatment])
data.bin<-data.frame(treatment=treatment,y=y)
prior.m2c.3<-list(R=list(V=1,fix=1),G=list(G1=list(V=1,nu=.002)))
m2c.3<-MCMCglmm(y~treatment,data=data.bin,family="categorical",prior=prior.m2c.3,verbose=FALSE)

# the next predictions correspond to those from predict.MCMCglmm with "
# type="terms"
preds.link<-m2c.3$X%*%summary(m2c.3$Sol)$statistics[,1]

# but these do not correspond those with type="response"
preds.data<-plogis(preds.link[,1]/sqrt(1+c2*1))

I am obviously missing something and would be grateful to someone
helping me out.

Many thanks, Kari
#
Dear Kari,

The default in the predict function is to obtain the expectation after  
marginalising the random effects, hence you should be multiplying c2  
by the sum of the variance components (if the random structure is a  
simple grouping) if you want to obtain the same answer. Alternatively,  
you could specify marginal=NULL in the call to predict and this should  
be the same as your preds.data.

See the posts for https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/005575.html 
  also.

Cheers,

Jarrod
On 23 Mar 2011, at 12:19, Kari Ruohonen wrote:

            

  
    
#
Hi Jarrod and thanks for a quick reply.

But the example I gave does not have random effects so the sum of
variance components is 1 as used in my calculation:
Iterations = 3001:12991
Thinning interval = 10 
Number of chains = 1 
Sample size per chain = 1000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean             SD       Naive SE Time-series SE 
             1              0              0              0 

2. Quantiles for each variable:

 2.5%   25%   50%   75% 97.5% 
    1     1     1     1     1 

And with marginal=NULL the predictions are the same as without it in
this case. So, I am still not sure I got it. The predict.MCMCglmm seems
to do something differently than my direct model matrix & coefficients
approach in the example below.

regards, Kari
On Wed, 2011-03-23 at 12:48 +0000, Jarrod Hadfield wrote:
#
Hi Kari,

Sorry, I saw your prior specification with a G-structure and presumed  
you'd fitted a random effect. Your method is taking a point estimate  
of the fixed effects from summary, whereas the predict function uses  
the complete posterior.  This should explain the discrepancy.

In many replicates you will have complete separation (since p=0.001  
for 25 the data points associated with treatment 2) or something close  
to this.   This shouldn't effect the consistency of getting the  
predictions in the two ways, but you need to be very careful about  
under/overflow and priors. In this example I would set the prior  
variance on the fixed effects to something like pi^2/3+1 or pi^2/3+2.

Cheers,

Jarrod
On 23 Mar 2011, at 13:15, Kari Ruohonen wrote: