Skip to content

Binomial model variance and repeatability estimates with MCMCglmm

4 messages · Ned Dochtermann, Jarrod Hadfield

#
List,
A colleague recently asked me some questions about issues with 
estimating repeatabilities for binomial/binary data. In trying to help 
out I started playing around with simulating some data and have been 
struggling to get decent effective sample sizes even when parameters are 
being well estimated. I suspect the problem is partly the priors 
(inverse-gamma) but have had issues with other priors too (priors as 
specified in MCMCglmm are still quite unclear to me in general, however).

While quite often repeatability gets estimated very well despite 
effective sample sizes of only a few dozen I also am getting poorly 
estimated repeatabilities (~0) with a population level repeatability of 
<.4 and sample sizes less than n=100 and reps=2, suggesting the 
inverse-gama being informative around zero. Is the poor mixing 
potentially due to using multinomial2 rather than categorical? I don't 
want to go the categorical route because repeatabilities are of 
particular interest.

I'm much less comfortable with generalized rather than standard linear 
models so I wonder if there are any glaring issues with the coding. Any 
thoughts on what might be going on?

Thanks for any help.
Ned



#make function to calculate inverse logit:
inv.log<- function(x){
   exp(x)/(1+exp(x))}

#specify some start conditions and bookkeeping stuff

#specify data distributions:
repeatability=0.3 #Change this for other repeatabilities (sampled 
population's repeatability, not sample's realized latent repeatability)
sigma2.e=6 #arbitrary, set repeatability above
sigma2.a=(repeatability*sigma2.e+repeatability*((pi^2)/3))/(1-repeatability)
B0=0 #could do different fixed effects with some more work...

#(n=number of individuals, i=number of replications per individual)
counter=0; n=150; i=4
ind.data=matrix(0,n*i,3) #make place to store data
tru.lat=matrix(0,n*i,2) #store sampled underlying scores

for(j in 1:n){
   alpha.j=rnorm(1,0,sigma2.a)
   for(rep in 1:i){
     counter=counter+1
     ind.data[counter,1]=j
     eij=rnorm(1,0,sigma2.e)
     pij=inv.log((B0+alpha.j+eij))
     ind.data[counter,3]=rbinom(1,1,pij)
     ind.data[counter,2]=1-ind.data[counter,3]
     tru.lat[counter,1]=alpha.j
     tru.lat[counter,2]=eij
   }}

ind.data=as.data.frame(ind.data)
names(ind.data)=c("Ind", "Fail", "Success")

prior1 = list(R = list(V = 1, nu=0.002),
               G = list(G1 = list(V = 1, nu = 0.002)))
sim.mcmc<-MCMCglmm(cbind(Fail,Success)~1, random= ~Ind,
                    family="multinomial2", prior=prior1,
                    nitt = 260000, thin = 200, burnin = 60000,
                    verbose=FALSE,data=ind.data)

#estimated repeatabilities
posterior.mode(sim.mcmc$VCV[,1]/(sim.mcmc$VCV[,1]+sim.mcmc$VCV[,2]+((pi^2)/3)))
HPDinterval(sim.mcmc$VCV[,1]/(sim.mcmc$VCV[,1]+sim.mcmc$VCV[,2]+((pi^2)/3)))
#effective sample sizes, tells you if model is mixing well
summary(sim.mcmc)[6]$Gcovariances[4]
summary(sim.mcmc)[8]$Rcovariances[4]

#sample repeatability, not sure this is quite right
tru.a2=var(tru.lat[1:(n/i)*i,1])
tru.e2=var(tru.lat[,2])
tru.a2/(tru.a2+tru.e2+((pi^2)/3))
#
Hi,

The residual variance of a binary response cannot be estimated, so use

prior1 = list(R = list(V = 1, fix=1),
               G = list(G1 = list(V = 1, nu = 0.002)))

In this example it is more efficient to aggregate success/failures of  
an individual into a multi-trial binomial response and use:

  prior2 = list(R = list(V = 1, nu=0.002))

sim.mcmc2<-MCMCglmm(cbind(Fail,Success)~1,
                     family="multinomial2", prior=prior2,
                     nitt = 260000, thin = 200, burnin = 60000,
                     verbose=FALSE,data=ind.data)

sim.mcmc2$VCV/(sim.mcmc2$VCV+pi^2/3)

Cheers,

Jarrod


Quoting Ned Dochtermann <ned.dochtermann at gmail.com> on Tue, 21 Oct  
2014 14:44:19 -0500:

  
    
#
Thanks, I was aware of that for categorical and some of the other 
families but thought I could get away with it here and I wasn't quite 
sure otherwise how to calculate the relevant ratio (thanks for providing 
that too).
With smaller sample sizes repeatability still seems to get misestimated 
and stuck close to zero even with long runs but running multiple chains 
seem to resolve that.

Thanks again,
Ned
On 10/21/2014 3:05 PM, Jarrod Hadfield wrote:
#
Hi,

The simulation only had one trial so it was equivalent to categorical.  
If you up the number of trials then you can estimate the  
observation-level variance.

You might try parameter expanded priors to remedy the last problem.

Cheers,

Jarrod


Quoting Ned Dochtermann <ned.dochtermann at gmail.com> on Tue, 21 Oct  
2014 17:00:12 -0500: