List,
A colleague recently asked me some questions about issues with
estimating repeatabilities for binomial/binary data. In trying to
help out I started playing around with simulating some data and have
been struggling to get decent effective sample sizes even when
parameters are being well estimated. I suspect the problem is partly
the priors (inverse-gamma) but have had issues with other priors too
(priors as specified in MCMCglmm are still quite unclear to me in
general, however).
While quite often repeatability gets estimated very well despite
effective sample sizes of only a few dozen I also am getting poorly
estimated repeatabilities (~0) with a population level repeatability
of <.4 and sample sizes less than n=100 and reps=2, suggesting the
inverse-gama being informative around zero. Is the poor mixing
potentially due to using multinomial2 rather than categorical? I
don't want to go the categorical route because repeatabilities are
of particular interest.
I'm much less comfortable with generalized rather than standard
linear models so I wonder if there are any glaring issues with the
coding. Any thoughts on what might be going on?
Thanks for any help.
Ned
#make function to calculate inverse logit:
inv.log<- function(x){
exp(x)/(1+exp(x))}
#specify some start conditions and bookkeeping stuff
#specify data distributions:
repeatability=0.3 #Change this for other repeatabilities (sampled
population's repeatability, not sample's realized latent
repeatability)
sigma2.e=6 #arbitrary, set repeatability above
sigma2.a=(repeatability*sigma2.e+repeatability*((pi^2)/3))/(1-repeatability)
B0=0 #could do different fixed effects with some more work...
#(n=number of individuals, i=number of replications per individual)
counter=0; n=150; i=4
ind.data=matrix(0,n*i,3) #make place to store data
tru.lat=matrix(0,n*i,2) #store sampled underlying scores
for(j in 1:n){
alpha.j=rnorm(1,0,sigma2.a)
for(rep in 1:i){
counter=counter+1
ind.data[counter,1]=j
eij=rnorm(1,0,sigma2.e)
pij=inv.log((B0+alpha.j+eij))
ind.data[counter,3]=rbinom(1,1,pij)
ind.data[counter,2]=1-ind.data[counter,3]
tru.lat[counter,1]=alpha.j
tru.lat[counter,2]=eij
}}
ind.data=as.data.frame(ind.data)
names(ind.data)=c("Ind", "Fail", "Success")
prior1 = list(R = list(V = 1, nu=0.002),
G = list(G1 = list(V = 1, nu = 0.002)))
sim.mcmc<-MCMCglmm(cbind(Fail,Success)~1, random= ~Ind,
family="multinomial2", prior=prior1,
nitt = 260000, thin = 200, burnin = 60000,
verbose=FALSE,data=ind.data)
#estimated repeatabilities
posterior.mode(sim.mcmc$VCV[,1]/(sim.mcmc$VCV[,1]+sim.mcmc$VCV[,2]+((pi^2)/3)))
HPDinterval(sim.mcmc$VCV[,1]/(sim.mcmc$VCV[,1]+sim.mcmc$VCV[,2]+((pi^2)/3)))
#effective sample sizes, tells you if model is mixing well
summary(sim.mcmc)[6]$Gcovariances[4]
summary(sim.mcmc)[8]$Rcovariances[4]
#sample repeatability, not sure this is quite right
tru.a2=var(tru.lat[1:(n/i)*i,1])
tru.e2=var(tru.lat[,2])
tru.a2/(tru.a2+tru.e2+((pi^2)/3))
--
Ned A. Dochtermann
Assistant Professor / Department of Biological Sciences
NORTH DAKOTA STATE UNIVERSITY
p: 701.231.7353 / f: 701.231.7149 / www.ndsu.edu
https://sites.google.com/site/neddochtermann/
ned.dochtermann at ndsu.edu