List,
A colleague recently asked me some questions about issues with
estimating repeatabilities for binomial/binary data. In trying to help
out I started playing around with simulating some data and have been
struggling to get decent effective sample sizes even when parameters are
being well estimated. I suspect the problem is partly the priors
(inverse-gamma) but have had issues with other priors too (priors as
specified in MCMCglmm are still quite unclear to me in general, however).
While quite often repeatability gets estimated very well despite
effective sample sizes of only a few dozen I also am getting poorly
estimated repeatabilities (~0) with a population level repeatability of
<.4 and sample sizes less than n=100 and reps=2, suggesting the
inverse-gama being informative around zero. Is the poor mixing
potentially due to using multinomial2 rather than categorical? I don't
want to go the categorical route because repeatabilities are of
particular interest.
I'm much less comfortable with generalized rather than standard linear
models so I wonder if there are any glaring issues with the coding. Any
thoughts on what might be going on?
Thanks for any help.
Ned
#make function to calculate inverse logit:
inv.log<- function(x){
exp(x)/(1+exp(x))}
#specify some start conditions and bookkeeping stuff
#specify data distributions:
repeatability=0.3 #Change this for other repeatabilities (sampled
population's repeatability, not sample's realized latent repeatability)
sigma2.e=6 #arbitrary, set repeatability above
sigma2.a=(repeatability*sigma2.e+repeatability*((pi^2)/3))/(1-repeatability)
B0=0 #could do different fixed effects with some more work...
#(n=number of individuals, i=number of replications per individual)
counter=0; n=150; i=4
ind.data=matrix(0,n*i,3) #make place to store data
tru.lat=matrix(0,n*i,2) #store sampled underlying scores
for(j in 1:n){
alpha.j=rnorm(1,0,sigma2.a)
for(rep in 1:i){
counter=counter+1
ind.data[counter,1]=j
eij=rnorm(1,0,sigma2.e)
pij=inv.log((B0+alpha.j+eij))
ind.data[counter,3]=rbinom(1,1,pij)
ind.data[counter,2]=1-ind.data[counter,3]
tru.lat[counter,1]=alpha.j
tru.lat[counter,2]=eij
}}
ind.data=as.data.frame(ind.data)
names(ind.data)=c("Ind", "Fail", "Success")
prior1 = list(R = list(V = 1, nu=0.002),
G = list(G1 = list(V = 1, nu = 0.002)))
sim.mcmc<-MCMCglmm(cbind(Fail,Success)~1, random= ~Ind,
family="multinomial2", prior=prior1,
nitt = 260000, thin = 200, burnin = 60000,
verbose=FALSE,data=ind.data)
#estimated repeatabilities
posterior.mode(sim.mcmc$VCV[,1]/(sim.mcmc$VCV[,1]+sim.mcmc$VCV[,2]+((pi^2)/3)))
HPDinterval(sim.mcmc$VCV[,1]/(sim.mcmc$VCV[,1]+sim.mcmc$VCV[,2]+((pi^2)/3)))
#effective sample sizes, tells you if model is mixing well
summary(sim.mcmc)[6]$Gcovariances[4]
summary(sim.mcmc)[8]$Rcovariances[4]
#sample repeatability, not sure this is quite right
tru.a2=var(tru.lat[1:(n/i)*i,1])
tru.e2=var(tru.lat[,2])
tru.a2/(tru.a2+tru.e2+((pi^2)/3))
Ned A. Dochtermann
Assistant Professor / Department of Biological Sciences
NORTH DAKOTA STATE UNIVERSITY
p: 701.231.7353 / f: 701.231.7149 / www.ndsu.edu
https://sites.google.com/site/neddochtermann/
ned.dochtermann at ndsu.edu
Hi,
The residual variance of a binary response cannot be estimated, so use
prior1 = list(R = list(V = 1, fix=1),
G = list(G1 = list(V = 1, nu = 0.002)))
In this example it is more efficient to aggregate success/failures of
an individual into a multi-trial binomial response and use:
prior2 = list(R = list(V = 1, nu=0.002))
sim.mcmc2<-MCMCglmm(cbind(Fail,Success)~1,
family="multinomial2", prior=prior2,
nitt = 260000, thin = 200, burnin = 60000,
verbose=FALSE,data=ind.data)
sim.mcmc2$VCV/(sim.mcmc2$VCV+pi^2/3)
Cheers,
Jarrod
Quoting Ned Dochtermann <ned.dochtermann at gmail.com> on Tue, 21 Oct
2014 14:44:19 -0500:
List,
A colleague recently asked me some questions about issues with
estimating repeatabilities for binomial/binary data. In trying to
help out I started playing around with simulating some data and have
been struggling to get decent effective sample sizes even when
parameters are being well estimated. I suspect the problem is partly
the priors (inverse-gamma) but have had issues with other priors too
(priors as specified in MCMCglmm are still quite unclear to me in
general, however).
While quite often repeatability gets estimated very well despite
effective sample sizes of only a few dozen I also am getting poorly
estimated repeatabilities (~0) with a population level repeatability
of <.4 and sample sizes less than n=100 and reps=2, suggesting the
inverse-gama being informative around zero. Is the poor mixing
potentially due to using multinomial2 rather than categorical? I
don't want to go the categorical route because repeatabilities are
of particular interest.
I'm much less comfortable with generalized rather than standard
linear models so I wonder if there are any glaring issues with the
coding. Any thoughts on what might be going on?
Thanks for any help.
Ned
#make function to calculate inverse logit:
inv.log<- function(x){
exp(x)/(1+exp(x))}
#specify some start conditions and bookkeeping stuff
#specify data distributions:
repeatability=0.3 #Change this for other repeatabilities (sampled
population's repeatability, not sample's realized latent
repeatability)
sigma2.e=6 #arbitrary, set repeatability above
sigma2.a=(repeatability*sigma2.e+repeatability*((pi^2)/3))/(1-repeatability)
B0=0 #could do different fixed effects with some more work...
#(n=number of individuals, i=number of replications per individual)
counter=0; n=150; i=4
ind.data=matrix(0,n*i,3) #make place to store data
tru.lat=matrix(0,n*i,2) #store sampled underlying scores
for(j in 1:n){
alpha.j=rnorm(1,0,sigma2.a)
for(rep in 1:i){
counter=counter+1
ind.data[counter,1]=j
eij=rnorm(1,0,sigma2.e)
pij=inv.log((B0+alpha.j+eij))
ind.data[counter,3]=rbinom(1,1,pij)
ind.data[counter,2]=1-ind.data[counter,3]
tru.lat[counter,1]=alpha.j
tru.lat[counter,2]=eij
}}
ind.data=as.data.frame(ind.data)
names(ind.data)=c("Ind", "Fail", "Success")
prior1 = list(R = list(V = 1, nu=0.002),
G = list(G1 = list(V = 1, nu = 0.002)))
sim.mcmc<-MCMCglmm(cbind(Fail,Success)~1, random= ~Ind,
family="multinomial2", prior=prior1,
nitt = 260000, thin = 200, burnin = 60000,
verbose=FALSE,data=ind.data)
#estimated repeatabilities
posterior.mode(sim.mcmc$VCV[,1]/(sim.mcmc$VCV[,1]+sim.mcmc$VCV[,2]+((pi^2)/3)))
HPDinterval(sim.mcmc$VCV[,1]/(sim.mcmc$VCV[,1]+sim.mcmc$VCV[,2]+((pi^2)/3)))
#effective sample sizes, tells you if model is mixing well
summary(sim.mcmc)[6]$Gcovariances[4]
summary(sim.mcmc)[8]$Rcovariances[4]
#sample repeatability, not sure this is quite right
tru.a2=var(tru.lat[1:(n/i)*i,1])
tru.e2=var(tru.lat[,2])
tru.a2/(tru.a2+tru.e2+((pi^2)/3))
--
Ned A. Dochtermann
Assistant Professor / Department of Biological Sciences
NORTH DAKOTA STATE UNIVERSITY
p: 701.231.7353 / f: 701.231.7149 / www.ndsu.edu
https://sites.google.com/site/neddochtermann/
ned.dochtermann at ndsu.edu
Thanks, I was aware of that for categorical and some of the other
families but thought I could get away with it here and I wasn't quite
sure otherwise how to calculate the relevant ratio (thanks for providing
that too).
With smaller sample sizes repeatability still seems to get misestimated
and stuck close to zero even with long runs but running multiple chains
seem to resolve that.
Thanks again,
Ned
On 10/21/2014 3:05 PM, Jarrod Hadfield wrote:
Hi,
The residual variance of a binary response cannot be estimated, so use
prior1 = list(R = list(V = 1, fix=1),
G = list(G1 = list(V = 1, nu = 0.002)))
In this example it is more efficient to aggregate success/failures of
an individual into a multi-trial binomial response and use:
prior2 = list(R = list(V = 1, nu=0.002))
sim.mcmc2<-MCMCglmm(cbind(Fail,Success)~1,
family="multinomial2", prior=prior2,
nitt = 260000, thin = 200, burnin = 60000,
verbose=FALSE,data=ind.data)
sim.mcmc2$VCV/(sim.mcmc2$VCV+pi^2/3)
Cheers,
Jarrod
Hi,
The simulation only had one trial so it was equivalent to categorical.
If you up the number of trials then you can estimate the
observation-level variance.
You might try parameter expanded priors to remedy the last problem.
Cheers,
Jarrod
Quoting Ned Dochtermann <ned.dochtermann at gmail.com> on Tue, 21 Oct
2014 17:00:12 -0500:
Thanks, I was aware of that for categorical and some of the other
families but thought I could get away with it here and I wasn't
quite sure otherwise how to calculate the relevant ratio (thanks for
providing that too).
With smaller sample sizes repeatability still seems to get
misestimated and stuck close to zero even with long runs but running
multiple chains seem to resolve that.
Thanks again,
Ned
On 10/21/2014 3:05 PM, Jarrod Hadfield wrote:
Hi,
The residual variance of a binary response cannot be estimated, so use
prior1 = list(R = list(V = 1, fix=1),
G = list(G1 = list(V = 1, nu = 0.002)))
In this example it is more efficient to aggregate success/failures
of an individual into a multi-trial binomial response and use:
prior2 = list(R = list(V = 1, nu=0.002))
sim.mcmc2<-MCMCglmm(cbind(Fail,Success)~1,
family="multinomial2", prior=prior2,
nitt = 260000, thin = 200, burnin = 60000,
verbose=FALSE,data=ind.data)
sim.mcmc2$VCV/(sim.mcmc2$VCV+pi^2/3)
Cheers,
Jarrod
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.