Choosing appropriate priors for bglmer mixed models in blme

5 messages · Josie Galbraith, Vincent Dorie, Jarrod Hadfield

#
Thanks Ben,
I didn't have problems with singular estimates of variance components with
this data set.  However, I have a few other pathogens/parasites that I'm
looking at (I'm running separate models for each), and after looking at all
of them some do have zero variances for the random effect, either in
addition to large parameter estimates or alongside reasonable parameter
estimates.
Should I also be imposing a covariance prior in either of these cases?

As a related aside, my data are collected from individual birds - captured
over 4 sampling rounds (6 months apart).  While the majority of
observations are independent, there is a small proportion of birds that
were recaptured in a subsequent sampling round (between 2-15% of
observations, depending on which response variable).  I have modelled my
data both with and without bird ID as a random effect.  Including it
seems to cause more problems with zero variances.  Is this because too few
of the birds have actually been resampled?

Cheers,
Josie

  
    
#
Hi Josie,

Regarding the priors on the fixed effects, if complete separation is  
the issue having a diffuse prior is not going to help. Gelman (2008)  
gives some recommendations about priors for logistic regression.  
Although a Cauchy-prior was considered better than a t-prior, the  
latter can be used in blmer and should alleviate complete separation  
issues. I tend to use a normal-prior after performing Gelman's  
rescaling, but this is mainly because MCMCglmm only handles normal  
priors for the fixed effects (this may not be true). In a hierarchical  
model I'm not sure Gelman's advice holds: at least with a normal-prior  
it makes sense to increase the prior variance as the random-effect  
variances increase. If the prior variance is approximately v+pi^2/3,  
where v is the sum of the variance components, then the effects on the  
probability scale are quite close to being uniform on the (0,1) interval.
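
A minimal base-R sketch of the claim above, not anything taken from blme or MCMCglmm. It assumes the marginal probability can be approximated by plogis(beta/sqrt(1 + c2*v)) with c2 = (16*sqrt(3)/(15*pi))^2, and it measures how far the implied prior on the probability scale is from Uniform(0,1). The function name max_gap_from_uniform, the grid of v values, and the "diffuse" standard deviation of 10 are illustrative assumptions.

```r
# Constant for approximating a logistic curve averaged over a normal
# random effect (an assumption of this sketch).
c2 <- (16 * sqrt(3) / (15 * pi))^2

# Largest gap between the implied prior CDF of the marginal probability
# and the Uniform(0, 1) CDF, when beta ~ N(0, prior_sd^2).
max_gap_from_uniform <- function(prior_sd, v) {
  q <- seq(0.01, 0.99, by = 0.01)
  # p = plogis(beta / sqrt(1 + c2 * v)), so
  # P(p <= q) = pnorm(qlogis(q) * sqrt(1 + c2 * v) / prior_sd).
  implied_cdf <- pnorm(qlogis(q) * sqrt(1 + c2 * v) / prior_sd)
  max(abs(implied_cdf - q))
}

v_values <- c(0, 1, 4, 9)  # hypothetical sums of the variance components

# Prior sd scaled as sqrt(v + pi^2/3) versus a fixed diffuse sd of 10.
scaled_gap <- sapply(v_values, function(v) {
  max_gap_from_uniform(sqrt(v + pi^2 / 3), v)
})
diffuse_gap <- sapply(v_values, function(v) max_gap_from_uniform(10, v))

print(round(rbind(v = v_values, scaled = scaled_gap, diffuse = diffuse_gap), 3))
```

A small gap for the scaled prior and a large one for the diffuse prior would be consistent with the point that a diffuse prior on the logit scale piles prior mass near probabilities of 0 and 1.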

You can use the gelman.prior function to obtain the prior covariance  
matrix for your model. However, note that in the help file I say that  
the scale argument takes the standard deviation. In fact it takes the  
variance, but in the next version of MCMCglmm (coming soon) I have  
fixed this and it will take the standard deviation.

Cheers,

Jarrod


Gelman, A. et al. (2008) The Annals of Applied Statistics 2(4), 1360-1383


Quoting Josie Galbraith <josie.galbraith at gmail.com> on Sat, 7 Mar 2015  
12:15:41 +1300:

  
    
#
Just to follow up on Gelman's Cauchy prior, it seems to work quite well even in GLMMs. I don't have any theoretical results as of yet, but if you look at the sampling distribution of the fixed effects for any model, they cluster rather nicely. You get "sane" estimates when no separation is involved, infinite estimates (or convergence failures) for complete/quasi-complete separation, and a third group with large estimates when a group contains all 0s or all 1s. In the third case, a random effect can perfectly predict for that group, but because the random effects are integrated out the likelihood remains well defined. You'll just get really large estimates of random effects, which then go with large estimates of fixed effects.

So long as you believe that some effect magnitudes for logistic regression pretty much never happen in nature, the Cauchy prior does a good job of pulling the extreme cases back down to earth while leaving the well-estimated ones roughly in place. That being said, using the priors in blme to patch up a data set is really only advised for checking the viability of a model (usually one among many, rapidly fit). After that, using something like MCMCglmm for a fully Bayesian analysis is the way to go.

Vince
#
Hi Vince,

For a given difference on the logit scale between (let's say) two
treatment groups, the difference on the observed scale depends on
the magnitude of the variance components. For logit effects beta1 and
beta2, the expected difference is approximately:

plogis(beta1/sqrt(1+c2*v))-plogis(beta2/sqrt(1+c2*v))

where v is the variance component and c2 = (16*sqrt(3)/(15*pi))^2.

If a prior (Cauchy or otherwise) was set up that was invariant to v  
then it would imply different prior beliefs about the magnitude of the  
difference (on the observed scale) depending on v. For the normal  
prior it would imply that when v is large we should expect smaller  
differences between treatment groups. This may be OK (I'm not sure) but
if not is there a way to make it invariant for the t/Cauchy prior? For  
the normal you can make the scale = sqrt(v+pi^2/3) which seems to work  
OKish.
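
A rough base-R check of the approximation above, offered as a sketch rather than a definitive treatment. It compares plogis(beta/sqrt(1+c2*v)) with the marginal probability obtained by numerically integrating plogis(beta + u) over u ~ N(0, v), then shows how a fixed logit difference translates to the observed scale as v grows. The helper names approx_marginal and exact_marginal, the effects beta1 = 1 and beta2 = -1, and the v values are illustrative assumptions.

```r
c2 <- (16 * sqrt(3) / (15 * pi))^2

# Approximate marginal probability for logit effect beta and variance v.
approx_marginal <- function(beta, v) plogis(beta / sqrt(1 + c2 * v))

# Marginal probability by numerical integration over u ~ N(0, v).
exact_marginal <- function(beta, v) {
  integrate(function(u) plogis(beta + u) * dnorm(u, sd = sqrt(v)),
            lower = -Inf, upper = Inf)$value
}

beta1 <- 1
beta2 <- -1
v_values <- c(0.5, 1, 4, 9)  # hypothetical variance components

# Difference between the two groups on the observed (probability) scale.
approx_diff <- sapply(v_values, function(v) {
  approx_marginal(beta1, v) - approx_marginal(beta2, v)
})
exact_diff <- sapply(v_values, function(v) {
  exact_marginal(beta1, v) - exact_marginal(beta2, v)
})

print(round(cbind(v = v_values, approx = approx_diff, exact = exact_diff), 3))
```

For these symmetric effects the observed-scale difference shrinks as v increases, which is the sense in which a prior that ignores v implies different beliefs on the observed scale.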

Cheers,

Jarrod




Quoting Vincent Dorie <vjd4 at nyu.edu> on Sat, 7 Mar 2015 09:47:40 -0500:

  
    
1 day later
#
Hi Jarrod,

I'm not familiar with those calculations. Are those for the MLE in a balanced, varying intercept model?

I guess the short answer is that it would be pretty easy to add this to blme (I think it can already be done, even). I am not 100% certain, but since the t/Cauchy is just a normal distribution with an unknown scale component it should be sufficient to use something proportional to your proposed scale.
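
A small base-R illustration of the scale-mixture idea, under stated assumptions: it is not blme code, and the constant of proportionality (taken as 1 here), the value of v, and the variable names are all hypothetical. It draws a normal whose standard deviation is the proposed scale sqrt(v + pi^2/3) multiplied by a random factor, and compares the result with a t distribution (df = 1, i.e. Cauchy) rescaled by the same amount.

```r
set.seed(1)
n <- 1e5
df <- 1                            # df = 1 corresponds to the Cauchy
v <- 2                             # hypothetical sum of variance components
prior_scale <- sqrt(v + pi^2 / 3)  # scale proposed for the normal prior

# A t variate is a normal with a random standard deviation:
# sd_i = scale * sqrt(df / chi-square_df).
sd_i <- prior_scale * sqrt(df / rchisq(n, df))
beta <- sd_i * rnorm(n)

# Compare central quantiles of the mixture with the rescaled t quantiles.
probs <- c(0.25, 0.5, 0.75)
mixture_q <- quantile(beta, probs, names = FALSE)
direct_q <- prior_scale * qt(probs, df)

print(round(rbind(mixture = mixture_q, direct = direct_q), 2))
```

Only central quantiles are compared because the Cauchy has no finite mean or variance, so moment-based checks would not be meaningful.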

The longer answer might be that the Cauchy prior is OK as is, since the implicit prior it imposes on the variance component keeps the random effects from exploiting separation between groups. Maybe it's a feature and not a bug? On the other hand, I suppose it is more philosophically clean to penalize both parameters directly. If one can make an argument that there is prior information about the range of fixed effects possible in a logistic regression, the same can be said for the random effects or their variance component. On the other, other hand, that's one more tuning parameter.

It looks as if folk are running blme when the MLE breaks down. Ideally, they would then move on to a fully Bayesian solution. In the event that they don't, the goal is then to provide posterior mode estimates under a prior that look similar to posterior means under a "reference" prior, but also conditioned on the knowledge that something isn't quite right with the data. I'm very open to suggestions on how to best do this.

Vince