Hi all,
I'm trying to estimate the posterior distribution of some variance
components. The end application is to compare Qst and Fst values
(quantitative genetics). A full Bayesian approach is straightforward,
but in many cases you have only 4-5 populations, making it hard to
estimate the among-population variance, especially since the
variances are often small (yes, I know, 4 is too few...). You need a
strong prior to get the MCMC to converge and produce a useful
posterior.
My initial reaction is that if you need a strong prior to get
MCMC to converge, anything you do is going to be a little bit
dodgy -- you're in a situation where you don't really have quite
enough data to do what you want, and you're likely to end up
with problems like biased estimates of the variance, e.g.
http://rpubs.com/bbolker/4187 -- or the plug-in assumption of the
parametric bootstrap (i.e. that the estimates are approximately
equal to the true values).
To avoid the use of a strong prior I thought of using the parametric
bootstrap (a recommended approach for getting CIs on variance
components and variance ratios). Does it make sense to use the
distribution obtained from the parametric bootstrap in the same way
you would use the posterior from MCMC sampling -- e.g. in further
calculations where you want to propagate the uncertainty in the
variance component? Of course the two distributions (MCMC posterior
and parametric bootstrap) are not the same, but there seems to be a
close connection (Efron, http://arxiv.org/abs/1301.2936), although
that paper is over my head, so clarifications are welcome.
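In case it helps, here is a minimal sketch of what I mean by the
parametric-bootstrap distribution of a variance component. The data
and names are made up, and it assumes a version of lme4 where
bootMer() is available:

```r
## Sketch: parametric-bootstrap distribution of an among-group
## variance component via lme4::bootMer (toy simulated data)
library(lme4)

set.seed(101)
## toy one-way layout: 30 sires, 10 observations each,
## among-sire variance 0.5, residual variance 1
d <- data.frame(sire = factor(rep(1:30, each = 10)))
d$y <- rnorm(30, sd = sqrt(0.5))[d$sire] + rnorm(300)

fit <- lmer(y ~ 1 + (1 | sire), data = d)

## extract the among-sire variance from a fitted model
vc_fun <- function(m) as.data.frame(VarCorr(m))$vcov[1]

## parametric bootstrap: simulate from the fitted model, refit,
## re-extract the variance component each time
pb <- bootMer(fit, vc_fun, nsim = 200, seed = 202)

hist(pb$t, breaks = 30)          # bootstrap distribution
quantile(pb$t, c(0.025, 0.975), na.rm = TRUE)  # percentile CI
```

It is this `pb$t` vector that I would like to treat, loosely, like a
posterior sample in downstream calculations.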
I also looked at the option of using mcmcsamp() to get the posterior
distribution of a variance component. Comparing the parametric
bootstrap and mcmcsamp() gave rather different answers; see (the
vertical line marks the estimated variance):
https://dl.dropboxusercontent.com/u/20596053/Rlist/bootVSmcmcsamp.png
The parametric bootstrap gives a nice distribution around the
estimated mean, while mcmcsamp() behaves as in the simulation (see
below), with the peak closer to zero. Looking at the MCMC sampling,
it seems to get stuck at zero a few times, but that doesn't appear to
be a major problem. The model is fairly simple: lmer( trait ~ 1 +
(1 | pop / pop.sire / pop.sire.dam ), data ) and the figure shows the
pop.sire component, which has >30 levels. lme4 version 0.999999-2.
A small detail -- it looks worth setting 'from = 0' to constrain
the density estimate to be above 0. (Histograms are good too, but
somewhat harder to overlay.) A histogram _might_ indicate that the
main difference is really in the size of the point mass at zero --
there can be artifacts in density plots due to the smoothing ...
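To illustrate the point about 'from = 0', here is a tiny example on a
made-up boundary-heavy sample (a point mass at zero plus positive
values, mimicking a bootstrap/MCMC sample of a variance):

```r
## Made-up sample: point mass at zero plus positive draws
set.seed(1)
x <- c(rep(0, 50), rexp(200, rate = 2))

d_default <- density(x)            # default grid extends ~3 bandwidths below min(x)
d_bounded <- density(x, from = 0)  # grid constrained to start at 0

min(d_default$x) < 0    # TRUE: the smoother spills mass below zero
min(d_bounded$x) == 0   # TRUE: left edge of the grid pinned at zero
```

Overlaying the two shows how much of the apparent difference near
zero can be a smoothing artifact rather than a real feature.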
I know that the mcmcsamp() function has been criticized and is not
considered reliable. Is this still the case?
I don't know. Unfortunately I don't know of a good catalogue of bad
mcmcsamp() examples; my impression was that most of them were 'sticky'
zero boundaries, which you say isn't really a problem in your case.
Rlist/sim_data_2_bootVSmcmcsamp.png
These are both useful; the latter is particularly interesting --
looks almost multimodal ...
ps. profiling might be a third approach (?) that I never got to.
I am encouraged by your later report (i.e. that the parametric
bootstrap is accurate/better than mcmcsamp() ...)
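For completeness, a sketch of the profiling approach mentioned in the
ps above (toy simulated data, hypothetical names; this assumes a
version of lme4 where profile()/confint() handle the variance
parameters):

```r
## Sketch: profile-likelihood CIs for variance parameters in lme4
library(lme4)

set.seed(101)
## toy one-way layout: 30 sires, 10 observations each
d <- data.frame(sire = factor(rep(1:30, each = 10)))
d$y <- rnorm(30, sd = sqrt(0.5))[d$sire] + rnorm(300)
fit <- lmer(y ~ 1 + (1 | sire), data = d)

## profile only the variance parameters; ".sig01" is the among-sire
## standard deviation (lme4 profiles on the SD scale, not the variance)
pp <- profile(fit, which = "theta_")
confint(pp)   # profile-likelihood confidence intervals
```

This gives frequentist intervals rather than a posterior, but it
avoids both the strong prior and the plug-in assumption.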