Hi all,
I'm trying to estimate the posterior distribution of some variance
components. The end application is to compare Qst and Fst values
(quantitative genetics). A full Bayesian approach is straightforward,
but in many cases you have only 4-5 populations, making it hard to
estimate the among-population variance, especially since the
variances are often small (yes, I know, 4 is too few...). You need a
strong prior to get the MCMC to converge and produce a useful
posterior.
My initial reaction is that if you need a strong prior to get
MCMC to converge, anything you do is going to be a little bit
dodgy -- you're in a situation where you don't really have quite
enough data to do what you want, and you're likely to end up
with problems like biased estimates of the variance, e.g.
http://rpubs.com/bbolker/4187 -- or the plug-in assumption of the
parametric bootstrap (i.e. that the estimates are approximately
equal to the true values).
To avoid the use of a strong prior I thought of using the parametric
bootstrap (a recommended approach for getting CIs on variance
components and variance ratios). Does it make sense to use the
distribution obtained from the parametric bootstrap in the same way
you would use the posterior from MCMC sampling -- e.g. in further
calculations where you want to propagate the uncertainty in the
variance component? Of course the two distributions (MCMC posterior
and parametric bootstrap) are not the same, but there seems to be a
close connection (Efron, http://arxiv.org/abs/1301.2936), although
that paper is over my head, so clarifications are welcome.
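In case it helps, here is a minimal sketch of what I mean by the
parametric-bootstrap distribution of a variance component. The data
and names are made up, and it assumes a version of lme4 where
bootMer() is available:

```r
## Sketch: parametric-bootstrap distribution of an among-group
## variance component via lme4::bootMer (toy simulated data)
library(lme4)

set.seed(101)
## toy one-way layout: 30 sires, 10 observations each,
## among-sire variance 0.5, residual variance 1
d <- data.frame(sire = factor(rep(1:30, each = 10)))
d$y <- rnorm(30, sd = sqrt(0.5))[d$sire] + rnorm(300)

fit <- lmer(y ~ 1 + (1 | sire), data = d)

## extract the among-sire variance from a fitted model
vc_fun <- function(m) as.data.frame(VarCorr(m))$vcov[1]

## parametric bootstrap: simulate from the fitted model, refit,
## re-extract the variance component each time
pb <- bootMer(fit, vc_fun, nsim = 200, seed = 202)

hist(pb$t, breaks = 30)          # bootstrap distribution
quantile(pb$t, c(0.025, 0.975), na.rm = TRUE)  # percentile CI
```

It is this `pb$t` vector that I would like to treat, loosely, like a
posterior sample in downstream calculations.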
I also looked at the option of using mcmcsamp() to get the posterior
distribution of a variance component. Comparing the parametric
bootstrap and mcmcsamp() gave rather different answers; see (the
vertical line marks the estimated variance):
https://dl.dropboxusercontent.com/u/20596053/Rlist/bootVSmcmcsamp.png
The parametric bootstrap gives a nice distribution around the
estimated mean, while mcmcsamp() behaves as in the simulation (see
below), with the peak closer to zero. Looking at the MCMC sampling,
it seems to get stuck at zero a few times, but that doesn't appear to
be a major problem. The model is fairly simple: lmer( trait ~ 1 +
(1 | pop / pop.sire / pop.sire.dam ), data ) and the figure shows the
pop.sire component, which has >30 levels. lme4 version 0.999999-2.
A small detail -- it looks worth setting 'from = 0' to constrain
the density estimate to be above 0. (Histograms are good too, but
somewhat harder to overlay.) A histogram _might_ indicate that the
main difference is really in the size of the point mass at zero --
there can be artifacts in density plots due to the smoothing ...
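To illustrate the point about 'from = 0', here is a tiny example on a
made-up boundary-heavy sample (a point mass at zero plus positive
values, mimicking a bootstrap/MCMC sample of a variance):

```r
## Made-up sample: point mass at zero plus positive draws
set.seed(1)
x <- c(rep(0, 50), rexp(200, rate = 2))

d_default <- density(x)            # default grid extends ~3 bandwidths below min(x)
d_bounded <- density(x, from = 0)  # grid constrained to start at 0

min(d_default$x) < 0    # TRUE: the smoother spills mass below zero
min(d_bounded$x) == 0   # TRUE: left edge of the grid pinned at zero
```

Overlaying the two shows how much of the apparent difference near
zero can be a smoothing artifact rather than a real feature.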
I know that the mcmcsamp() function has been criticized and is not
considered reliable. Is this still the case?
I don't know. Unfortunately I don't know of a good catalogue of bad
mcmcsamp() examples; my impression was that most of them were 'sticky'
zero boundaries, which you say isn't really a problem in your case.
Rlist/sim_data_2_bootVSmcmcsamp.png
These are both useful; the latter is particularly interesting --
looks almost multimodal ...
ps. profiling might be a third approach (?) that I never got to.
I am encouraged by your later report (i.e. that the parametric
bootstrap is accurate/better than mcmcsamp() ...)
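For completeness, a sketch of the profiling approach mentioned in the
ps above (toy simulated data, hypothetical names; this assumes a
version of lme4 where profile()/confint() handle the variance
parameters):

```r
## Sketch: profile-likelihood CIs for variance parameters in lme4
library(lme4)

set.seed(101)
## toy one-way layout: 30 sires, 10 observations each
d <- data.frame(sire = factor(rep(1:30, each = 10)))
d$y <- rnorm(30, sd = sqrt(0.5))[d$sire] + rnorm(300)
fit <- lmer(y ~ 1 + (1 | sire), data = d)

## profile only the variance parameters; ".sig01" is the among-sire
## standard deviation (lme4 profiles on the SD scale, not the variance)
pp <- profile(fit, which = "theta_")
confint(pp)   # profile-likelihood confidence intervals
```

This gives frequentist intervals rather than a posterior, but it
avoids both the strong prior and the plug-in assumption.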