Hey, everybody:
Have you seen these papers that use "data cloning" for
hierarchical/mixed models? I'm pasting in two BibTeX citations. The claims
are so fantastic that I can hardly believe them. One can obtain ML
estimates and the information matrix from an ensemble of MCMC estimates
derived from clones of a data set. I don't see how that is different
from averaging a lot of MCMC chains together, though it certainly seems like it is.
I don't have an axe to grind here. I'm asking you, the smartest folks
I know ( :) ), what you think?
(I found this by accident. The rjags package turned up with a reverse
depends on the package "dclone" and I was curious to know what dclone
is for. The man pages in dclone point at the first Lele et al article
below. )
I don't know how this addresses the problem that estimates of the
variance components can't be normally distributed, even asymptotically,
because they have that boundary at 0. It seems as though they assume
that away, in the same way that many other frequentists do.
I also wonder about the small-medium sized sample performance of this
kind of ML approximation versus a genuine Bayesian approach.
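For intuition, here is a minimal, self-contained sketch of the mechanism the Lele et al. abstract describes (plain Python/NumPy, not the dclone package; all names are made up for illustration). Fitting K exact copies of the data is the same as raising the likelihood to the K-th power, so as K grows the posterior mean converges to the MLE and K times the posterior variance approaches the inverse Fisher information:

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=2.0, scale=1.0, size=20)  # toy data; sigma known = 1
n, sigma = len(y), 1.0

def log_post(mu, K):
    # K clones: the log-likelihood is multiplied by K; prior is N(0, 10^2)
    loglik = -0.5 * np.sum((y - mu) ** 2) / sigma**2
    logprior = -0.5 * mu**2 / 100.0
    return K * loglik + logprior

def metropolis(K, iters=20000, step=0.5):
    # random-walk Metropolis; proposal scale shrinks with K because the
    # cloned posterior's standard deviation shrinks like 1/sqrt(K)
    mu, lp, draws = 0.0, log_post(0.0, K), []
    for _ in range(iters):
        prop = mu + rng.normal(scale=step / np.sqrt(K))
        lp_prop = log_post(prop, K)
        if np.log(rng.uniform()) < lp_prop - lp:
            mu, lp = prop, lp_prop
        draws.append(mu)
    return np.array(draws[iters // 2:])  # discard first half as burn-in

for K in (1, 10, 100):
    d = metropolis(K)
    # posterior mean -> MLE (ybar); K * posterior var -> sigma^2 / n
    print(f"K={K}: mean={d.mean():.3f} (MLE={y.mean():.3f}), "
          f"K*var={K * d.var():.4f} (target {sigma**2 / n:.4f})")
```

This also suggests why it differs from averaging many chains: averaging independent chains reduces Monte Carlo error for the *same* posterior, whereas cloning changes the target distribution itself so that it collapses onto the MLE, with the frequentist standard error recovered as sqrt(K) times the posterior standard deviation.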
@article{lele_data_2007,
title = {Data cloning: easy maximum likelihood estimation for
complex ecological models using Bayesian Markov chain Monte Carlo
methods},
volume = {10},
issn = {1461-0248},
shorttitle = {Data cloning},
url = {http://www.ncbi.nlm.nih.gov/pubmed/17542934},
doi = {10.1111/j.1461-0248.2007.01047.x},
abstract = {We introduce a new statistical computing method,
called data cloning, to calculate maximum likelihood estimates and
their standard errors for complex ecological models. Although the
method uses the Bayesian framework and exploits the computational
simplicity of the Markov chain Monte Carlo {(MCMC)} algorithms, it
provides valid frequentist inferences such as the maximum likelihood
estimates and their standard errors. The inferences are completely
invariant to the choice of the prior distributions and therefore avoid
the inherent subjectivity of the Bayesian approach. The data cloning
method is easily implemented using standard {MCMC} software. Data
cloning is particularly useful for analysing ecological situations in
which hierarchical statistical models, such as state-space models and
mixed effects models, are appropriate. We illustrate the method by
fitting two nonlinear population dynamics models to data in the
presence of process and observation noise.},
number = {7},
journal = {Ecology Letters},
author = {Subhash R. Lele and Brian Dennis and Frithjof Lutscher},
month = jul,
year = {2007},
note = {{PMID:} 17542934},
keywords = {Bayes Theorem, Computational Biology, Computer
Simulation, Ecology, Ecosystem, Likelihood Functions, Markov Chains,
Models, Biological, Monte Carlo Method, Population Dynamics},
pages = {551--563}
}
@article{ponciano_hierarchical_2009,
title = {Hierarchical models in ecology: confidence intervals,
hypothesis testing, and model selection using data cloning},
volume = {90},
issn = {0012-9658},
shorttitle = {Hierarchical models in ecology},
url = {http://www.esajournals.org/doi/abs/10.1890/08-0967.1},
doi = {10.1890/08-0967.1},
number = {2},
journal = {Ecology},
author = {José Miguel Ponciano and Mark L. Taper and Brian
Dennis and Subhash R. Lele},
year = {2009},
pages = {356--362}
}
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
As I understand the first paper, at least, they are "just" using MCMC to
fit frequentist ML models, with WinBUGS being used because it is
convenient. I have been using data cloning for MCMC GLMMs for a while, and it
does seem to improve the point estimates of the fixed-effects regression
coefficients and variance components. I came upon it as a natural thing
to do with a poorly mixing model on a smaller example dataset, and it was
subsequently pointed out to me that it is used in the machine learning
literature as well. I also concluded that WinBUGS did not use it
because their algorithms were better formulated and didn't need this
crutch ;) -- it does slow things down.
Thanks for the references!
David Duffy.
PS If you are interested, I will post an example of its effects on random
effects variances for a GLMM.
| David Duffy (MBBS PhD) ,-_|\
| email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / *
| Epidemiology Unit, Queensland Institute of Medical Research \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v