
data cloning. have you seen this?

2 messages · Paul Johnson, David Duffy

Hey, everybody:

Have you seen these papers that use "data cloning" for
hierarchical/mixed models? I'm pasting in two BibTeX cites.  The claims
are so fantastic that I can hardly believe them.  One can obtain ML
estimates and the information matrix from an ensemble of MCMC estimates
derived from clones of a data set.  I don't know how that differs
from averaging a lot of MCMC chains together; it certainly seems like
the same thing.
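As I read it, the difference from averaging chains is that the posterior is built on K copies of the data, so it concentrates around the MLE and its variance shrinks like 1/K. Here is a minimal sketch of that idea in Python, using a conjugate normal-mean model so the "MCMC" posterior is available in closed form; all of the numbers and the vague prior are my own illustrative assumptions, not anything from the papers:

```python
import random
import statistics

random.seed(0)
n, sigma = 50, 2.0
y = [random.gauss(1.5, sigma) for _ in range(n)]  # toy data, true mean 1.5
ybar = statistics.fmean(y)                        # the MLE for a normal mean

tau2 = 100.0  # vague N(0, tau2) prior on the mean
for K in (1, 10, 100):
    nk = n * K  # K "clones" of the data, stacked as if independent
    # Conjugate posterior for the mean given the cloned data:
    post_var = 1.0 / (1.0 / tau2 + nk / sigma ** 2)
    post_mean = post_var * (K * sum(y) / sigma ** 2)
    # Data-cloning claim: post_mean -> MLE, and K * post_var -> the
    # inverse Fisher information sigma^2 / n, as K grows.
    print(K, round(post_mean, 4), round(K * post_var, 4))
```

With real hierarchical models the posterior isn't available in closed form, which is where MCMC over the cloned data comes in; but the limiting behaviour shown here is the whole trick.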

I don't have an axe to grind here.  I'm asking you, the smartest folks
I know ( :) ), what you think?

(I found this by accident. The rjags package turned up with a reverse
dependency on the package "dclone", and I was curious to know what
dclone is for. The man pages in dclone point at the first Lele et al.
article below.)

I don't know how this addresses the problem that estimates of the
variance components can't be normally distributed, even
asymptotically, because they have that boundary at 0.  It seems
as though they assume that away, in the same way that many other
frequentists do.
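The boundary effect is easy to see in a toy simulation. Below is my own illustrative setup (a balanced one-way random-effects model with a small true between-group variance, using the truncated moment estimator as a simple stand-in for ML; none of this comes from the papers): a sizeable fraction of the estimates land exactly on 0, so the sampling distribution can't be normal.

```python
import random
import statistics

random.seed(1)
m, k = 10, 5           # 10 groups, 5 observations per group
sb2, sw2 = 0.05, 1.0   # small between-group variance, unit within-group

reps, zeros = 2000, 0
for _ in range(reps):
    groups = []
    for _ in range(m):
        b = random.gauss(0.0, sb2 ** 0.5)  # one random effect per group
        groups.append([b + random.gauss(0.0, sw2 ** 0.5) for _ in range(k)])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)
    msb = k * sum((mi - grand) ** 2 for mi in means) / (m - 1)
    msw = statistics.fmean(statistics.variance(g) for g in groups)
    # Moment estimator of the between-group variance, truncated at 0:
    est = max(0.0, (msb - msw) / k)
    zeros += (est == 0.0)

print(zeros / reps)    # the point mass sitting exactly on the boundary
```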

I also wonder about the small-medium sized sample performance of this
kind of ML approximation versus a genuine Bayesian approach.

@article{lele_data_2007,
        title = {Data cloning: easy maximum likelihood estimation for
complex ecological models using Bayesian Markov chain Monte Carlo
methods},
        volume = {10},
        issn = {1461-0248},
        shorttitle = {Data cloning},
        url = {http://www.ncbi.nlm.nih.gov/pubmed/17542934},
        doi = {10.1111/j.1461-0248.2007.01047.x},
        abstract = {We introduce a new statistical computing method,
called data cloning, to calculate maximum likelihood estimates and
their standard errors for complex ecological models. Although the
method uses the Bayesian framework and exploits the computational
simplicity of the Markov chain Monte Carlo ({MCMC}) algorithms, it
provides valid frequentist inferences such as the maximum likelihood
estimates and their standard errors. The inferences are completely
invariant to the choice of the prior distributions and therefore avoid
the inherent subjectivity of the Bayesian approach. The data cloning
method is easily implemented using standard {MCMC} software. Data
cloning is particularly useful for analysing ecological situations in
which hierarchical statistical models, such as state-space models and
mixed effects models, are appropriate. We illustrate the method by
fitting two nonlinear population dynamics models to data in the
presence of process and observation noise.},
        number = {7},
        journal = {Ecology Letters},
        author = {Subhash R. Lele and Brian Dennis and Frithjof Lutscher},
        month = jul,
        year = {2007},
        note = {{PMID:} 17542934},
        keywords = {Bayes Theorem, Computational Biology, Computer
Simulation, Ecology, Ecosystem, Likelihood Functions, Markov Chains,
Models, Biological, Monte Carlo Method, Population Dynamics},
        pages = {551--563}
}


@article{ponciano_hierarchical_2009,
        title = {Hierarchical models in ecology: confidence intervals,
hypothesis testing, and model selection using data cloning},
        volume = {90},
        issn = {0012-9658},
        shorttitle = {Hierarchical models in ecology},
        url = {http://www.esajournals.org/doi/abs/10.1890/08-0967.1},
        doi = {10.1890/08-0967.1},
        number = {2},
        journal = {Ecology},
        author = {José Miguel Ponciano and Mark L. Taper and Brian
Dennis and Subhash R. Lele},
        year = {2009},
        pages = {356--362}
}
On Wed, 21 Jul 2010, Paul Johnson wrote:

As I understand the first paper, at least, they are "just" using MCMC to 
fit frequentist ML models, with WinBUGS being used because it is 
convenient.  I have been using data cloning for MCMC GLMMs for a while, 
and it does seem to improve the point estimates for the fixed-effects 
regression coefficients and variance components.  I came upon it as a 
natural thing to do with a poorly mixing model on a smaller example 
dataset, and it was subsequently pointed out to me that it is used in 
the machine learning literature as well.  I also decided that it was not 
used by WinBUGS because their algorithms were better formulated and 
didn't need this crutch ;) -- it does slow things down.

Thanks for the references!

David Duffy.

PS If you are interested, I will post an example of its effects on random
effects variances for a GLMM.