single argument anova for GLMMs (really, glmer, or dispersion?)
On Sat, Dec 13, 2008 at 12:46 PM, Murray Jorgensen
<maj at stats.waikato.ac.nz> wrote:
I thought I might note that zero-inflated count data and negative binomial data can both be seen as cases where the response variable follows a mixture distribution. In the ZIP case it is a mixture of a constant [Poisson(0) or Poisson(tiny)] with another Poisson; in the negative binomial case it is a gamma mixture of Poissons [which might be approximated by a finite mixture].
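Written out, the two mixtures described above take the following form. The symbols \pi, \lambda, \mu and k are notation introduced here for illustration and do not appear in the original message.

```latex
% Zero-inflated Poisson: point mass at zero mixed with a Poisson(\lambda)
P(Y = y) = \pi \, \mathbf{1}\{y = 0\} + (1 - \pi) \, \frac{e^{-\lambda} \lambda^{y}}{y!},
\qquad y = 0, 1, 2, \dots

% Negative binomial: Poisson with a gamma-distributed mean (shape k, mean \mu)
Y \mid \Lambda \sim \mathrm{Poisson}(\Lambda), \quad
\Lambda \sim \mathrm{Gamma}(\text{shape } k, \ \text{rate } k/\mu)
\;\Longrightarrow\;
P(Y = y) = \frac{\Gamma(y + k)}{\Gamma(k) \, y!}
\left( \frac{k}{k + \mu} \right)^{k} \left( \frac{\mu}{k + \mu} \right)^{y}
```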
I was thinking a bit more about your suggestion of mixtures as a way of incorporating overdispersion. It is quite a reasonable suggestion, but I am afraid I don't know enough about methods of estimating the parameters in a mixture model to decide whether it is feasible to put such models in the framework I plan to use.

My "bottom line" is that I want to be able to determine the conditional modes of the random effects, given the data and parameter values, by solving a penalized iteratively reweighted least squares (PIRLS) problem. If mixture models, or even restricted forms of mixture models like the ZIP model, can be expressed in that form, then it is just a question of deciding how the model can be specified and how the specification can be translated into such a problem. (This process is not trivial. It is a lot easier to write down a model than it is to decide how to define the arguments and defaults for specifying such a model as an R function.)

My guess is that models like ZIP can't be expressed that way, so it would be necessary to condition on the mixture components, estimate the conditional modes of the random effects and conditional estimates of the parameters, then iterate.

One of the basic changes in the allcoef branch of the lme4 code is the way that the "outer" optimization is performed. (PIRLS is the "inner" optimization in the Laplace or adaptive Gauss-Hermite approximation; optimization of the profiled deviance with respect to \theta is the outer optimization.) In the current lme4 the outer optimization is done internally in C code and hence is somewhat inaccessible to other programmers. In the allcoef branch it is done at the R level by calling nlminb. In that branch, setting doFit = FALSE in a call to lmer/glmer/nlmer returns an environment that is suitable for defining the optimization problem, in that it has methods for getPars, getBounds and setPars. The last of these sets new values of the parameters and returns the objective function evaluated at the new parameters.
Allowing access to this environment is intended to be the hook that others can use to set up a model that is almost what they want so they can then mold the optimization process to fit the model that they do want.
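The inner/outer structure and the getPars/getBounds/setPars hook described above can be sketched as follows. This is only an illustration under stated assumptions, not lme4 code: it is written in Python with the standard library alone so that it runs on its own, the class ModelEnv, the function pattern_search and the toy data are hypothetical, the method names merely mirror getPars, getBounds and setPars, and a simple bounded coordinate search stands in for nlminb. The model assumed is a Poisson model with one scalar random intercept per group, log(mu) = beta + theta * u with u ~ N(0, 1).

```python
import math


def _safe_exp(eta):
    # Guard against overflow for extreme trial values of the linear predictor.
    return math.exp(min(eta, 700.0))


class ModelEnv:
    """Hypothetical stand-in for the environment returned with doFit = FALSE."""

    def __init__(self, groups):
        self.groups = groups      # one list of counts per group
        self.pars = [0.0, 1.0]    # [beta, theta]: intercept, random-effect s.d.

    def get_pars(self):
        return list(self.pars)

    def get_bounds(self):
        # beta is unbounded; theta must be non-negative.
        return [(-math.inf, math.inf), (0.0, math.inf)]

    def set_pars(self, pars):
        """Install new parameters; return the Laplace-approximated deviance."""
        self.pars = list(pars)
        beta, theta = self.pars
        return sum(self._group_deviance(y, beta, theta) for y in self.groups)

    def _group_deviance(self, y, beta, theta):
        n, total = len(y), sum(y)

        def penalized(u):
            # Negative log-likelihood (up to a constant) plus the penalty u^2 / 2.
            eta = beta + theta * u
            return n * _safe_exp(eta) - total * eta + 0.5 * u * u

        # Inner optimization: Newton steps (equivalent to PIRLS for the
        # canonical link) with step halving, giving the conditional mode of u.
        u = 0.0
        for _ in range(100):
            mu = _safe_exp(beta + theta * u)
            grad = theta * (n * mu - total) + u
            hess = theta * theta * n * mu + 1.0
            step = grad / hess
            while penalized(u - step) > penalized(u) and abs(step) > 1e-12:
                step /= 2.0
            u -= step
            if abs(step) < 1e-10:
                break

        eta = beta + theta * u
        mu = _safe_exp(eta)
        loglik = total * eta - n * mu - sum(math.lgamma(yi + 1) for yi in y)
        # Laplace approximation to -2 * log of the integral over u.
        return -2.0 * loglik + u * u + math.log(theta * theta * n * mu + 1.0)


def pattern_search(env, step=0.5, tol=1e-6):
    """Outer optimization: bounded coordinate search (a stand-in for nlminb)."""
    pars, bounds = env.get_pars(), env.get_bounds()
    best = env.set_pars(pars)
    while step > tol:
        improved = False
        for i in range(len(pars)):
            for delta in (step, -step):
                trial = list(pars)
                lo, hi = bounds[i]
                trial[i] = min(max(trial[i] + delta, lo), hi)
                value = env.set_pars(trial)
                if value < best - 1e-12:
                    pars, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    env.set_pars(pars)  # leave the environment at the best parameters found
    return pars, best


# Toy data (made up for illustration): counts for five groups.
groups = [[0, 1, 0, 2], [3, 5, 4, 6], [1, 0, 1, 1], [8, 7, 9, 10], [2, 2, 3, 1]]
env = ModelEnv(groups)
start = env.set_pars(env.get_pars())
pars, best = pattern_search(env)
print("beta, theta:", pars)
print("deviance at start and at optimum:", start, best)
```

The point of the design is that the optimizer only ever sees the three methods: anyone who wants a slightly different model can wrap or replace set_pars and reuse the same outer loop, which is the kind of molding the hook is meant to allow.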