single argument anova for GLMMs not yet implemented

20 messages · Murray Jorgensen, Gavin Simpson, Andrew J Tyre +6 more

#
R.S. Cotter wrote:
Not stupid.  (The only stupid questions are not-doing-your-homework
ones.)

  What information are you hoping to glean from anova(mod) ?
If it is p-values for individual predictors, or information
about "residual degrees of freedom", you're probably out of
luck: see the oft-repeated questions on this list and the FAQ entry
for why that's hard.

  (Feel free to write back to clarify what you have in mind --
but be warned that the answer is probably something along the
lines of "lme4 doesn't work that way, you're still thinking
in the classical sums-of-squares paradigm" ...)

  Ben Bolker
#
On Wed, Dec 10, 2008 at 9:56 AM, Ben Bolker <bolker at ufl.edu> wrote:
I certainly agree with Ben that new users, or any users for that
matter, should feel free to ask questions about what does and doesn't
seem to work in the lme4 package.  The many kind users of the package
have been generous in allowing me to experiment in the code, sometimes
breaking features that were formerly working, while I try to come to
an understanding of mixed-effects models and computational methods for
them.  The process has worked in that I feel that I understand them
much better than I did in the past.  However, doing things the way I
do - creating and maintaining a software package that will allow for
fitting general versions of the model while I am still experimenting
with the overall design - is an intensive and, regrettably, slow way
of doing research.  My thanks to those who have had the tolerance to
take this journey with me.

The particular issue of not providing sequential anova summary for a
generalized linear mixed model is related to the "quasi" families of
conditional distributions.  Families like "binomial" or "poisson" or
"Gamma" or the default "gaussian" family (I find the capitalization of
those names to be interesting - the two proper nouns, Poisson and
Gaussian, are not capitalized and the common noun, gamma, is)
represent a probability distribution from which a likelihood can be
calculated.  The "quasi" families do not correspond to probability
distributions so they produce a quasi-likelihood which is used in the
GLM fitting.  I know how to add random effects to the linear predictor
for a model with a likelihood.  I'm not sure how it should be done for
the quasi families.  One can mimic the computations, but without a
sound theoretical basis, it is possible that the results could be
nonsense and I, at least, wouldn't know whether they were nonsense.

I may end up punting on the quasi families and simply provide some
parameter estimates without estimates of precision or, perhaps more
radically, not allow the quasi families to be used.  If you look at
the families provided in R you will see that the misleadingly named
"AIC" function in the family (it actually returns the deviance) is
only defined for binomial, Poisson, Gaussian and gamma families.
Those AIC functions evaluate probability densities or probability mass
functions and I can work with that.  I'm afraid I don't know enough
about the quasi families to make sense of them yet.
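Doug's point can be seen directly in base R's family objects: the quasi families carry a variance function, but their aic component, having no likelihood behind it, just returns NA. A minimal illustration (the numbers are arbitrary):

```r
# Family objects bundle the pieces a GLM fitter needs.  For a "real"
# family the aic component evaluates a log-likelihood-based quantity;
# for a quasi family there is no likelihood, so it returns NA.
fam  <- poisson()
qfam <- quasipoisson()

y  <- c(1, 3, 2)   # some counts
mu <- c(2, 2, 2)   # fitted means
wt <- rep(1, 3)    # prior weights

fam$aic(y, NULL, mu, wt, 0)   # finite: built from dpois(..., log = TRUE)
qfam$aic(y, NULL, mu, wt, 0)  # NA: no probability model to evaluate
```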
#
The following is how I think about this at the moment:

The quasi-likelihood approach is an attempt at a model-free approach to 
the problem of overdispersion in non-Gaussian regression situations 
where standard distributional assumptions fail to provide the observed 
mean-variance relationship.

The glmm approach, on the other hand, does not abandon models and 
likelihood but seeks to account for the observed mean-variance 
relationship by adding unobserved latent variables (random effects) to 
the model.

Seeking to combine the two approaches by using both quasilikelihood 
*and* random effects would seem to be asking for trouble: being able 
to use two tools on one problem would give a lot of flexibility to the 
parameter estimation, probably leading to a very flat quasilikelihood 
surface and ill-determined optima.

But all of the above is only thoughts without the benefit of either 
serious attempts at fitting real data or doing serious theory so I will 
defer to anyone who has done either!

Philosophically, at least, there seems to be a clash between the two 
approaches and I doubt that attempts to combine them will be successful.

Murray Jorgensen
Douglas Bates wrote:
#
Echoing Murray's points here - nicely put, Murray - it seems to me
that the quasi-likelihood and the GLMM are different approaches to the
same problem.  

Can anyone provide a substantial example where random effects and
quasilikelihood have both been necessary?  

Best wishes,

Andrew
On Fri, Dec 12, 2008 at 09:11:39AM +1300, Murray Jorgensen wrote:
#
On Thu, Dec 11, 2008 at 2:52 PM, Andrew Robinson
<A.Robinson at ms.unimelb.edu.au> wrote:
I agree and I also appreciate Murray's elegant explanation.
I'm kind of waiting for Ben Bolker to let us know how things look from
his perspective.  I seem to remember that Ben and others in ecological
fields were concerned about overdispersion, even after incorporating
random effects.
#
On Thu, 2008-12-11 at 14:58 -0600, Douglas Bates wrote:
Not wanting to preempt Ben or anything, but yes, we ecologists are very
concerned about overdispersion. However, and I say this as someone new
to this field (mixed models, not ecology), the quasilikelihood approach
seems far more of a fudge to avoid having to think about where the
overdispersion is coming from. I find the negative binomial far more
intuitive to deal with than working around the problems not having a
proper likelihood brings (inference, model selection, information
stats). Often, the source of overdispersion is of direct interest
itself.

In the GLM arena I find hurdle and ZIP and ZINB models far more
interpretable in ecological terms than fudging the problem with
quasilikelihood methods. And after-all, that is what I am interested in;
models I can interpret in ecological terms.

My two-penneth,

G
#
I also like the explanation of quasi-likelihood vs. glmm, but I can say 
from an ecological perspective I frequently encounter situations in which 
I have included all the random effects of blocks, plots, times etc, and 
still have massive amounts of overdispersion. A student in my Ecological 
Statistics class examined repeated counts of grasshoppers in plots that 
have or have not received nitrogen addition. A poisson family glmm gives a 
nice account of the effects of total veg biomass, date, and nitrogen 
addition, but the residual deviance is > 1700 for a sample size of about 
400. I would love to be able to fit a negative binomial model in that 
case; I typically resort to using WinBUGS and MCMC to do this, but that is 
beyond what I can get my students to do in a one semester course. 
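A toy version of the situation Drew describes is easy to simulate: leave some real heterogeneity out of a Poisson model and the residual deviance blows up relative to the residual degrees of freedom. (The data below are made up purely for illustration; nothing here is the grasshopper dataset.)

```r
set.seed(7)
n <- 400
x <- rnorm(n)
u <- rnorm(n, sd = 1)                  # unmodelled heterogeneity
y <- rpois(n, exp(1 + 0.5 * x + u))    # counts more variable than Poisson

fit <- glm(y ~ x, family = poisson)
fit$deviance / fit$df.residual         # far above 1, as in Drew's example

# a quasipoisson fit estimates that inflation as a dispersion parameter
summary(glm(y ~ x, family = quasipoisson))$dispersion
```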

I have encountered situations in which even using a negative binomial 
model (for counts) or beta-binomial type model ( for proportion of success 
data) are insufficient to explain the variability in ecological 
situations. In these cases I usually have reason to believe that there is 
a discrete mixture going on - ie the observations are coming from two or 
more distinct populations which have not been distinguished by anything 
the observer can record, or thought to record (immune status for parasite 
hosts, for example). I have tried quasi- family models in those cases, but 
always felt a little uncomfortable drawing much in the way of inference. I 
understand likelihood! 

Anyway, I appreciate the tool. It is very nice and continues to get 
better! Thanks,

Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054 
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre



#
Hi Drew,
On Thu, Dec 11, 2008 at 03:52:06PM -0600, Andrew J Tyre wrote:
This looks like a promising example (so to speak) ... have you tried
fitting a poisson family glmm and a negative binomial hierarchical
model to these data in WinBUGS?  If so, how do the models compare
within that framework?
I'd suggest that if you have reason to believe that you have an
underlying discrete mixture but no way to tease out the identity, then
any modelling should be treated with great caution!  Or maybe EM would
help?  Treat the population identity as a latent variable?  Simon
Blomberg told me about a really nice simple example in which he did
that kind of thing, over a beer a few months ago.  It might or might
not be relevant here.  

Cheers

Andrew
#
On Fri, 12 Dec 2008, Andrew Robinson wrote:
There has been a certain amount of work on models where the 
distribution of the random effects is unspecified -- modelled 
as a mixture which is estimated by nonparametric ML (a paper by Murray 
Aitkin in Biometrics is the earliest I know of).  These could probably 
slurp up overdispersion of the type described.  I imagine they are only 
tractable for simple models for the random effects (eg group 
membership/clusters).
#
The approaches have the fundamental difference that the overdispersion  
model multiplies the theoretical variance by an amount that is  
constant (whether on the scale of the response [the binomial variance  
becomes \phi n p(1-p)], or on the scale of the linear predictor).

I have called overdispersion a model - actually it is not one model,  
but a range of possible models. I have no problem, in principle, with  
one fitting method that reflects multiple possible models once one  
gets down to detail.

GLMMs add to the theoretical variance, on the scale of the linear  
predictor.  For binomial models with the usual link functions (logit,  
probit, cloglog), the scale spreads out close to p=0 or close to p=1.  
With the glmm models the variances then increase more, relative to the  
overdispersion model, at the extremes of the scale.  (For the Poisson  
with a log link, there is just one relevant extreme, at 0.)
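The contrast John draws can be made concrete with a small simulation: a logit-scale random effect induces overdispersion whose size, measured as an implied quasibinomial phi, depends on where you are on the p scale, whereas a quasibinomial model forces one constant phi. (The sample size, random-effect SD, and centring values below are illustrative, not from any real data.)

```r
set.seed(1)
N <- 2e5   # Monte Carlo draws
n <- 20    # binomial denominator

# dispersion factor implied by a logit-scale random effect, centred at p0
phi_implied <- function(p0, sd_u = 1) {
  p <- plogis(qlogis(p0) + rnorm(N, sd = sd_u))  # GLMM-style variation
  y <- rbinom(N, n, p)
  var(y) / (n * mean(p) * (1 - mean(p)))         # vs. plain binomial variance
}

phi_mid <- phi_implied(0.5)    # near the middle of the scale
phi_ext <- phi_implied(0.05)   # near one extreme
c(phi_mid, phi_ext)            # both > 1, but clearly not one constant phi
```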

NB also, all variance assessments are conditional on getting the link  
right.  If the link is wrong in a way that matters, there will be  
apparent increases in variance in some parts of the scale that reflect  
biases that arise from the inappropriate choice of link.

There may be cases where overdispersion gives too small a variance  
(relatively) at the extremes, while glmer gives too high a variance.   
As there are an infinite number of possible ways in which the variance  
might vary with (in the binomial case) p, it would be surprising if  
there were not such "intermediate" cases, detectable with enough data  
or enough historical experience.

There might in principle be subplot designs, with a treatment at the  
subplot level, where the overdispersion model is required at the  
subplot level in order to get the treatment comparisons correct at  
that level.

As much of this discussion is focused around ecology, experience with  
fitting one or other model to large datasets is surely required, to  
help decide, in one or other practical context, 1) how the variance is  
likely to change with p (or, in the Poisson case, with the Poisson  
mean) and 2) what links seem preferable.

The best way to give the flexibility required for modeling the  
variance, as it seems to me, would be the ability to make the variance  
of p a fairly arbitrary function of p, with other variance components  
added on the scale of the linear predictor.  More radically, all  
variance components might be functions of p.  I am not sure that going  
that far would be a good idea - there'd be too many complaints that  
model fits will not converge!

The following shows a comparison that I did recently for a talk.  The  
p's are not sufficiently extreme to show much difference between the  
two models:

The dataset cbpp is from the lme4 package.
infect <- with(cbpp, cbind(incidence, size - incidence))
(gm1 <- glmer(infect ~ period + (1 | herd),
family = binomial, data = cbpp))
Random effects:
 Groups Name        Variance Std.Dev.
 herd   (Intercept) 0.412    0.642
Number of obs: 56, groups: herd, 15
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.399      0.228   -6.14  8.4e-10
period2       -0.992      0.305   -3.25  0.00116
period3       -1.129      0.326   -3.46  0.00054
period4       -1.580      0.429   -3.69  0.00023

Here, use the "sum" contrasts, and compare with the overall mean.
                     glmer                quasibinomial
              Est    SE      z       Est    SE (binomial SE)      t
(Intercept) -2.32  0.22  -10.5    -2.33  0.21 (.14)          -11.3
Period1     -0.66  0.32   -2.1    -0.72  0.45 (.31)           -1.6
Period2      0.93  0.18    5.0     1.06  0.26 (.17)            4.2
Period3     -0.07  0.23   -0.3    -0.11  0.34 (.23)           -0.3
Period4     -0.20  0.25   -0.8    -0.24  0.36 (.24)           -0.7

The SEs (really SEDs) are not much increased from the quasibinomial  
model.  The estimates of treatment effects (differences from the  
overall mean) are substantially reduced (pulled in towards the overall  
mean).  The net effect is that the z-statistic is smaller for the  
glmer model than the t for the quasibinomial model.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 12/12/2008, at 7:52 AM, Andrew Robinson wrote:
#
I appreciate your thoughts on this, John, and especially the example.
There is a tendency in discussions of the theory to ignore the
practical applications and in discussion of the applications to ignore
the theoretical foundations.  Effective statistical analysis should,
of course, involve both.

I include some comments below.

On Thu, Dec 11, 2008 at 11:29 PM, John Maindonald
<john.maindonald at anu.edu.au> wrote:
This highlights the problem that I have with the quasi families.  One
cannot choose the mean and variance separately in distributions like
the Bernoulli or Poisson.  (As the binomial is a sum of independent
Bernoulli distributions I will phrase this in terms of the Bernoulli
so I don't trip up on the n's.)  For a Gaussian distribution one has
the luxury of separately setting the mean and the variance.  For the
Bernoulli or Poisson you don't.  The choice of the mean completely
determines the distribution.  There is no such thing as a Bernoulli
distribution with variance \phi p(1-p) unless \phi is identically 1.
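Doug's point about the Bernoulli can be checked in one line of simulation: once p is fixed, the variance is forced to p(1-p); there is no separate dial to turn. (p = 0.3 below is arbitrary.)

```r
set.seed(2)
p <- 0.3
y <- rbinom(1e6, 1, p)           # a million Bernoulli(p) draws
c(mean(y), var(y), p * (1 - p))  # sample variance matches p(1-p); no free phi
```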

If you want to discuss generalizations of the Bernoulli or Poisson
distributions I could accept that but simply saying that we will base
a model on a distribution that is just like the Bernoulli except that
it has a tunable variance won't fly.  If we want to attribute
desirable statistical properties to estimates by saying, for example,
that they are maximum likelihood estimates or approximations to the
maximum likelihood estimates we must be able to define the likelihood
and to me that means that you must have a probability model.   That
is, you must be able to describe the probability distribution of the
response or, in the case of the generalized linear mixed models, the
conditional distribution of the response given the random effects.

If I can't define the probability distribution then how can I evaluate
the likelihood?  One can argue by analogy that the estimates based on
the Bernoulli have certain properties so we think that the estimates
from a hypothetical distribution that is just like the Bernoulli
except that it is overdispersed should have similar properties with an
extra parameter thrown in but that does not constitute a "sound
theoretical basis" and there is no justification for thinking that
such a procedure will allow for extensions to random effects.
Although you are not saying it explicitly it seems that you are
regarding the overdispersion and random effects approaches as a tool
for modeling the marginal variance of the responses.  It happens that
I don't think of the models that way, perhaps because I have
difficulty thinking of the marginal variance.  I'm enough of a
mathematician that I like to consider the simple cases which to me
means the unconditional distribution of the random effects and the
conditional distribution of the response.  In the way that I think of
the models those are both relatively simple distributions.

One of the things that trips people up when considering random effects
as ways of modifying the marginal variance structure of the response
is that techniques or modifications that work for the Gaussian
distribution don't work for other distributions, because the mean and
variance are separately specified in the Gaussian but not in others.
In the nlme package Jose and I allowed for
correlation structures and variance functions for the response to be
specified separately from the random effects structure.  This is
legitimate in that we were only allowing models for which the
conditional distribution of Y given B = b is Gaussian.  Extending this
type of modeling to other families of conditional distribution is
conceptually very difficult.  Many people want to do this but, as that
noted philosopher Mick Jagger pointed out, "You can't always get what
you want."  If you want to model binary or count data you have to
accept that the mean and the variance of the conditional distribution
cannot be modeled separately.  (Another noted philosopher, Ernie of
Sesame Street, pointed out that "You gotta put down the ducky if you
want to play the saxophone.")
#
I should have replied earlier to the kind comments Andrew and Doug made 
earlier on my contrast between quasilikelihood 'models' and GLMMs, 
however I am being invaded by sons, daughters in and out of law and a 
grandson.

I tried to be neutral in this comparison but I will 'come out' here and 
admit that my preference is for fully-explicit probability models for 
the data. It seems to me that without this you don't have a clear 
meaning for the parameters that you are estimating. John says "I have no 
problem, in principle, with one fitting method that reflects multiple 
possible models once one gets down to detail." This may be psychological 
but I would have to say that I *do* have a problem with that.

Because data can be generated from fully-specified models it seems to me 
that there is scope for a computer experiment comparing the results of 
both fitting methods to data generated from a known model. Just because 
the data would be generated from a known GLMM it need not mean that the 
GLMM estimates from fitting the correct model would be superior. I 
imagine that quasi approaches might do well in small samples.

Clearly the design of such a computer experiment would need to be 
carefully thought out. It would probably be better to design around a 
particular real situation than to try for great generality.
Douglas Bates wrote:
#
I certainly prefer fully specified models.  I will end up pretty much  
conceding Doug's point, but maybe not quite!

Equally, I am uneasy with glmer's restriction to models where the  
error family variance can only be modified by addition on the scale of  
the linear predictor.

There's an issue of what is a "fully specified model".  Sure one can  
sample from a negative binomial, but when one gets down to what  
processes might generate a negative binomial, there are several  
alternative mechanisms.  Rather than using a negative binomial as a  
way of modeling overdispersion, I'd prefer to be modeling the process  
at a more fundamental level -  maybe from applying a gamma mixing  
distribution to a Poisson rate.  That way, one can think about whether  
a gamma mixing distribution makes sense, whether something else might  
be more appropriate.  Direct entry into use of a negative binomial  
avoids exposure to such doubts.
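John's preferred construction is easy to check by simulation: putting a gamma mixing distribution on the Poisson rate yields counts with the negative binomial's mean-variance relationship, var = mu + mu^2/theta. (The parameter values below are arbitrary.)

```r
set.seed(42)
N <- 2e5
mu <- 5; theta <- 2   # target NB mean and size

# gamma mixing distribution on the Poisson rate: E[lambda] = mu
lambda <- rgamma(N, shape = theta, rate = theta / mu)
y <- rpois(N, lambda)                  # a gamma mixture of Poissons

c(mean(y), var(y), mu + mu^2 / theta)  # empirical moments vs. NB theory (17.5)
```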

I like to think of a quasibinomial with a dispersion equal to three as  
generated by a sequence of Bernoulli trials in which each event  
generates 3 repeats of itself.  This does not quite work (or I do not  
know how it works) if each event has to generate 2.5 repeats.

[I am reminded of Swedish mathematical educationalist Andrejs Dunkel's  
quip: "1.8 children per family! How can that be?", with the reply  
"Statistics, my friends, statistics.  Seems to promote split  
personalities."  Sadly, Andrejs is deceased.]

It would I think be in the spirit of the direction in which lmer is  
moving, albeit for heterogeneous variances where the error is normal,  
to allow modeling at the level (a gamma mixture of Poissons) I have in  
mind.  Would this be enormously more computationally intensive than  
modeling the negative binomial or other such distributions?  I note,  
though, your comments Doug that suggest that this kind of thing is  
scarcely tractable.

I do think a lot more work is needed that looks at what seems to  
happen, in one or other type of application, with multiple real  
datasets from that area.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 13/12/2008, at 6:51 PM, Murray Jorgensen wrote:
#
On Sat, Dec 13, 2008 at 08:08:11PM +1100, John Maindonald wrote:
Yes, me too.
I think that this is fair enough and well put, John, but I'm going to
push back in the other direction with a hypothetical example.  Let's
say that you have your over-dispersed count data.  What do you lose if
you simply take some convenient and credible transformation of the
response variable and then use lme, paying close attention to your
conditional distribution plots?  

Let me pin that down a little: would you be reluctant to follow that
approach under any conditions?  If so, then under what conditions
would you be reluctant to follow that approach, and why?

(My experience has been that the judicious application of variance
models corrects for most of the variability problems that I have
encountered so far in practical work.)
(Couldn't each event generate 2 half the time and 3 half the time?)
On a side note, I believe that Lee and Nelder's approach to
mixed-effects modelling (as realized in Genstat) allows random effects
to come from certain non-normal distributions, including for example
the binomial-beta and the poisson-gamma combinations.  See Table 6.2
of their book.  The underlying theory still has some gaps in it, I
think. 

Greatly enjoying this conversation,

Andrew
#
Besides the aesthetic preference for fully specified models etc.
(although there's also the danger of forgetting that "all models
are wrong etc." and believing the model too much), the most common
reason in ecological contexts for not being able to get away with
transformation is that the data are zero-rich (someone mentioned
zero-inflated/hurdle models earlier in this discussion, which
basically amounts to modeling presence/absence [either of
"structural" zeros or of all zero values] and conditional
density separately).  There's nothing you can do to transform
a spike in the data (at zero or elsewhere) into anything
other than a spike ...

  Ben Bolker
#
I thought I might note that zero-inflated count data and negative 
binomial data can both be seen as cases where the response variable 
follows a mixture distribution. In the ZIP case a mixture of a constant 
[ Poisson(0) or Poisson(tiny) with another Poisson], in the negative 
binomial case a gamma mixture of Poissons [which might be approximated 
by a finite mixture].
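Murray's mixture view of the ZIP case can be sketched in a few lines: draw a latent membership indicator, then sample from the point mass at zero [Poisson(0)] or an ordinary Poisson accordingly. The mixing weight and rate below are arbitrary.

```r
set.seed(3)
N <- 1e5
pi0 <- 0.3; lambda <- 4

z <- rbinom(N, 1, pi0)                     # latent component membership
y <- ifelse(z == 1, 0, rpois(N, lambda))   # ZIP: point mass at 0 + Poisson

mean(y == 0)                # observed zero fraction
pi0 + (1 - pi0) * exp(-4)   # theoretical zero probability, ~0.313
```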

John is "uneasy with glmer's restriction to models where the error 
family variance can only be modified by addition on the scale of the 
linear predictor." Mixtures would be one mechanism for introducing other 
variance patterns into the model.

Murray Jorgensen
Ben Bolker wrote:
#
On Sat, Dec 13, 2008 at 12:46 PM, Murray Jorgensen
<maj at stats.waikato.ac.nz> wrote:
In another thread I posted a long-winded manifesto regarding the types
of models that will be fit by lme4-1.0

John, you may be uneasy but I need to get a completed version of this
code out some time (it is way overdue), so I am truncating the list of
possible models at the ones I describe there.

I appreciate all the contributions to this discussion.
#
On Sat, Dec 13, 2008 at 12:46 PM, Murray Jorgensen
<maj at stats.waikato.ac.nz> wrote:
I was thinking a bit more about your suggestion of mixtures as a way
of incorporating overdispersion.  It is quite a reasonable suggestion
but I am afraid I don't know enough about methods of estimating the
parameters in a mixture model to decide if it is feasible to put such
models in the framework I plan to use.  My "bottom line" is that I
want to be able to determine the conditional modes of the random
effects given the data and parameter values by solving a penalized
iteratively reweighted least squares problem. If mixture models, or
even restricted forms of mixture models like the ZIP model, can be
expressed in that form then it is just a question of deciding how the
model can be specified and how the specification can be translated
into such a problem.  (This process is not trivial.  It is a lot
easier to write down a model than it is to decide how to define the
arguments and defaults for specifying such a model as an R function.)

My guess is that models like ZIP can't be expressed that way so it
would be necessary to condition on the mixture components, estimate
the conditional modes of the random effects and conditional estimates
of the parameters, then iterate.

One of the basic changes in the allcoef branch of the lme4 code is the
way that the "outer" optimization is performed (PIRLS is the "inner"
optimization in the Laplace or adaptive Gauss-Hermite approximation;
optimization of the profiled deviance with respect to \theta is the
outer optimization).  In the current lme4 this is done internally in C
code and hence is somewhat inaccessible to other programmers.  In the
allcoef branch this is done at the R level by calling nlminb.  In that
branch setting doFit = FALSE in a call to lmer/glmer/nlmer returns an
environment that is suitable for defining the optimization problem in
that it has methods for getPars, getBounds and setPars.  The latter
method sets new values of the parameters and returns the objective
function evaluated at the new parameters. Allowing access to this
environment is intended to be the hook that others can use to set up a
model that is almost what they want so they can then mold the
optimization process to fit the model that they do want.
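The shape of that hook can be sketched generically. The method names getPars/getBounds/setPars follow Doug's description, but everything else here is an illustrative stand-in, not the actual allcoef code; the "deviance" is a toy function in place of the profiled deviance.

```r
# An environment exposing the three methods Doug describes; nlminb then
# drives the outer optimization through them.
make_opt_env <- function(start, lower, upper, devfun) {
  env <- new.env()
  env$pars <- start
  env$getPars   <- function() env$pars
  env$getBounds <- function() cbind(lower = lower, upper = upper)
  env$setPars   <- function(p) {   # set new parameters, return objective value
    env$pars <- p
    devfun(p)
  }
  env
}

# toy "profiled deviance" with its minimum at 2
e <- make_opt_env(start = 1, lower = 0, upper = Inf,
                  devfun = function(p) (p - 2)^2 + 1)

# the outer optimizer needs only the environment's methods:
opt <- nlminb(e$getPars(), function(p) e$setPars(p),
              lower = e$getBounds()[, "lower"],
              upper = e$getBounds()[, "upper"])
opt$par  # minimum found at 2
```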
#
On Sat, Dec 13, 2008 at 12:00:07PM -0500, Ben Bolker wrote:
Yes indeed :), you are correct.  But, that's a problem that bedevils
glms too.  I was more thinking along the lines: when would it be
inadvisable to use a transformation and explicit variance model
instead of a glm?

Cheers,

Andrew
#
Doesn't it make sense to think of the "gamma mixture of Poissons" as
a Poisson GLM with a gamma-distributed random (intercept) effect?  In other
words, it's a GLMM with a gamma distribution instead of gaussian.  Adding
another (gaussian) random effect to a negative binomial model seems a bit
redundant.

Regards,   Rob Kushler