I should have replied earlier to the kind comments Andrew and Doug
made on my contrast between quasilikelihood 'models' and GLMMs;
however, I am being invaded by sons, daughters in and out of law,
and a grandson.
I tried to be neutral in this comparison but I will 'come out' here
and admit that my preference is for fully-explicit probability
models for the data. It seems to me that without this you don't have
a clear meaning for the parameters that you are estimating. John
says "I have no problem, in principle, with one fitting method that
reflects multiple possible models once one gets down to detail."
This may be psychological but I would have to say that I *do* have
a problem with that.
Because data can be generated from fully-specified models, it seems
to me that there is scope for a computer experiment comparing the
results of both fitting methods on data generated from a known
model. Even though the data would be generated from a known GLMM,
it does not follow that the GLMM estimates from fitting the correct
model would be superior. I imagine that quasi approaches might do
well in small samples.
Clearly the design of such a computer experiment would need to be
carefully thought out. It would probably be better to design around
a particular real situation than to try for great generality.
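For what it is worth, a minimal sketch of such an experiment might
look like the following (in R, using glmer from lme4 and glm with
family = quasibinomial; every setting - the number of herds, the
binomial size, the true parameter values, the number of simulations -
is an arbitrary illustration rather than a recommendation).

library(lme4)

set.seed(1)
n.herd  <- 10     # number of groups
n.size  <- 20     # binomial size within each group
beta0   <- -1     # true intercept on the logit scale
sigma.b <- 0.7    # true random-effect SD

one.run <- function() {
  b    <- rnorm(n.herd, 0, sigma.b)          # herd random effects
  p    <- plogis(beta0 + b)                  # true probabilities
  dat  <- data.frame(herd      = factor(seq_len(n.herd)),
                     incidence = rbinom(n.herd, n.size, p),
                     size      = n.size)
  resp <- with(dat, cbind(incidence, size - incidence))
  fit.glmm  <- glmer(resp ~ 1 + (1 | herd), family = binomial, data = dat)
  fit.quasi <- glm(resp ~ 1, family = quasibinomial, data = dat)
  c(glmm = unname(fixef(fit.glmm)), quasi = unname(coef(fit.quasi)))
}

est <- t(replicate(200, one.run()))
colMeans(est) - beta0    # bias of the intercept estimate under each method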
Douglas Bates wrote:
I appreciate your thoughts on this, John, and especially the example.
There is a tendency in discussions of the theory to ignore the
practical applications, and in discussions of the applications to
ignore the theoretical foundations. Effective statistical analysis
should, of course, involve both.
I include some comments below.
On Thu, Dec 11, 2008 at 11:29 PM, John Maindonald
<john.maindonald at anu.edu.au> wrote:
The approaches have the fundamental difference that the
overdispersion model multiplies the theoretical variance by an
amount that is constant (whether on the scale of the response [the
binomial variance becomes \phi n p(1-p)] or on the scale of the
linear predictor).
This highlights the problem that I have with the quasi families. One
cannot choose the mean and variance separately in distributions like
the Bernoulli or Poisson. (As the binomial is a sum of independent
Bernoulli distributions I will phrase this in terms of the Bernoulli
so I don't trip up on the n's.) For a Gaussian distribution one has
the luxury of separately setting the mean and the variance. For the
Bernoulli or Poisson you don't. The choice of the mean completely
determines the distribution. There is no such thing as a Bernoulli
distribution with variance \phi p(1-p) unless \phi is identically 1.
If you want to discuss generalizations of the Bernoulli or Poisson
distributions I could accept that, but simply saying that we will
base a model on a distribution that is just like the Bernoulli
except that it has a tunable variance won't fly. If we want to
attribute desirable statistical properties to estimates by saying,
for example, that they are maximum likelihood estimates or
approximations to the maximum likelihood estimates, we must be able
to define the likelihood, and to me that means that you must have a
probability model. That is, you must be able to describe the
probability distribution of the response or, in the case of the
generalized linear mixed models, the conditional distribution of the
response given the random effects. If I can't define the probability
distribution, then how can I evaluate the likelihood? One can argue
by analogy that the estimates based on the Bernoulli have certain
properties, so we think that the estimates from a hypothetical
distribution that is just like the Bernoulli except that it is
overdispersed should have similar properties, with an extra
parameter thrown in; but that does not constitute a "sound
theoretical basis", and there is no justification for thinking that
such a procedure will allow for extensions to random effects.
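(A small aside that may make the point concrete: in R itself the
quasi families carry no likelihood. Nothing below goes beyond base
R's stats package.)

## The binomial family defines an AIC, and hence a likelihood; the
## quasibinomial family's aic component simply returns NA, because no
## likelihood is defined for a quasi family.
binomial()$aic        # a genuine log-likelihood-based function
quasibinomial()$aic   # function(y, n, mu, wt, dev) NA
## Consequently logLik() and AIC() applied to a glm fitted with
## family = quasibinomial come out as NA.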
I have called overdispersion a model - actually it is not one
model, but a range of possible models. I have no problem, in
principle, with one fitting method that reflects multiple possible
models once one gets down to detail.
GLMMs add to the theoretical variance, on the scale of the linear
predictor. For binomial models with the usual link functions (logit,
probit, cloglog), the scale spreads out close to p=0 or close to
p=1. With the GLMM models the variances then increase more, relative
to the overdispersion model, at the extremes of the scale. (For the
Poisson with a log link, there is just one relevant extreme, at 0.)
Although you are not saying it explicitly, it seems that you are
regarding the overdispersion and random-effects approaches as tools
for modeling the marginal variance of the responses. It happens that
I don't think of the models that way, perhaps because I have
difficulty thinking of the marginal variance. I'm enough of a
mathematician that I like to consider the simple cases, which to me
means the unconditional distribution of the random effects and the
conditional distribution of the response. In the way that I think of
the models, those are both relatively simple distributions.
One of the things that trips people up when considering random
effects as ways of modifying the marginal variance structure of the
response is that techniques or modifications that work for the
Gaussian distribution don't work for other distributions, because
the mean and variance are separately specified in the Gaussian but
not in others. In the nlme package Jose and I allowed correlation
structures and variance functions for the response to be specified
separately from the random effects structure. This is legitimate in
that we were only allowing models for which the conditional
distribution of Y given B = b is Gaussian. Extending this type of
modeling to other families of conditional distribution is
conceptually very difficult. Many people want to do this but, as
that noted philosopher Mick Jagger pointed out, "You can't always
get what you want." If you want to model binary or count data you
have to accept that the mean and the variance of the conditional
distribution cannot be modeled separately. (Another noted
philosopher, Ernie of Sesame Street, pointed out that "You gotta put
down the ducky if you want to play the saxophone.")
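(For concreteness, a minimal sketch of the kind of Gaussian model
just described, fitted with lme() from nlme to its Orthodont data;
the particular choices of varPower() and corAR1() are purely
illustrative.)

library(nlme)
## A random intercept for each Subject, plus a variance function and a
## within-group correlation structure specified separately from the
## random effects; this is legitimate because the conditional
## distribution of the response given the random effects is Gaussian.
fm <- lme(distance ~ age, random = ~ 1 | Subject,
          weights = varPower(form = ~ age),
          correlation = corAR1(form = ~ 1 | Subject),
          data = Orthodont)
summary(fm)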
NB also, all variance assessments are conditional on getting the
link right. If the link is wrong in a way that matters, there will
be apparent increases in variance in some parts of the scale that
reflect biases that arise from the inappropriate choice of link.
There may be cases where overdispersion gives too small a variance
(relatively) at the extremes, while glmer gives too high a variance.
As there are an infinite number of possible ways in which the
variance might vary with (in the binomial case) p, it would be
surprising if there were not such "intermediate" cases, detectable
with enough data or enough historical experience.
There might in principle be subplot designs, with a treatment at
the subplot level, where the overdispersion model is required at the
subplot level in order to get the treatment comparisons correct at
that level.
As much of this discussion is focused around ecology, experience
with fitting one or other model to large datasets is surely needed
to help decide, in one or other practical context, 1) how the
variance is likely to change with p (or, in the Poisson case, with
the Poisson mean) and 2) what links seem preferable.
The best way to give the flexibility required for modeling the
variance, as it seems to me, would be the ability to make the
variance of p a fairly arbitrary function of p, with other variance
components added on the scale of the linear predictor. More
radically, all variance components might be functions of p. I am not
sure that going that far would be a good idea - there'd be too many
complaints that model fits will not converge!
The following shows a comparison that I did recently for a talk.
The p's are not sufficiently extreme to show much difference between
the two models. The dataset cbpp is from the lme4 package.

library(lme4)
infect <- with(cbpp, cbind(incidence, size - incidence))
(gm1 <- glmer(infect ~ period + (1 | herd),
              family = binomial, data = cbpp))
Random effects:
 Groups Name        Variance Std.Dev.
 herd   (Intercept) 0.412    0.642
Number of obs: 56, groups: herd, 15

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.399      0.228   -6.14  8.4e-10
period2       -0.992      0.305   -3.25  0.00116
period3       -1.129      0.326   -3.46  0.00054
period4       -1.580      0.429   -3.69  0.00023
Here, use the "sum" contrasts, and compare with the overall mean.

                     glmer                 quasibinomial
              Est    SE      z      Est    SE  (binomial SE)      t
(Intercept) -2.32  0.22  -10.5    -2.33  0.21     (.14)       -11.3
Period1     -0.66  0.32   -2.1    -0.72  0.45     (.31)        -1.6
Period2      0.93  0.18    5.0     1.06  0.26     (.17)         4.2
Period3     -0.07  0.23   -0.3    -0.11  0.34     (.23)        -0.3
Period4     -0.20  0.25   -0.8    -0.24  0.36     (.24)        -0.7
The SEs (really SEDs) are not much increased from the quasibinomial
model. The estimates of treatment effects (differences from the
overall mean) are substantially reduced (pulled in towards the
overall mean). The net effect is that the z-statistic is smaller for
the glmer model than the t for the quasibinomial model.
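(The exact calls behind the table are not shown in the message; the
following is only a guess at how such a comparison might be set up,
reusing the infect matrix defined above. With "sum" contrasts the
intercept is the overall mean, the listed period effects are
differences from it, and the fourth period effect is minus the sum
of the other three. My guess is that the "(binomial SE)" column
comes from a plain binomial glm.)

op <- options(contrasts = c("contr.sum", "contr.poly"))
gm1.sum  <- glmer(infect ~ period + (1 | herd),
                  family = binomial, data = cbpp)
gm.quasi <- glm(infect ~ period, family = quasibinomial, data = cbpp)
gm.bin   <- glm(infect ~ period, family = binomial, data = cbpp)
## summary(gm1.sum), summary(gm.quasi) and summary(gm.bin) give the
## glmer, quasibinomial and "(binomial SE)" columns respectively.
options(op)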
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 12/12/2008, at 7:52 AM, Andrew Robinson wrote:
Echoing Murray's points here - nicely put, Murray - it seems to me
that the quasi-likelihood and the GLMM are different approaches to
the same problem.
Can anyone provide a substantial example where random effects and
quasilikelihood have both been necessary?
Best wishes,
Andrew
On Fri, Dec 12, 2008 at 09:11:39AM +1300, Murray Jorgensen wrote:
The following is how I think about this at the moment:
The quasi-likelihood approach is an attempt at a model-free
treatment of the problem of overdispersion in non-Gaussian
regression situations where standard distributional assumptions fail
to provide the observed mean-variance relationship.
The glmm approach, on the other hand, does not abandon models and
likelihood but seeks to account for the observed mean-variance
relationship by adding unobserved latent variables (random effects)
to the model.
Seeking to combine the two approaches by using both quasilikelihood
*and* random effects would seem to be asking for trouble, as being
able to use two tools on one problem would give a lot of flexibility
to the parameter estimation, probably leading to a very flat
quasilikelihood surface and ill-determined optima.
But all of the above is only speculation, without the benefit of
either serious attempts at fitting real data or serious theory, so I
will defer to anyone who has done either!
Philosophically, at least, there seems to be a clash between the two
approaches, and I doubt that attempts to combine them will be
successful.
Murray Jorgensen