
Examples of GLMM fits?

6 messages · Antonio.Gasparrini at lshtm.ac.uk, Murray Jorgensen, Douglas Bates +2 more

#
On Sun, Mar 7, 2010 at 2:23 PM, <Antonio.Gasparrini at lshtm.ac.uk> wrote:
I will leave it to others more skilled than I to decide how to
formulate parameter estimates for fictitious distributions.  I have
enough trouble working in the non-fiction end of statistical theory.
#
Doug's response indicates a certain scepticism about quasilikelihood and 
its use in modelling. I am quite interested in this question and may 
even get around to attempting some theoretical work on it. What I 
would like to know about is literature and discussions critical of QL and 
its role in modelling; I think I am generally aware of the pro-QL literature.

Cheers,  Murray Jorgensen
On 9/03/2010 3:50 a.m., Douglas Bates wrote:
[...]
[...]
#
On Mon, Mar 8, 2010 at 2:30 PM, Murray Jorgensen <maj at waikato.ac.nz> wrote:
I should have been more specific and less cheeky.  What I am trying to
communicate is that the way that I have finally been able to work out
in my mind how to estimate parameters in generalized linear mixed
models is to go right back to the probability model and derive things
from there, step by step.  I can't do that for quasi-poisson or
quasi-binomial models because there is no probability model.

I don't know what is involved in fitting parameters using
quasilikelihood and whether or not the approach can be generalized to
mixed models.  I find it hard enough to work out what all the bits and
pieces are when working with a probability distribution.  Working with
something that sort of looks like a probability distribution but isn't
really is beyond what I want to try at this point.
#
Murray Jorgensen wrote:
I am interested too.

  I suspect most of the criticism of QL has to do with its extension
beyond the GLM framework to other areas, such as quasi-AIC, or
application of QL ideas in frameworks such as GLMMs where the
fundamental theory hasn't really been worked out (that I know of).
These methods are used a lot by people in applied fields, *without*
worrying about those missing foundations ... for example, most of the
citations for QAIC in the ecological literature go back to Lebreton et
al. 1992, who say:

(p. 85): In priniciple [sic], the LRTs should be modified, as should the
AIC criterion. These matters, which need further work, show up in our
last example (Greater Flamingo) ... (p. 107) Similarly, the AIC should
be modified as DEV/c-hat+2*n*p. ***We caution that these ideas are
exploratory and not yet confirmed by fundamental statistical theory, but
the ideas are consistent with quasi-likelihood theory*** [emphasis mine]

  Others (such as Shane Richards 2008) have tested these ideas *by
simulation*, and they seem to work out OK, but I would be curious to
know about work that establishes the theoretical foundations for these
approaches.
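
  [For concreteness, the QAIC of Lebreton et al. is usually computed as
deviance/c-hat + 2*k, with c-hat estimated from a global (most complex)
model.  A minimal sketch in R -- data simulated and all names
hypothetical, so this illustrates the recipe rather than any particular
analysis:

  set.seed(2)
  d <- data.frame(x = rnorm(50))
  d$y <- rpois(50, exp(0.5 + 0.8*d$x + rnorm(50, sd = 0.5)))  # overdispersed
  fit  <- glm(y ~ x, family = quasipoisson, data = d)
  chat <- summary(fit)$dispersion   # estimated variance inflation c-hat
  k    <- length(coef(fit)) + 1     # +1 parameter for estimating c-hat
  deviance(fit)/chat + 2*k          # QAIC

Whether this quantity has the properties claimed for AIC is exactly the
open theoretical question above.]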

  A wild guess would be that generalized estimating equations are the
more respectable (theoretically grounded?) approach to this kind of
problem ... on the other hand, it would be nice to have a version of GEE
where one had something better than Wald tests to rely on for smaller
data sets ...

  One final remark (which may get me in trouble): ecologists (and
others) have to deal with overdispersion in small, discrete (i.e.
plausibly exponential family) data sets with blocking factors (i.e.
random effects) all the time.  The approaches that I know of for
statistical analysis in this case are
1. GEE (only Wald tests -- see above),
2. using an alternative distribution such as the negative binomial (not
currently possible in glmer -- arguably could be hacked similarly to the
way that MASS::glm.nb() extends glm(), if there were a slot in the data
structure that allowed the internal code to make use of an additional
parameter for the variance function)
3. allowing random effects at the individual level (equivalent to
assuming a marginal lognormal-Poisson distribution) -- not currently
possible in lme4 because of tests comparing the number of random effects
to the number of observations [but can be hacked by expeRts]
4. quasi-likelihood approaches

#1 is presumably possible in gee(), geepack()
#2 is possible in ADMB, glmmADMB
#3 is possible in MCMCglmm, R2WinBUGS ...
#4 is possible in glmmPQL (but PQL has its own problems)
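
  [A sketch of #3 on simulated data -- all names hypothetical, and it
assumes a version of lme4/lme4a that permits one random effect per
observation:

  library(lme4)
  set.seed(1)
  n <- 100
  block <- factor(rep(1:10, each = 10))
  eta <- 1 + rnorm(10, sd = 0.5)[block] + rnorm(n, sd = 0.7)  # extra noise
  y <- rpois(n, exp(eta))
  d <- data.frame(y, block, obs = factor(seq_len(n)))
  ## (1|obs) adds a normal variance component on the linear-predictor
  ## scale, i.e. a marginal lognormal-Poisson model:
  fit <- glmer(y ~ 1 + (1|block) + (1|obs), family = poisson, data = d)
]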

  So it's understandable (to me) why people are still asking about
quasi- in lme4, given that people take it as the standard for mixed
modeling in R and that it has broad applicability ...

  Disclaimer: I think I'm not talking nonsense, but I would be happy to
be corrected.

  Ben Bolker

============================
Lebreton, J. D., K. P. Burnham, J. Clobert, and D. R. Anderson. 1992.
Modeling Survival and Testing Biological Hypotheses Using Marked
Animals: A Unified Approach with Case Studies. Ecological Monographs
62:67-118.

Richards, S. A. 2008. Dealing with overdispersed count data in applied
ecology. Journal of Applied Ecology 45:218-227. doi:
10.1111/j.1365-2664.2007.01377.x.
#
It was at one time possible to associate one random effect with each
observation.  This is an entirely respectable model.

It was allowed in lme4a the last time that I used it.  It is not, 
however, allowed in the version of lme4 that is currently on my computer.

The one random effect per observation model is a different model 
from any model that might be postulated to generate the quasipoisson 
variance.  On the scale of the linear predictor, a normal variance 
component is added.  The quasipoisson error increases the variance, 
on the scale of the response, by a constant multiplier.  As the source 
of the extra variance is specified precisely, the glmer one random 
effect per observation model is nicer.  Whether it gives a better account
of the data is a separate issue!
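
[The contrast can be made concrete.  If u ~ N(0, sigma^2) is the
per-observation effect on the linear predictor, the marginal variance of
the lognormal-Poisson model is mu + mu^2*(exp(sigma^2) - 1), which grows
quadratically in the mean, whereas quasipoisson posits Var(y) = phi*mu,
a constant multiplier.  A quick numerical check, using the transect
variance 0.229 from the glmer fit below (values illustrative only):

  sigma2 <- 0.229
  mu <- function(eta) exp(eta + sigma2/2)   # marginal mean
  v  <- function(eta) mu(eta) + mu(eta)^2*(exp(sigma2) - 1)
  eta <- c(-1, 0, 1, 2)
  v(eta)/mu(eta)   # the implied "dispersion" phi increases with the mean

So no single phi can reproduce the lognormal-Poisson variance across the
range of the mean.]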

It seems to me that it would be very messy to try to incorporate a 
quasipoisson variance into the context of a GLMM model.  One would 
need, I surmise, an explicit form of model for the extra variance.

The lme4a calculation went like this:

[lme4a_0.999375-44]
[Dependencies create problems for loading DAAG into R-devel.
So I saved the image under R-2.10.0 into moths.RData, and then
loaded that image.]
library(lme4a)
moths$transect <- 1:41  # each row is from a different transect
moths$habitat <- relevel(moths$habitat, ref="Lowerside")
(A.glmer <- glmer(A ~ habitat + log(meters) + (1|transect),
                  family=poisson, data=subset(moths, subset=habitat!="Bank")))
Generalized linear mixed model fit by the Laplace approximation 
Formula: A ~ habitat + log(meters) + (1 | transect) 
 Data: subset(moths, subset = habitat != "Bank") 
AIC BIC logLik deviance
208 223    -95      190
Random effects:
Groups   Name        Variance Std.Dev.
transect (Intercept) 0.229    0.478   
Number of obs: 40, groups: transect, 40

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)        1.0876     0.3963    2.74  0.00607
habitatDisturbed  -1.2326     0.4699   -2.62  0.00872
habitatNEsoak     -0.8210     0.4402   -1.87  0.06216
habitatNWsoak      1.5166     0.3915    3.87  0.00011
habitatSEsoak      0.0515     0.3505    0.15  0.88321
habitatSWsoak      0.2435     0.4543    0.54  0.59188
habitatUpperside  -0.1669     0.5366   -0.31  0.75570
log(meters)        0.1506     0.1374    1.10  0.27293
. . . .

The variance that is due to the Poisson error is increased, on 
the scale of the linear predictor, by 0.229.  Relative to the
quasipoisson model (below), more extreme estimates of 
treatment differences (but not for Bank) are pulled in towards 
the overall mean. The habitat Disturbed now appears clearly 
different from the reference, which is Lowerside. The reason 
is that on the scale of the linear predictor, the Poisson variance 
is largest when the linear predictor is smallest, that is, when the 
expected count is, as for Disturbed, close to zero. Addition of an 
amount that is constant across the range has, by comparison
with the quasipoisson model that uses a constant multiplier
(the "dispersion"), a relatively smaller effect when the contribution 
from the Poisson variance is, on this scale, largest.
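
[One way to see this is a rough delta-method approximation (mine, not
from the original analysis): on the log-link scale the Poisson
contribution to the variance is roughly 1/mu, while the random effect
adds the constant 0.229:

  mu <- c(0.5, 2, 10, 50)                 # expected counts, small to large
  poisson_part <- 1/mu
  round(0.229/(poisson_part + 0.229), 3)  # share due to the constant term

The constant addition dominates when the expected count is large, and is
swamped by the Poisson part when the count is near zero.]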

Here is the glm model that uses a quasipoisson error:
Call:
glm(formula = A ~ habitat + log(meters), data = moths)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-1.477e+01  -1.893e+00   6.661e-15   1.141e+00   1.518e+01  

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        3.2806     2.6352   1.245    0.222
habitatBank       -4.9768     5.0488  -0.986    0.332
habitatDisturbed  -3.0081     2.4825  -1.212    0.234
habitatNEsoak     -2.9095     2.7464  -1.059    0.297
habitatNWsoak     18.6999     3.2349   5.781 2.05e-06
habitatSEsoak      0.3414     2.4756   0.138    0.891
habitatSWsoak      1.4312     3.3565   0.426    0.673
habitatUpperside  -0.5915     3.7839  -0.156    0.877
log(meters)        0.5571     0.9212   0.605    0.550

(Dispersion parameter for gaussian family taken to be 22.50446)

    Null deviance: 1953.22  on 40  degrees of freedom
Residual deviance:  720.14  on 32  degrees of freedom
AIC: 253.85

Number of Fisher Scoring iterations: 2


I noted also that the same (?) analysis is possible under glmmPQL from MASS.  
(This relies on iterated calls to lme(), from nlme.)  The estimates are 
very similar, but the SEs are noticeably different:

glmmPQL(A ~ habitat + log(meters), random = ~1|transect,
        family=poisson, data=subset(moths, subset=habitat!="Bank"))
Random effects:
Formula: ~1 | transect
      (Intercept) Residual
StdDev:       0.448     1.04

Variance function:
Structure: fixed weights
Formula: ~invwt 
Fixed effects: A ~ habitat + log(meters) 
                Value Std.Error DF t-value p-value
(Intercept)       1.110     0.437 32    2.54  0.0162
habitatDisturbed -1.240     0.528 32   -2.35  0.0253
habitatNEsoak    -0.821     0.488 32   -1.68  0.1027
habitatNWsoak     1.514     0.422 32    3.58  0.0011
habitatSEsoak     0.053     0.385 32    0.14  0.8917
habitatSWsoak     0.240     0.497 32    0.48  0.6326
habitatUpperside -0.166     0.590 32   -0.28  0.7795
log(meters)       0.147     0.151 32    0.97  0.3399


John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 09/03/2010, at 8:12 AM, Douglas Bates wrote: