An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20100307/d9fb8ae4/attachment.pl>
Examples of GLMM fits?
6 messages · Antonio.Gasparrini at lshtm.ac.uk, Murray Jorgensen, Douglas Bates +2 more
On Sun, Mar 7, 2010 at 2:23 PM, <Antonio.Gasparrini at lshtm.ac.uk> wrote:
Dear Douglas Bates and R users,
some time ago I sent an simulated example with a overdispersed Poisson GLMM where I compare glmmPQL and glmer. See https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q1/003289.html
I found some problems in the results on glmer with quasipoisson family. I hope someone could explain this
I will leave it to others more skilled than I to decide how to formulate parameter estimates for fictitious distributions. I have enough trouble working in the non-fiction end of statistical theory.
---------------------------------------------------------------------- Message: 1 Date: Sat, 6 Mar 2010 09:57:06 -0600 From: Douglas Bates <bates at stat.wisc.edu> To: R-mixed models mailing list <r-sig-mixed-models at r-project.org> Subject: [R-sig-ME] Examples of GLMM fits? Message-ID: <40e66e0b1003060757pa6e772fvfe29960a167e9b76 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 I have added the Gamma family for glmer in the lme4a package and will backport to the lme4 package. ? (Actually Ben Bolker sent me a patch for lme4 several weeks ago and I haven't gotten around to installing it yet but will do so.) However, before I release it I would like to have an example data set and model on which to test it. ?I don't have any. ?Could someone provide me with a reference to data and a fitted model that I could use as an example? ?I would also appreciate an example of a Poisson GLMM and other families too. ? ? ? ?[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Doug's response indicates a certain scepticism about quasilikelihood and it's use in modelling. I am quite interested in this question and may even get around to attempting some theoretical work about it. What I would like to know about literature and discussions critical of QL and its role in modelling. I think I am generally aware of pro-QL literature. Cheers, Murray Jorgensen
On 9/03/2010 3:50 a.m., Douglas Bates wrote:
[...]
I will leave it to others more skilled than I to decide how to formulate parameter estimates for fictitious distributions. I have enough trouble working in the non-fiction end of statistical theory.
[...]
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: maj at waikato.ac.nz Fax 7 838 4155 Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 0200 8350
On Mon, Mar 8, 2010 at 2:30 PM, Murray Jorgensen <maj at waikato.ac.nz> wrote:
Doug's response indicates a certain scepticism about quasilikelihood and it's use in modelling. I am quite interested in this question and may even get around to attempting some theoretical work about it. What I would like to know about literature and discussions critical of QL and its role in modelling. I think I am generally aware of pro-QL literature.
I should have been more specific and less cheeky. What I am trying to communicate is that the way that I have been able to finally work out in my mind how to estimate parameters in generalized linear mixed models is to go right back to the probability model and derive things from there, step by step. I can't do that for quasi-poisson or quasi-binomial models because there is no probability model. I don't know what is involved in fitting parameters using quasilikelihood and whether or not the approach can be generalized to mixed models. I find it hard enough to work out what all the bits and pieces are when working with a probability distribution. Working with something that sort of looks like a probability distribution but isn't really is beyond what I want to try at this point.
Cheers, ?Murray Jorgensen On 9/03/2010 3:50 a.m., Douglas Bates wrote: [...]
I will leave it to others more skilled than I to decide how to formulate parameter estimates for fictitious distributions. ?I have enough trouble working in the non-fiction end of statistical theory.
[...] -- Dr Murray Jorgensen ? ? ?http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: maj at waikato.ac.nz ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Fax 7 838 4155 Phone ?+64 7 838 4773 wk ? ?Home +64 7 825 0441 ? Mobile 021 0200 8350
Murray Jorgensen wrote:
Doug's response indicates a certain scepticism about quasilikelihood and it's use in modelling. I am quite interested in this question and may even get around to attempting some theoretical work about it. What I would like to know about literature and discussions critical of QL and its role in modelling. I think I am generally aware of pro-QL literature. Cheers, Murray Jorgensen On 9/03/2010 3:50 a.m., Douglas Bates wrote: [...]
I will leave it to others more skilled than I to decide how to formulate parameter estimates for fictitious distributions. I have enough trouble working in the non-fiction end of statistical theory.
I am interested too. I suspect most of the criticism of QL has to do with its extension beyond the GLM framework to other areas, such as quasi-AIC, or application of QL ideas in frameworks such as GLMMs where the fundamental theory hasn't really been worked out (that I know of). These methods are used a lot by people in applied fields, *without* worrying about those missing foundations ... for example, most of the citations for QAIC in the ecological literature go back to Lebreton et al 1992, who say: (p. 85): In priniciple [sic], the LRTs should be modified, as should the AIC criterion. These matters, which need further work, show up in our last example (Greater Flamingo) ... (p. 107) Similarly, the AIC should be modified as DEV/c-hat+2*n*p. ***We caution that these ideas are exploratory and not yet confirmed by fundamental statistical theory, but the ideas are consistent with quasi-likelihood theory*** [emphasis mine] Others (such as Shane Richards 2008) have tested these ideas *by simulation*, and they seem to work out OK, but I would be curious to know about work that establishes the theoretical foundations for these approaches. A wild guess would be that generalized estimating equations are the more respectable (theoretically grounded?) approach to this kind of problem ... on the other hand, it would be nice to have a version of GEE where one had something better than Wald tests to rely on for smaller data sets ... One final remark (which may get me in trouble): ecologists (and others) have to deal with overdispersion in small, discrete (i.e. plausibly exponential family) data sets with blocking factors (i.e. random effects) all the time. The approaches that I know of for statistical analysis in this case are 1. GEE (only Wald tests -- see above), 2. using an alternative distribution such as the negative binomial (not currently possible in glmer -- arguably could be hacked similarly to the way that MASS::glm.nb() extends glm(), if there were a slot in the data structure that allowed the internal code to make use of an additional parameter for the variance function) 3. allowing random effects at the individual level (equivalent to assuming a marginal lognormal-Poisson distribution) -- not currently possible in lme4 because of tests comparing the number of random effects to the number of observations [but can be hacked by expeRts] 4. quasi-likelihood approaches #1 is presumably possible in gee(), geepack() #2 is possible in ADMB, glmmADMB #3 is possible in MCMCglmm, R2WinBUGS ... #4 is possible in glmmPQL (but PQL has its own problems) So it's understandable (to me) why people are still asking about quasi- in lme4 , given that people take it as the standard for mixed modeling in R and that it has broad applicability ... Disclaimer: I think I'm not talking nonsense, but I would be happy to be corrected. Ben Bolker ============================ Lebreton, J. D., K. P. Burnham, J. Clobert, and D. R. Anderson. 1992. Modeling Survival and Testing Biological Hypotheses Using Marked Animals: A Uni?ed Approach with Case Studies. Ecological Monographs 62:67-118. . Richards, S. A. 2008. Dealing with overdispersed count data in applied ecology. Journal of Applied Ecology 45:218-227. doi: 10.1111/j.1365-2664.2007.01377.x.
Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / people.biology.ufl.edu/bolker GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc
It was at one time possible to associate one random effect with each
observation. This is an entirely respectable model.
It was allowed in lme4a last time that I used it. It is not however
allowed in the version of lme4 that is currently on my computer.
The one random effect per observation model is a different model
from any model that might be postulated to generate the quasipoisson
variance. On the scale of the linear predictor, a normal variance
component is added. The quasipoisson error increases the variance,
on the scale of the response, by a constant multiplier. As the source
of the extra variance is specified precisely, the glmer one random
effect per observation model is nicer. Whether it gives a better account
of the data is a separate issue!
It seems to me that it would be very messy to try to incorporate a
quasipoisson variance into the context of a GLMM model. One would
need, I surmise, an explicit form of model for the extra variance.
The lme4a calculation went like this:
[lme4a_0.999375-44]
[Dependencies create problems for loading DAAG into R-devel.
So I saved the image under R-2.10.0 into moths.RData, and then
loaded that image.]
library(lme4a)
moths$transect <- 1:41 # Each row is from a different transect
moths$habitat <- relevel(moths$habitat, ref="Lowerside")
(A.glmer <- glmer(A~habitat+log(meters)+(1|transect),
+ family=poisson, data=subset(moths,subset=habitat!="Bank")))
Generalized linear mixed model fit by the Laplace approximation
Formula: A ~ habitat + log(meters) + (1 | transect)
Data: subset(moths, subset = habitat != "Bank")
AIC BIC logLik deviance
208 223 -95 190
Random effects:
Groups Name Variance Std.Dev.
transect (Intercept) 0.229 0.478
Number of obs: 40, groups: transect, 40
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.0876 0.3963 2.74 0.00607
habitatDisturbed -1.2326 0.4699 -2.62 0.00872
habitatNEsoak -0.8210 0.4402 -1.87 0.06216
habitatNWsoak 1.5166 0.3915 3.87 0.00011
habitatSEsoak 0.0515 0.3505 0.15 0.88321
habitatSWsoak 0.2435 0.4543 0.54 0.59188
habitatUpperside -0.1669 0.5366 -0.31 0.75570
log(meters) 0.1506 0.1374 1.10 0.27293
. . . .
The variance that is due to the Poisson error is increased, on
the scale of the linear predictor, by 0.234. Relative to the
quasipoisson model (below), more extreme estimates of
treatment differences (but not for Bank) are pulled in towards
the overall mean. The habitat Disturbed now appears clearly
different from the reference, which is Lowerside. The reason
is that nn the scale of the linear predictor, the Poisson variance
is largest when the linear predictor is smallest, that is when the
expected count is, as for Disturbed, close to zero. Addition of an
amount that is constant across the range has, by comparison
with the quasipoisson model that uses a constant multiplier
(the "dispersion"), a relatively smaller effect when the contribution
from the Poisson variance is, on this scale, largest.
Here is the glm model that uses a quasipoisson error:
summary(A.glm <- glm(A ~ habitat + log(meters), data=moths))
Call:
glm(formula = A ~ habitat + log(meters), data = moths)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.477e+01 -1.893e+00 6.661e-15 1.141e+00 1.518e+01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.2806 2.6352 1.245 0.222
habitatBank -4.9768 5.0488 -0.986 0.332
habitatDisturbed -3.0081 2.4825 -1.212 0.234
habitatNEsoak -2.9095 2.7464 -1.059 0.297
habitatNWsoak 18.6999 3.2349 5.781 2.05e-06
habitatSEsoak 0.3414 2.4756 0.138 0.891
habitatSWsoak 1.4312 3.3565 0.426 0.673
habitatUpperside -0.5915 3.7839 -0.156 0.877
log(meters) 0.5571 0.9212 0.605 0.550
(Dispersion parameter for gaussian family taken to be 22.50446)
Null deviance: 1953.22 on 40 degrees of freedom
Residual deviance: 720.14 on 32 degrees of freedom
AIC: 253.85
Number of Fisher Scoring iterations: 2
I noted also that the same (?) analysis is possible under glmmPQL from MASS.
(This relies on iterated calls to lme(), from nlme.) The estimates are
very similar, but the SEs are noticeably different:
A.glmmPQL <- glmmPQL(A~habitat+log(meters), random=~1|transect,
+ family=poisson, data=subset(moths,subset=habitat!="Bank"))
Random effects:
Formula: ~1 | transect
(Intercept) Residual
StdDev: 0.448 1.04
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: A ~ habitat + log(meters)
Value Std.Error DF t-value p-value
(Intercept) 1.110 0.437 32 2.54 0.0162
habitatDisturbed -1.240 0.528 32 -2.35 0.0253
habitatNEsoak -0.821 0.488 32 -1.68 0.1027
habitatNWsoak 1.514 0.422 32 3.58 0.0011
habitatSEsoak 0.053 0.385 32 0.14 0.8917
habitatSWsoak 0.240 0.497 32 0.48 0.6326
habitatUpperside -0.166 0.590 32 -0.28 0.7795
log(meters) 0.147 0.151 32 0.97 0.3399
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 09/03/2010, at 8:12 AM, Douglas Bates wrote:
On Mon, Mar 8, 2010 at 2:30 PM, Murray Jorgensen <maj at waikato.ac.nz> wrote:
Doug's response indicates a certain scepticism about quasilikelihood and it's use in modelling. I am quite interested in this question and may even get around to attempting some theoretical work about it. What I would like to know about literature and discussions critical of QL and its role in modelling. I think I am generally aware of pro-QL literature.
I should have been more specific and less cheeky. What I am trying to communicate is that the way that I have been able to finally work out in my mind how to estimate parameters in generalized linear mixed models is to go right back to the probability model and derive things from there, step by step. I can't do that for quasi-poisson or quasi-binomial models because there is no probability model. I don't know what is involved in fitting parameters using quasilikelihood and whether or not the approach can be generalized to mixed models. I find it hard enough to work out what all the bits and pieces are when working with a probability distribution. Working with something that sort of looks like a probability distribution but isn't really is beyond what I want to try at this point.
Cheers, Murray Jorgensen On 9/03/2010 3:50 a.m., Douglas Bates wrote: [...]
I will leave it to others more skilled than I to decide how to formulate parameter estimates for fictitious distributions. I have enough trouble working in the non-fiction end of statistical theory.
[...] -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: maj at waikato.ac.nz Fax 7 838 4155 Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 0200 8350
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models