Dear All, I am new to R (coming from GenStat) and to this discussion list. In postings in the third quarter of 2008 there were some queries about GLMMs for overdispersed data and problems with LME4 (see e.g. https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q3/001430.html and https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q3/001404.html). In the second posting Douglas Bates writes that "If someone can suggest what the formula for the scale factor should be, I ...". I am having trouble understanding the GLMM algorithm in lme4, but most estimation methods, among which Breslow and Clayton (1993), use a first order Laplace approximation of the likelihood. The iterative algorithm employed, is based on a linear mixed model for an adjusted dependent variable which is updated in each iteration step. The linear mixed model is fitted with REML resulting in estimates for the fixed effects and predictions for the random effects. Then a new adjusted dependent variable (using the fitted values and the predictions) is calculated and the procedure is repeated. This algorithm involves additional weights that are associated with an added residual variance component (fixed at value 1). The over-dispersed (or quasi-likelihood) version of this estimation method comprises an unknown residual variance, i.e. not fixed at value 1, which is estimated by REML. This unknown residual variance component is the scale or overdispersion factor. In the GLMM routine in GenStat the difference in the specification of the random model is (e.g. for a simple block design, "residual" is a factor with a separate level for each observation and "value" is the estimate from the previous iteration) without overdispersion: random=block+residual ; initialvalue=value,1 ; constraint=none,fixed with overdispersion: random=block+residual ; initialvalue=value,value ; constraint=none,none Note that in an ordinary GLM SEs are multiplied by the square root of the overdispersion factor, and that the estimates of the fixed effects themselves do not change. This is because the overdispersion factor is estimated from Pearson's chi-squared or the deviance after the GLM model has been fitted. However in a GLMM this might work differently as the overdispersion factor can and should be estimated in every iteration possibly leading to different estimates for both variance components and fixed effects. When there is considerable overdispersion there will in general be a difference between the estimated variance components for the model without and the model with overdispersion. This is because in the model without overdispersion the extra variation at the residual stratum will partly be absorbed by the random effects in the linear predictor. As a final remark note that in the algorithm above there is already a residual random effect in the estimation method, so there is no room for a residual random effect in the linear predictor. This can also be viewed in the following way: the algorithm employs predictions for the random effects. A residual random effect does not have replications and therefore predictions for such a random effect are notoriously unstable. Note that this a property (or one might say a defect) of the estimation method. That is why the documentation of the GLMM routine in GenStat states that "The random model must exclude the bottom stratum" Maximum likelihood does not have this restriction, but ML requires that all random effects are integrated out and this is generally not feasible. Best regards, Paul Goedhart
Overdispersion and quasi distributions
6 messages · Goedhart, Paul, Renwick, A. R., Martin Henry H. Stevens +2 more
4 days later
Dear All There has been a lot of talk recently on this forum regarding (over)dispersion and quasi models. I am running a GLMM with a poisson family for some tick burden data I have and I wanted to check if I had overdispersion in my model (and thus a poisson family would be inappropriate). The only method I have found to do this is to run the model with a quasipoisson family and then ask for the scale parameter using: lme4:::sigma(model) However, when I do this my model appears severely UNDER dispersed: sigmaML 3.779694e-06 Without the random effect in the model (i.e a GLM) the scale parameter is 1.07 - almost perfect for a poisson family. Is the method I am trying not appropriate to determine the dispersion in the mixed model? Does anyone know a better method? Many thanks, Anna The University of Aberdeen is a charity registered in Scotland, No SC013683.
Hi Anna, So you tried a GLMM with quasipoisson and a GLM with Poisson? How about a GLMM with Poisson? Sounds like you may have a random effect that is necessary for your hypothesis test, but which does not explain any variation (but I really have no way of knowing). Hank
On Oct 21, 2008, at 5:33 AM, Renwick, A. R. wrote:
Dear All There has been a lot of talk recently on this forum regarding (over) dispersion and quasi models. I am running a GLMM with a poisson family for some tick burden data I have and I wanted to check if I had overdispersion in my model (and thus a poisson family would be inappropriate). The only method I have found to do this is to run the model with a quasipoisson family and then ask for the scale parameter using: lme4:::sigma(model) However, when I do this my model appears severely UNDER dispersed: sigmaML 3.779694e-06 Without the random effect in the model (i.e a GLM) the scale parameter is 1.07 - almost perfect for a poisson family. Is the method I am trying not appropriate to determine the dispersion in the mixed model? Does anyone know a better method? Many thanks, Anna The University of Aberdeen is a charity registered in Scotland, No SC013683.
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dr. Hank Stevens, Associate Professor 338 Pearson Hall Botany Department Miami University Oxford, OH 45056 Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243 http://www.cas.muohio.edu/~stevenmh/ http://www.users.muohio.edu/harkesae/ http://www.cas.muohio.edu/ecology http://www.muohio.edu/botany/ "E Pluribus Unum" "I love deadlines. I love the whooshing noise they make as they go by." (Douglas Adams) If you send an attachment, please try to send it in a format anyone can read, such as PDF, text, Open Document Format, HTML, or RTF. Why? See: http://www.gnu.org/philosophy/no-word-attachments.html
I did run a GLMM with poisson - that is the model type I want to use. I only used a GLMM with quasipoisson to check the scale parameter as I am unaware as to how to check if you have over/under dispersion in the poisson model, and hence violating the assumption of the model, and other way.
# glmm with poisson family
mix<-lmer(trianlarvae~Sex+width+sess+Nhat+Sex:width+Sex:sess+Sex:Nhat+width:sess+width:Nhat+sess:Nhat+(1|LocTran), family=poisson, data=larv, REML=FALSE)
summary(mix)
Generalized linear mixed model fit by the Laplace approximation
Formula: trianlarvae ~ Sex + width + sess + Nhat + Sex:width + Sex:sess + Sex:Nhat + width:sess + width:Nhat + sess:Nhat + (1 | LocTran)
Data: larv
AIC BIC logLik deviance
464 572.7 -212 424
Random effects:
Groups Name Variance Std.Dev.
LocTran (Intercept) 1.3462 1.1603
Number of obs: 1697, groups: LocTran, 14
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.218e+00 1.708e+00 -2.4694 0.0135 *
Sexmale 6.999e-01 1.189e+00 0.5887 0.5561
width -1.426e-01 2.360e-01 -0.6044 0.5456
sessAugust 1.486e+00 2.060e+00 0.7212 0.4708
sessJune -1.545e+01 1.212e+03 -0.0127 0.9898
sessOctober 3.119e+00 1.838e+00 1.6973 0.0896 .
Nhat -4.909e-02 5.814e-02 -0.8442 0.3985
Sexmale:width 1.159e-01 7.612e-02 1.5222 0.1280
Sexmale:sessAugust -7.540e-01 1.632e+00 -0.4621 0.6440
Sexmale:sessJune 1.310e+01 1.212e+03 0.0108 0.9914
Sexmale:sessOctober -1.118e+00 1.223e+00 -0.9139 0.3608
Sexmale:Nhat 9.881e-03 1.012e-02 0.9765 0.3288
width:sessAugust 8.245e-01 5.882e-01 1.4017 0.1610
width:sessJune -4.034e-02 2.791e-01 -0.1445 0.8851
width:sessOctober -1.045e-02 2.057e-01 -0.0508 0.9595
width:Nhat 4.239e-03 3.654e-03 1.1600 0.2460
sessAugust:Nhat -1.484e-01 1.299e-01 -1.1422 0.2534
sessJune:Nhat 2.646e-02 6.249e-02 0.4235 0.6719
sessOctober:Nhat 1.462e-03 5.776e-02 0.0253 0.9798
-----Original Message-----
From: Martin Henry H. Stevens [mailto:HStevens at muohio.edu]
Sent: 21 October 2008 11:19
To: Renwick, A. R.
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Poisson mixed models
Hi Anna,
So you tried a GLMM with quasipoisson and a GLM with Poisson? How about a GLMM with Poisson? Sounds like you may have a random effect that is necessary for your hypothesis test, but which does not explain any variation (but I really have no way of knowing).
Hank
On Oct 21, 2008, at 5:33 AM, Renwick, A. R. wrote:
Dear All There has been a lot of talk recently on this forum regarding (over) dispersion and quasi models. I am running a GLMM with a poisson family for some tick burden data I have and I wanted to check if I had overdispersion in my model (and thus a poisson family would be inappropriate). The only method I have found to do this is to run the model with a quasipoisson family and then ask for the scale parameter using: lme4:::sigma(model) However, when I do this my model appears severely UNDER dispersed: sigmaML 3.779694e-06 Without the random effect in the model (i.e a GLM) the scale parameter is 1.07 - almost perfect for a poisson family. Is the method I am trying not appropriate to determine the dispersion in the mixed model? Does anyone know a better method? Many thanks, Anna The University of Aberdeen is a charity registered in Scotland, No SC013683.
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dr. Hank Stevens, Associate Professor 338 Pearson Hall Botany Department Miami University Oxford, OH 45056 Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243 http://www.cas.muohio.edu/~stevenmh/ http://www.users.muohio.edu/harkesae/ http://www.cas.muohio.edu/ecology http://www.muohio.edu/botany/ "E Pluribus Unum" "I love deadlines. I love the whooshing noise they make as they go by." (Douglas Adams) If you send an attachment, please try to send it in a format anyone can read, such as PDF, text, Open Document Format, HTML, or RTF. Why? See: http://www.gnu.org/philosophy/no-word-attachments.html The University of Aberdeen is a charity registered in Scotland, No SC013683.
On Tue, Oct 21, 2008 at 5:24 AM, Renwick, A. R. <a.renwick at abdn.ac.uk> wrote:
I did run a GLMM with poisson - that is the model type I want to use. I only used a GLMM with quasipoisson to check the scale parameter as I am unaware as to how to check if you have over/under dispersion in the poisson model, and hence violating the assumption of the model, and other way.
# glmm with poisson family
mix<-lmer(trianlarvae~Sex+width+sess+Nhat+Sex:width+Sex:sess+Sex:Nhat+width:sess+width:Nhat+sess:Nhat+(1|LocTran), family=poisson, data=larv, REML=FALSE)
summary(mix)
Generalized linear mixed model fit by the Laplace approximation
Formula: trianlarvae ~ Sex + width + sess + Nhat + Sex:width + Sex:sess + Sex:Nhat + width:sess + width:Nhat + sess:Nhat + (1 | LocTran)
Data: larv
AIC BIC logLik deviance
464 572.7 -212 424
Random effects:
Groups Name Variance Std.Dev.
LocTran (Intercept) 1.3462 1.1603
Number of obs: 1697, groups: LocTran, 14
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.218e+00 1.708e+00 -2.4694 0.0135 *
Sexmale 6.999e-01 1.189e+00 0.5887 0.5561
width -1.426e-01 2.360e-01 -0.6044 0.5456
sessAugust 1.486e+00 2.060e+00 0.7212 0.4708
sessJune -1.545e+01 1.212e+03 -0.0127 0.9898
sessOctober 3.119e+00 1.838e+00 1.6973 0.0896 .
Nhat -4.909e-02 5.814e-02 -0.8442 0.3985
Sexmale:width 1.159e-01 7.612e-02 1.5222 0.1280
Sexmale:sessAugust -7.540e-01 1.632e+00 -0.4621 0.6440
Sexmale:sessJune 1.310e+01 1.212e+03 0.0108 0.9914
Sexmale:sessOctober -1.118e+00 1.223e+00 -0.9139 0.3608
Sexmale:Nhat 9.881e-03 1.012e-02 0.9765 0.3288
width:sessAugust 8.245e-01 5.882e-01 1.4017 0.1610
width:sessJune -4.034e-02 2.791e-01 -0.1445 0.8851
width:sessOctober -1.045e-02 2.057e-01 -0.0508 0.9595
width:Nhat 4.239e-03 3.654e-03 1.1600 0.2460
sessAugust:Nhat -1.484e-01 1.299e-01 -1.1422 0.2534
sessJune:Nhat 2.646e-02 6.249e-02 0.4235 0.6719
sessOctober:Nhat 1.462e-03 5.776e-02 0.0253 0.9798
-----Original Message-----
From: Martin Henry H. Stevens [mailto:HStevens at muohio.edu]
Sent: 21 October 2008 11:19
To: Renwick, A. R.
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Poisson mixed models
Hi Anna,
So you tried a GLMM with quasipoisson and a GLM with Poisson? How about a GLMM with Poisson? Sounds like you may have a random effect that is necessary for your hypothesis test, but which does not explain any variation (but I really have no way of knowing).
Hank
On Oct 21, 2008, at 5:33 AM, Renwick, A. R. wrote:
Dear All There has been a lot of talk recently on this forum regarding (over) dispersion and quasi models. I am running a GLMM with a poisson family for some tick burden data I have and I wanted to check if I had overdispersion in my model (and thus a poisson family would be inappropriate). The only method I have found to do this is to run the model with a quasipoisson family and then ask for the scale parameter using: lme4:::sigma(model) However, when I do this my model appears severely UNDER dispersed: sigmaML 3.779694e-06 Without the random effect in the model (i.e a GLM) the scale parameter is 1.07 - almost perfect for a poisson family. Is the method I am trying not appropriate to determine the dispersion in the mixed model? Does anyone know a better method? Many thanks, Anna
That seems to be an unusually low value for the dispersion. I would have to check the code to see exactly what the sigma function returns in the case of the quasipoisson family. It is quite possible that it is an inappropriate value. I think it is more straightforward to look at the penalized, weighted residual sum of squares, model at deviance["pwrss"], divided by the number of observations, model at dims["n"]. You can do that for a model fit with the Poisson family. It also looks as if you will want to reduce the number of fixed-effects terms in the model.
On 21/10/2008, at 11:37 PM, Douglas Bates wrote:
On Tue, Oct 21, 2008 at 5:24 AM, Renwick, A. R. <a.renwick at abdn.ac.uk> wrote:
Dear All There has been a lot of talk recently on this forum regarding (over) dispersion and quasi models. I am running a GLMM with a poisson family for some tick burden data I have and I wanted to check if I had overdispersion in my model (and thus a poisson family would be inappropriate). The only method I have found to do this is to run the model with a quasipoisson family and then ask for the scale parameter using: lme4:::sigma(model) However, when I do this my model appears severely UNDER dispersed: sigmaML 3.779694e-06 Without the random effect in the model (i.e a GLM) the scale parameter is 1.07 - almost perfect for a poisson family. Is the method I am trying not appropriate to determine the dispersion in the mixed model? Does anyone know a better method? Many thanks, Anna
That seems to be an unusually low value for the dispersion. I would have to check the code to see exactly what the sigma function returns in the case of the quasipoisson family. It is quite possible that it is an inappropriate value. I think it is more straightforward to look at the penalized, weighted residual sum of squares, model at deviance["pwrss"], divided by the number of observations, model at dims["n"]. You can do that for a model fit with the Poisson family. It also looks as if you will want to reduce the number of fixed-effects terms in the model.
I posted this before but it seems to have been missed.
A quick simulation shows that something is wrong. What I did was to
generate random effects Poisson data and fit both as Poisson and quasi
Poisson, and should give similar results except with slightly larger
SE for quasi. Note the change in the subject random effect std dev.
This seems to be close to the square of the mean of the Poisson data.
Standard errors for fixed effects are totally wrong.
Ken
> nsubj <- 100
> npersubj <- 20
>
> subject <- factor(rep(1:nsubj,each=npersubj))
>
> means <- exp(rep(10+rnorm(nsubj),each=npersubj))
>
> y <- rpois(nsubj*npersubj,means)
>
> simdata <- data.frame(y,subject)
>
> lmer1 <- lmer(y~(1|subject),data=simdata,family=poisson)
> summary(lmer1)
Generalized linear mixed model fit by the Laplace approximation
Formula: y ~ (1 | subject)
Data: simdata
AIC BIC logLik deviance
3305 3316 -1650 3301
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 0.9891 0.99453
Number of obs: 2000, groups: subject, 100
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 10.12714 0.09945 101.8 <2e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> lmer2 <- lmer(y~(1|subject),data=simdata,family=quasipoisson)
> summary(lmer2)
Generalized linear mixed model fit by the Laplace approximation
Formula: y ~ (1 | subject)
Data: simdata
AIC BIC logLik deviance
3307 3324 -1650 3301
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 15137 123.03
Residual 15304 123.71
Number of obs: 2000, groups: subject, 100
Fixed effects:
Estimate Std. Error t value
(Intercept) 10.13 12.30 0.8231