Dear all, I wish to describe the relationship between the diversity of soil fungi and the application of different nutrients (fertilization). My response variable is the exponentiated Shannon index of diversity (q1). The explanatory variable has four levels. Each of the treatment factors was applied at the plot level and there are four replicates of each factor per elevation. Six randomly distributed soil cores were taken within each of the plots. For the GLMMs I used lme4 package version 1.1-15, and vegan 2.4-4 to estimate q1. One of the problems I have is that q1 takes decimal values, therefore it would be inappropriate (or impossible?) to fit my response variable with a poisson probability distribution. Therefore I tried gamma for the model specification with a log link function. I performed model selection with pairwise likelihood ratio tests. I then checked my favored model for over-dispersion (which is depicted in the output below). It seems, that the model is under dispersed! I was checking the literature for solutions to this issue, but I could only find some vague notions, namely that some level of underdispersion is tolerated. In the case of overdispersion, it is recommended to use quasilikelihood, but apparently this solution has been disabled a while ago in lme4. Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod'] Family: Gamma ( log ) Formula: q1 ~ Treatment + (1 | Elevation) + (1 | Elevation:Plot) Data: dat Control: glmerControl(optimizer = "nlminbw") AIC BIC logLik deviance df.resid 1523.6 1547.9 -754.8 1509.6 231 Scaled residuals: Min 1Q Median 3Q Max -2.0938 -0.6378 -0.0694 0.5815 3.1634 Random effects: Groups Name Variance Std.Dev. Elevation:Plot (Intercept) 0.02632 0.1622 Elevation (Intercept) 0.01366 0.1169 Residual 0.17924 0.4234 Number of obs: 238, groups: Elevation:Plot, 47; Elevation, 3 Fixed effects: Estimate Std. Error t value Pr(>|z|) (Intercept) 2.63742 0.16504 15.981 <2e-16 *** TreatmentN -0.08395 0.13284 -0.632 0.527 TreatmentNP -0.15163 0.12964 -1.170 0.242 TreatmentP -0.12925 0.12998 -0.994 0.320 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Correlation of Fixed Effects: (Intr) TrtmnN TrtmNP TreatmentN -0.412 TreatmentNP -0.418 0.522 TreatmentP -0.417 0.524 0.535 chisq ratio rdf p 38.4696552 0.1658175 232.0000000 1.0000000 My concrete questions are: Should I be concerned that my model is underdispersed? Will the coeficients of the fixed terms be reliable in this scenario? I appreciate any help on this regard. Best regards, Juan F. Due?as
Help - I have an underdispersed glmm :(
3 messages · Juan Dueñas, Ben Bolker, Juan F Dueñas
Can you say a little more about why you expect q1 to be Poisson-distributed, or more generally what mean-variance relationship you expect? Is there a mechanistic/theoretical framework for the distribution of this variable? Some reason not to find a transform that makes the responses reasonably homoscedastic and linear? In general, it *only* makes sense to compute/test dispersion for model families where the variance has a fixed relationship with the mean (binomial, Poisson, ...), not when there is an estimated scale parameter (Gaussian, Gamma, ...) Ben Bolker
On 18-01-19 08:29 AM, Juan Due?as wrote:
Dear all, I wish to describe the relationship between the diversity of soil fungi and the application of different nutrients (fertilization). My response variable is the exponentiated Shannon index of diversity (q1). The explanatory variable has four levels. Each of the treatment factors was applied at the plot level and there are four replicates of each factor per elevation. Six randomly distributed soil cores were taken within each of the plots. For the GLMMs I used lme4 package version 1.1-15, and vegan 2.4-4 to estimate q1. One of the problems I have is that q1 takes decimal values, therefore it would be inappropriate (or impossible?) to fit my response variable with a poisson probability distribution. Therefore I tried gamma for the model specification with a log link function. I performed model selection with pairwise likelihood ratio tests. I then checked my favored model for over-dispersion (which is depicted in the output below). It seems, that the model is under dispersed! I was checking the literature for solutions to this issue, but I could only find some vague notions, namely that some level of underdispersion is tolerated. In the case of overdispersion, it is recommended to use quasilikelihood, but apparently this solution has been disabled a while ago in lme4. Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod'] Family: Gamma ( log ) Formula: q1 ~ Treatment + (1 | Elevation) + (1 | Elevation:Plot) Data: dat Control: glmerControl(optimizer = "nlminbw") AIC BIC logLik deviance df.resid 1523.6 1547.9 -754.8 1509.6 231 Scaled residuals: Min 1Q Median 3Q Max -2.0938 -0.6378 -0.0694 0.5815 3.1634 Random effects: Groups Name Variance Std.Dev. Elevation:Plot (Intercept) 0.02632 0.1622 Elevation (Intercept) 0.01366 0.1169 Residual 0.17924 0.4234 Number of obs: 238, groups: Elevation:Plot, 47; Elevation, 3 Fixed effects: Estimate Std. Error t value Pr(>|z|) (Intercept) 2.63742 0.16504 15.981 <2e-16 *** TreatmentN -0.08395 0.13284 -0.632 0.527 TreatmentNP -0.15163 0.12964 -1.170 0.242 TreatmentP -0.12925 0.12998 -0.994 0.320 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Correlation of Fixed Effects: (Intr) TrtmnN TrtmNP TreatmentN -0.412 TreatmentNP -0.418 0.522 TreatmentP -0.417 0.524 0.535 chisq ratio rdf p 38.4696552 0.1658175 232.0000000 1.0000000 My concrete questions are: Should I be concerned that my model is underdispersed? Will the coeficients of the fixed terms be reliable in this scenario? I appreciate any help on this regard. Best regards, Juan F. Due?as
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
2 days later
Hello Ben and everybody, Thanks for your answer. I will try to answer to your questions in order. Maybe I expressed myself wrong. I have no good reason to expect q1 to follow poisson. All I understand - in my comic book type understanding- is that shannon's H is a measure of diversity that results from the weighting of the number of species in a sample by the individual abundance within each species. Therefore q1 is a proportion and in that case is not formally a count, so poisson will not work. As to what sort of variance structure to expect from this response variable, what I can say is that when I ignore the random effects and fit a glm with gaussian (link="log"), there is almost no heteroscedasticity, which is confirmed by a Bartlett's test (below). However, the Q-Q plot of this same model shows that the std.dev residuals do not adjust well to normality. I then re-specify the same glm with a gamma distribution and log link function and that seems to resolve both the heteroscedasticity and the fit of the std.dev residuals. Bartlett test of homogeneity of variances data: resid(glm) by meta.ec.n$Treatment Bartlett's K-squared = 0.94193, df = 3, p-value = 0.8153 Finally, I am not aware of any theoretical framework for the distribution of this variable. And the reason for not simply transforming my response variable to linearize the relationship and then fit my data with lmer is that I was under the impression that GLMM are a more efficient way to deal with these issues. The other reason is that I am worried that the transformation over corrects the residual fit to the theoretical distribution. In addition to the statements of the glm, I ran a boxcox command (from package:MASS) on the glm. It seemed to confirm that log transformation of the response variable would be the best shot. I then specify a linear mixed effect model, with the log transformed variable. However the diagnostic plots, showed some heteroscedasticity and not that optimal fit to normality. lme.1 <- lmer(log.q1~Treatment +(1|Elevation) + (1|Elevation:Plot), data=dat) Do you think it would be better then to go for the linear model transforming the data, instead of the glmm? Best regards, Juan /
Can you say a little more about why you expect q1 to be Poisson-distributed, or more generally what mean-variance relationship you expect? Is there a mechanistic/theoretical framework for the distribution of this variable? Some reason not to find a transform that makes the responses reasonably homoscedastic and linear? In general, it *only* makes sense to compute/test dispersion for model families where the variance has a fixed relationship with the mean (binomial, Poisson, ...), not when there is an estimated scale parameter (Gaussian, Gamma, ...) Ben Bolker On 18-01-19 08:29 AM, Juan Due?as wrote:
/Dear all, />//>//>/I wish to describe the relationship between the diversity of soil fungi />/and the application of different nutrients (fertilization). My response />/variable is the exponentiated Shannon index of diversity (q1). The />/explanatory variable has four levels. Each of the treatment factors was />/applied at the plot level and there are four replicates of each factor />/per elevation. Six randomly distributed soil cores were taken within />/each of the plots. />//>/For the GLMMs I used lme4 package version 1.1-15, and vegan 2.4-4 to />/estimate q1. />//>/One of the problems I have is that q1 takes decimal values, therefore it />/would be inappropriate (or impossible?) to fit my response variable with />/a poisson probability distribution. Therefore I tried gamma for the />/model specification with a log link function. I performed model />/selection with pairwise likelihood ratio tests. />//>/I then checked my favored model for over-dispersion (which is depicted />/in the output below). It seems, that the model is under dispersed! I was />/checking the literature for solutions to this issue, but I could only />/find some vague notions, namely that some level of underdispersion is />/tolerated. In the case of overdispersion, it is recommended to use />/quasilikelihood, but apparently this solution has been disabled a while />/ago in lme4. />//>/Generalized linear mixed model fit by maximum likelihood (Laplace />/Approximation) ['glmerMod'] />/Family: Gamma ( log ) />/Formula: q1 ~ Treatment + (1 | Elevation) + (1 | Elevation:Plot) />/Data: dat />/Control: glmerControl(optimizer = "nlminbw") />//>/AIC BIC logLik deviance df.resid />/1523.6 1547.9 -754.8 1509.6 231 />//>/Scaled residuals: />/Min 1Q Median 3Q Max />/-2.0938 -0.6378 -0.0694 0.5815 3.1634 />//>/Random effects: />/Groups Name Variance Std.Dev. />/Elevation:Plot (Intercept) 0.02632 0.1622 />/Elevation (Intercept) 0.01366 0.1169 />/Residual 0.17924 0.4234 />/Number of obs: 238, groups: Elevation:Plot, 47; Elevation, 3 />//>/Fixed effects: />/Estimate Std. Error t value Pr(>|z|) />/(Intercept) 2.63742 0.16504 15.981 <2e-16 *** />/TreatmentN -0.08395 0.13284 -0.632 0.527 />/TreatmentNP -0.15163 0.12964 -1.170 0.242 />/TreatmentP -0.12925 0.12998 -0.994 0.320 />/--- />/Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 />//>/Correlation of Fixed Effects: />/(Intr) TrtmnN TrtmNP />/TreatmentN -0.412 />/TreatmentNP -0.418 0.522 />/TreatmentP -0.417 0.524 0.535 />//>//>/chisq ratio rdf p />/38.4696552 0.1658175 232.0000000 1.0000000 />//>//>/My concrete questions are: Should I be concerned that my model is />/underdispersed? Will the coeficients of the fixed terms be reliable in />/this scenario? />//>//>/I appreciate any help on this regard. />//>//>/Best regards, />//>/Juan F. Due?as />//>/_______________________________________________ />/R-sig-mixed-models at r-project.org
/