Skip to content

Help - I have an underdispersed glmm :(

3 messages · Juan Dueñas, Ben Bolker, Juan F Dueñas

#
Dear all,


I wish to describe the relationship between the diversity of soil fungi 
and the application of different nutrients (fertilization). My response 
variable is the exponentiated Shannon index of diversity (q1). The 
explanatory variable has four levels. Each of the treatment factors was 
applied at the plot level and there are four replicates of each factor 
per elevation. Six randomly distributed soil cores were taken within 
each of the plots.

For the GLMMs I used lme4 package version 1.1-15, and vegan 2.4-4 to 
estimate q1.

One of the problems I have is that q1 takes decimal values, therefore it 
would be inappropriate (or impossible?) to fit my response variable with 
a poisson probability distribution. Therefore I tried gamma for the 
model specification with a log link function. I performed model 
selection with pairwise likelihood ratio tests.

I then checked my favored model for over-dispersion (which is depicted 
in the output below). It seems, that the model is under dispersed! I was 
checking the literature for solutions to this issue, but I could only 
find some vague notions, namely that some level of underdispersion is 
tolerated. In the case of overdispersion, it is recommended to use 
quasilikelihood, but apparently this solution has been disabled a while 
ago in lme4.

Generalized linear mixed model fit by maximum likelihood (Laplace 
Approximation) ['glmerMod']
Family: Gamma ( log )
Formula: q1 ~ Treatment + (1 | Elevation) + (1 | Elevation:Plot)
Data: dat
Control: glmerControl(optimizer = "nlminbw")

AIC BIC logLik deviance df.resid
1523.6 1547.9 -754.8 1509.6 231

Scaled residuals:
Min 1Q Median 3Q Max
-2.0938 -0.6378 -0.0694 0.5815 3.1634

Random effects:
Groups Name Variance Std.Dev.
Elevation:Plot (Intercept) 0.02632 0.1622
Elevation (Intercept) 0.01366 0.1169
Residual 0.17924 0.4234
Number of obs: 238, groups: Elevation:Plot, 47; Elevation, 3

Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 2.63742 0.16504 15.981 <2e-16 ***
TreatmentN -0.08395 0.13284 -0.632 0.527
TreatmentNP -0.15163 0.12964 -1.170 0.242
TreatmentP -0.12925 0.12998 -0.994 0.320
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
(Intr) TrtmnN TrtmNP
TreatmentN -0.412
TreatmentNP -0.418 0.522
TreatmentP -0.417 0.524 0.535


chisq ratio rdf p
38.4696552 0.1658175 232.0000000 1.0000000


My concrete questions are: Should I be concerned that my model is 
underdispersed? Will the coeficients of the fixed terms be reliable in 
this scenario?


I appreciate any help on this regard.


Best regards,

Juan F. Due?as
#
Can you say a little more about why you expect q1 to be
Poisson-distributed, or more generally what mean-variance relationship
you expect?  Is there a mechanistic/theoretical framework for the
distribution of this variable?  Some reason not to find a transform that
makes the responses reasonably homoscedastic and linear?

  In general, it *only* makes sense to compute/test dispersion for model
families where the variance has a fixed relationship with the mean
(binomial, Poisson, ...), not when there is an estimated scale parameter
(Gaussian, Gamma, ...)

  Ben Bolker
On 18-01-19 08:29 AM, Juan Due?as wrote:
2 days later
#
Hello Ben and everybody,

Thanks for your answer. I will try to answer to your questions in order.

Maybe I expressed myself wrong. I have no good reason to expect q1 to
follow poisson. All I understand - in my comic book type understanding-
is that shannon's H is a measure of diversity that results from the
weighting of the number of species in a sample by the individual
abundance within each species. Therefore q1 is a proportion and in that
case is not formally a count, so poisson will not work.

As to what sort of variance structure to expect from this response
variable, what I can say is that when I ignore the random effects and
fit a glm with gaussian (link="log"), there is almost no
heteroscedasticity, which is confirmed by a Bartlett's test
(below). However, the Q-Q plot of this same model shows that the std.dev
residuals do not adjust well to normality. I then re-specify the same
glm with a gamma distribution and log link function and that seems to
resolve both the heteroscedasticity and the fit of the std.dev
residuals.

Bartlett test of homogeneity of variances
data: resid(glm) by meta.ec.n$Treatment
Bartlett's K-squared = 0.94193, df = 3, p-value = 0.8153

Finally, I am not aware of any theoretical framework for the
distribution of this variable. And the reason for not simply
transforming my response variable to linearize the relationship and then
fit my data with lmer is that I was under the impression that GLMM are a
more efficient way to deal with these issues. The other reason is that I
am worried that the transformation over corrects the residual fit to
the theoretical distribution.

In addition to the statements of the glm, I ran a boxcox command (from
package:MASS) on the glm. It seemed to confirm that log transformation
of the response variable would be the best shot. I then specify a linear
mixed effect model, with the log transformed variable. However the
diagnostic plots, showed some heteroscedasticity and not that optimal fit to normality.

lme.1 <- lmer(log.q1~Treatment +(1|Elevation) + (1|Elevation:Plot), data=dat)

Do you think it would be better then to go for the linear model transforming the data, instead of the glmm?

Best regards,

Juan

/
/