Dear everyone,
I have a question regarding the comparison of Generalized Linear Mixed
Models (GLMM - using the lme4 - package) and Markov-Chain Monte-Carlo
Generalized Mixed Models (MCMCglmm - using the MCMCglmm-package).
I am running mixed models using both packages. While the estimates
provided by each approach are relatively comparable, the significance
levels differ strongly. My main questions are the following:
1. How can the divergence in results be explained and why doe both
methods provide different results even though the model structure is the
same? (Note: diagnostic plots for both models look fine)
2. I am aware that there is strong debate about the validity of p-values
within GLMMs. Is there similar concern regarding MCMCglmm?
3. What would be a sensible interpretation of both model outputs
especially with respect to PRED2 and the interaction term.
Below I provide a short description and example output:
I want to test the effect of 3 continuous predictors (PRED2, PRED3,
PRED4) on a proportional response (RESP). PRED2 and PRED3 enter the
model as an interaction effect (PRED2*PRED3). The response is modeled
using a Poisson error structure with a log-link function and the
nominator of the ratio is therefore included as an offset predictor
(offset(log(PRED1))).
MODEL1 - GLMM using the lme4-package:
MOD1 <- glmer(RESP ~ offset(log(PRED1)) + PRED2 * PRED3 + PRED4 +
(1|RANEF), family = "poisson", data = data,
control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=100000)))
summary(MOD1)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: poisson ( log )
Formula: RESP ~ offset(log(PRED1)) + PRED2 * PRED3 + PRED4 + (1 | RANEF)
Data: data
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
AIC BIC logLik deviance df.resid
2451.3 2470.9 -1219.6 2439.3 190
Scaled residuals:
Min 1Q Median 3Q Max
-10.8175 -1.2477 -0.1913 1.1076 11.0389
Random effects:
Groups Name Variance Std.Dev.
RANEF (Intercept) 0.1519 0.3898
Number of obs: 196, groups: Order, 56
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.29896 0.06250 -52.784 < 2e-16 ***
PRED2 0.09089 0.01850 4.912 9.00e-07 ***
PRED3 0.57262 0.02375 24.114 < 2e-16 ***
PRED4 0.54726 0.01808 30.275 < 2e-16 ***
PRED2:PRED3 0.12190 0.01821 6.693 2.19e-11 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Correlation of Fixed Effects:
(Intr) PRED2 PRED3 PRED4
PRED2 -0.142
PRED3 -0.056 0.102
PRED4 -0.042 0.135 -0.155
PRED2:PRED3 0.001 0.017 -0.678 0.273
MODEL2 - MCMCglmm using the lme4-package:
Now I build the same model using a MCMCglmm using weak informative
priors. The offset term was implemented following the suggestion by
Jarrod Hadfield in this post
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/026016.html).
See the model formula and summary below:
prior <- list(B=list(V=diag(6)*10, mu=c(0,1,0,0,0,0)),
G =list(G1=list(V=1,nu=0.002)),
R=list(V=1,nu=0.002))
prior$B$V[2,2] <- 1e-9 # set fixed effect prior for the offset variable to 1
PGLMM.MOD1 <- MCMCglmm(RESP ~ log(PRED1) + PRED2 * PRED3 + PRED4,
random = ~ RANEF,
family="poisson",
prior = prior,
scale = T,
pr=TRUE,
data = data,
nitt = 500000, burnin = 20000,
thin = 200)
summary(PGLMM.MOD1)
Iterations = 20001:499801
Thinning interval = 200
Sample size = 2400
DIC: 1164.591
G-structure: ~Order
post.mean l-95% CI u-95% CI eff.samp
RANEF 0.02264 0.0002387 0.0762 2400
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.2788 0.1946 0.3705 2400
Location effects: RESP ~ log(PRED1) + PRED2 * PRED3 + PRED4
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -3.40280 -3.50958 -3.28826 2400 <4e-04 ***
log(PRED1) 1.00000 0.99994 1.00006 2257 <4e-04 ***
PRED2 0.08369 -0.01995 0.18120 2400 0.113
PRED3 0.63058 0.52164 0.73889 2400 <4e-04 ***
PRED4 0.60488 0.50565 0.69963 2400 <4e-04 ***
PRED2:PRED3 0.06882 -0.02363 0.17414 2258 0.167
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
As you can see from the different summaries, the estimates are fairly
similar with some divergence especially regarding PRED2 and the
interaction term. Additionally the provided p-values differ
dramatically, with the GLMM (lme4) being highly significant for PRED2
and the interaction wheras non-significance for the MCMCglmm.
My questions now are:
1. How can the divergence in results be explained and why doe both
methods provide different results even though the model structure is the
same? (Note: diagnostic plots for both models look fine)
2. I am aware that there is strong debate about the validity of p-values
within GLMMs. Is there similar concern regarding MCMCglmm?
3. What would be a sensible interpretation of both model outputs
especially with respect to PRED2 and the interaction term.
Any insight and help on that matter would be highly appreciated.
Best regards,
Bernd
Bernd Lenzner PhD
Researcher
University of Vienna
Department of Botany and Biodiversity Research
Division of Conservation Biology, Vegetation and Landscape Ecology
Rennweg 14
A-1030 Vienna