glmer, p-values, and mcmcsamp
On 02/10/2011 05:17 AM, Sandra Hamel wrote:
Hello, I'm aware of the problems related to p-values in linear mixed models. I'm working on a glmer model and was a bit surprised to get p-values, because I was thinking there were also problems for glmer. From recent posts in December on the validity of p-values in glmer, the conclusion was that p-values might actually suffer even worse problems for glmer than for lmer. A first (maybe naive) question: why are p-values provided in glmer if they suffer from issues similar to those of p-values in lmer (which are not provided for that reason)?
Not a ridiculous or naive question. I won't speak for Doug Bates, but I think the reason is partly cultural/historical. The sources of uncertainty or approximation in the test statistics differ between the LMM and GLMM cases. I am going into a lot of detail here in part because some of this is only now sinking into my own brain, and it helps me to write it out. I would welcome corrections, especially from John Maindonald (who also commented on these issues in the thread ending here: <http://article.gmane.org/gmane.comp.lang.r.lme4.devel/4894>).

1. In the LMM case, for classical balanced designs we know that the test statistic (a ratio of mean squares) is not just asymptotically but exactly correct in finite samples, provided we have the right value for the denominator degrees of freedom. For non-classical (unbalanced, crossed, correlated, weighted, ...) cases we don't know that the test statistic is F-distributed, but there have been attempts (Satterthwaite and Kenward-Roger [not Kenward-Rogers!] are the main ones) to correct the reference distribution -- or, in the K-R approach, the test statistic as well -- based on information about the moments of the distribution of the test statistic under the null hypothesis. The reason for using an F statistic (rather than a chi-squared, i.e. likelihood ratio, test) in the first place is that the test statistic is essentially of the form (deviance/scale parameter) [i.e. estimated model mean square/error mean square]; the denominator degrees of freedom describe the uncertainty in the estimate of the scale parameter. For large denominator degrees of freedom this uncertainty becomes small, and the F-test results converge to those of the likelihood ratio test (similarly, t-tests converge to Z-tests). (I don't know offhand how the F tests that lme/lmer implement incorporate the classical concern of "which error term to divide by" ...)

2. In GLMs (not GLMMs) where we don't estimate the scale parameter (e.g. binomial and Poisson), we are not testing (deviance difference/estimated scale parameter) but the deviance difference itself (usually the assumed scale parameter is 1.0). In this case, the badness of the likelihood ratio test approximation comes from the fact that the deviance difference itself is only approximately chi-squared distributed. There are some papers by Cordeiro from the 1980s on corrections to the chi-squared assumption that, at the level of a crude analogy, are the same as the Satterthwaite/K-R approaches: find a correction term to adjust the reference distribution based on something we know about its moments. (I don't know whether these approaches have actually been implemented in any stats package ...) There is also some material in McCullagh and Nelder (which I need to bite the bullet and buy) on this topic.

3. GLMMs without estimated scale parameters have the problems listed in #2, compounded by the problem that (depending on how they are calculated) the terms in the deviance may not be independent -- we don't know what the 'effective degrees of freedom' are. Someone with sufficiently good math-stats chops might be able to work out an extension of the Cordeiro/Satterthwaite/K-R approaches that would apply in the GLMM case ...

4. If we were to use quasi-likelihood approaches we would be back in a situation with both (all three?) sources of error: deviance !~ chi-squared, error in the estimated scale parameter, and unknown "effective denominator df".

Historically/culturally, it seems that in the GLM context people mostly ignore, or don't worry about, the fact that the LRT is only asymptotically correct. (Faraway 2006: "For every GLM except the Gaussian, an approximate null distribution must be used whose accuracy may be in doubt particularly for smaller samples.") This is probably what leads to Z-test statistics being quoted in the GLMM case. Perhaps they shouldn't be.
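To make the Wald-Z vs. likelihood-ratio distinction concrete, here is a small sketch in plain R using only a base glm() (the data, coefficients, and seed are made up for illustration). It compares the Wald Z p-value that summary() reports with the p-value from the deviance difference; both rest on asymptotic approximations, and they need not agree in small samples:

```r
## Wald Z vs. likelihood-ratio p-values for one term of a binomial GLM
set.seed(1)
d <- data.frame(x = rnorm(30))
d$y <- rbinom(30, size = 1, prob = plogis(0.4 + 0.8 * d$x))

fit  <- glm(y ~ x, family = binomial, data = d)  ## full model
fit0 <- glm(y ~ 1, family = binomial, data = d)  ## model without x

## Wald Z p-value: what summary() reports for the coefficient of x
p.wald <- coef(summary(fit))["x", "Pr(>|z|)"]

## LRT p-value: the deviance difference referred to chi-squared(1)
## (the difference is itself only asymptotically chi-squared -- point 2 above)
dev.diff <- deviance(fit0) - deviance(fit)
p.lrt <- pchisq(dev.diff, df = 1, lower.tail = FALSE)

c(wald = p.wald, LRT = p.lrt)
```

Neither p-value is "exact" here; the point is only that the two common recipes answer the same question with different approximations.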
In linear models I use mcmcsamp to get CIs or p-values, but I understand that this function is not yet implemented for glmer models. I was wondering, then, what is the best option to get either CIs or p-values that would be "valid" for glmer models? I have seen an old post (2008) from Ben Bolker where he mentioned a "glmersim.R" function he had written (and a following post from Douglas Bates where he mentioned that this function could be included in lme4). Would "glmersim.R" allow me to get something similar to mcmcsamp? Are there still plans to allow glmer fits to be used with mcmcsamp?
I don't know about plans for mcmcsamp with glmer. I am working to incorporate glmersim into lme4 in the reasonably near future (subject to review/approval by the other lme4 authors ...). This will give a reasonably simple (albeit computationally slow) way to get p-values for terms by doing a 'parametric bootstrap' for specific model comparisons (see <http://glmm.wikidot.com> for examples; I would also hope to include some simple examples in the documentation). In principle I think it could also be used to get confidence intervals, but I would need to think about it some more.
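The parametric-bootstrap idea itself is easy to sketch without glmersim (which is not released yet): simulate responses under the fitted null model, refit both models to each simulated data set, and compare the observed deviance difference to its simulated null distribution. The toy example below uses a plain binomial glm() as a stand-in for a glmer fit; the data, seed, and number of simulations are arbitrary choices for illustration, not anything from glmersim itself:

```r
## Parametric bootstrap of a likelihood-ratio test, illustrated with
## a plain glm() (glmersim would do the analogous thing for glmer fits)
set.seed(101)
dd <- data.frame(x = rep(0:1, each = 20))
dd$y <- rbinom(40, size = 1, prob = plogis(0.5 * dd$x))

m0 <- glm(y ~ 1, family = binomial, data = dd)  ## null model
m1 <- glm(y ~ x, family = binomial, data = dd)  ## full model
obs.dev <- deviance(m0) - deviance(m1)          ## observed LRT statistic

nsim <- 200
null.dev <- replicate(nsim, {
    ## simulate responses under the fitted null model ...
    ysim <- rbinom(40, size = 1, prob = fitted(m0))
    ## ... and refit both models to the simulated data
    d0 <- glm(ysim ~ 1, family = binomial, data = dd)
    d1 <- glm(ysim ~ x, family = binomial, data = dd)
    deviance(d0) - deviance(d1)
})

## bootstrap p-value: how often the null simulations beat the data
p.boot <- mean(null.dev >= obs.dev)
## for comparison, the asymptotic chi-squared p-value
p.chisq <- pchisq(obs.dev, df = 1, lower.tail = FALSE)
```

The bootstrap p-value sidesteps the "is the deviance difference really chi-squared?" question at the cost of nsim model refits, which is what makes the approach computationally slow for real glmer models.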
Thanks for your help, Sandra
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models