glmer, p-values, and mcmcsamp
4 messages · Sandra Hamel, Ben Bolker, Andrew Robinson +1 more
On 02/10/2011 05:17 AM, Sandra Hamel wrote:
Hello, I'm aware of the problems related to p-values in linear mixed models. I'm working on a glmer model and was a bit surprised to get p-values, because I thought there were also problems for glmer. From recent posts in December on the validity of p-values in glmer, the conclusion was that p-values might actually suffer even worse problems for glmer than for lmer. One first (maybe naive) question is: why are p-values provided in glmer if they suffer from similar issues to p-values in lmer (which are not provided for that reason)?
Not a ridiculous or naive question. I won't speak for Doug Bates, but I think the reason is partly cultural/historical. The sources of uncertainty or approximation in the test statistics differ between the LMM and GLMM cases. I am going into a lot of detail here in part because some of this is only now sinking into my own brain, and it helps me to write it out. I would welcome corrections, especially from John Maindonald (who also commented on these issues in the thread ending here: <http://article.gmane.org/gmane.comp.lang.r.lme4.devel/4894>).

1. In the LMM case, for classical balanced designs we know that the test statistic (ratio of mean squares) is not just asymptotically but exactly F distributed in finite samples, given that we have the right value for the denominator degrees of freedom. For non-classical (unbalanced, crossed, correlated, weighted ...) cases we don't know that the test statistic is F distributed, but there have been attempts (Satterthwaite and Kenward-Roger [not Kenward-Rogers!] are the main ones) to correct the reference distribution, or in the case of the K-R approach also the test statistic itself, based on information about the moments of the distribution of the test statistic under the null hypothesis. The reason for using an F statistic (rather than a chi-squared statistic, i.e. a likelihood ratio test) in the first place is that the test statistic is essentially of the form (deviance/scale parameter), i.e. estimated model mean square/error mean square; the denominator degrees of freedom describe the uncertainty in the estimate of the scale parameter. For large denominator degrees of freedom this uncertainty becomes small, and the F-test results converge to those of the likelihood ratio test (similarly, t-tests converge to Z-tests). (I don't know offhand how the F tests that lme/lmer implement incorporate the classical concern of "which error term to divide by" ...)

2. In GLMs (not GLMMs) where we don't estimate the scale parameter (e.g. binomial and Poisson), we are not testing deviance difference/estimated scale parameter, but the deviance difference itself (usually the assumed scale parameter is 1.0). In this case, the badness of the likelihood ratio test approximation arises because the deviance difference itself is only approximately chi-squared distributed. There are some papers by Cordeiro from the 1980s on corrections to the chi-squared assumption that, at the level of a crude analogy, work like the Satterthwaite/K-R approaches: find a correction term to adjust the reference distribution based on something we know about its moments. (I don't know whether these approaches have actually been implemented in any stats package ...) There is also some material on this topic in McCullagh and Nelder (which I need to bite the bullet and buy).

3. GLMMs without estimated scale parameters have the problems listed in #2, compounded by the problem that (depending on how they are calculated) the terms in the deviance may not be independent -- we don't know what the 'effective degrees of freedom' are. Someone with sufficiently good math-stats chops might be able to work out an extension of the Cordeiro/Satterthwaite/K-R approaches that would apply in the GLMM case ...

4. If we were to use quasi-likelihood approaches we would be back in a situation with both (all three? deviance !~ chi-squared, error in the estimated scale parameter, unknown "effective denominator df") sources of error.

Historically/culturally, it seems that in the GLM context people mostly ignore, or don't worry about, the fact that the LRT is only asymptotically correct. (Faraway 2006: "For every GLM except the Gaussian, an approximate null distribution must be used whose accuracy may be in doubt particularly for smaller samples.") This is probably what leads to Z-test statistics being quoted in the GLMM case. Perhaps they shouldn't be.
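[As a concrete illustration of the Wald-vs-LRT distinction above, here is a minimal sketch. The data, variable names (resp, x, grp), and effect sizes are invented for illustration; both tests shown here are only asymptotically correct, which is exactly the point under discussion.]

```r
## Compare the Wald z-test printed by summary() with a likelihood
## ratio test for the same fixed effect, on simulated binary data.
library(lme4)
set.seed(1)
d <- data.frame(x = rnorm(200), grp = factor(rep(1:20, each = 10)))
d$resp <- rbinom(200, 1, plogis(0.5 * d$x + rnorm(20)[d$grp]))

m1 <- glmer(resp ~ x + (1 | grp), data = d, family = binomial)
m0 <- update(m1, . ~ . - x)          # null model without x

summary(m1)$coefficients             # Wald z and its p-value for x
anova(m0, m1)                        # LRT: chi-squared, also asymptotic
```

Neither p-value is exact in small samples; they will generally disagree somewhat, and neither disagreement tells you which one to trust.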
In linear models, I use mcmcsamp to get CIs or p-values, but I understand that this function is not yet implemented for glmer models. I was wondering, then, what is the best option for getting either CIs or p-values that would be "valid" for glmer models? I have seen an old post (2008) from Ben Bolker where he mentioned a "glmersim.R" function he had written (and a follow-up post from Douglas Bates where he mentioned that this function could be included in lme4). Would "glmersim.R" allow me to get something similar to mcmcsamp? Are there still plans to allow glmer models to be used with mcmcsamp?
I don't know about plans for mcmcsamp with glmer. I am working to incorporate glmersim into lme4 in the reasonably near future (subject to review/approval by the other lme4 authors ...). This will give a reasonably simple (albeit computationally slow) way to get p-values for terms by doing a 'parametric bootstrap' for specific model comparisons (see <http://glmm.wikidot.com> for examples; I would also hope to include some simple examples in the documentation). In principle I think it could also be used to get confidence intervals, but I would need to think about that some more.
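[A minimal sketch of the parametric bootstrap being described: simulate responses from the null model, refit both models to each simulated response, and compare the observed deviance change to the simulated null distribution. This assumes fitted glmer models m0 (null) and m1 (with the term of interest) with a simple binary response; simulate() and refit() are the lme4 tools assumed here, and the function name pboot is invented.]

```r
## Parametric bootstrap p-value for the comparison m0 vs m1.
## Assumes m0 and m1 are nested glmer fits to the same data.
library(lme4)
pboot <- function(m0, m1, nsim = 500) {
  obs <- deviance(m0) - deviance(m1)        # observed LRT statistic
  simdev <- replicate(nsim, {
    y <- simulate(m0)[[1]]                  # draw a response under the null
    deviance(refit(m0, y)) - deviance(refit(m1, y))
  })
  mean(simdev >= obs)                       # bootstrap p-value
}
```

This avoids relying on the chi-squared approximation entirely, at the cost of nsim model refits per comparison.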
Thanks for your help, Sandra
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
I have interspersed comments below.

John Maindonald
email: john.maindonald at anu.edu.au
phone: +61 2 (6125) 3473
fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200
http://www.maths.anu.edu.au/~johnm
On 11/02/2011, at 4:57 AM, Ben Bolker wrote:
On 02/10/2011 05:17 AM, Sandra Hamel wrote:
Hello, I'm aware of the problems related to p-values in linear mixed models. I'm working on a glmer model and was a bit surprised to get p-values, because I thought there were also problems for glmer. From recent posts in December on the validity of p-values in glmer, the conclusion was that p-values might actually suffer even worse problems for glmer than for lmer. One first (maybe naive) question is: why are p-values provided in glmer if they suffer from similar issues to p-values in lmer (which are not provided for that reason)?
Not a ridiculous or naive question. I won't speak for Doug Bates, but I think the reason is partly cultural/historical.
I think it is pretty much entirely cultural/historical. The only difference is that in a very limited range of special cases the denominator is known (at all events, this is what the analyst tells glmer!), i.e., the binomial or Poisson assumptions that lead to this variance can be trusted. But even if the binomial or Poisson assumptions can be trusted, the relevant denominator variance is commonly the theoretical variance plus something that arises from the mixed-model part of the model.
The sources of uncertainty or approximation in the test statistics differ between the LMM and GLMM case.
I am going into a lot of detail here in part because some of this is just now sinking in to my own brain, and it helps me to write it out. I would welcome corrections, especially from John Maindonald (who also commented on these issues in the thread ending here <http://article.gmane.org/gmane.comp.lang.r.lme4.devel/4894>.
1. In the LMM case, for classical balanced designs we know that the test statistic (ratio of mean squares) is not just asymptotically but exactly F distributed in finite samples, given that we have the right value for the denominator degrees of freedom. For non-classical (unbalanced, crossed, correlated, weighted ...) cases we don't know that the test statistic is F distributed, but there have been attempts (Satterthwaite and Kenward-Roger [not Kenward-Rogers!] are the main ones) to correct the reference distribution, or in the case of the K-R approach also the test statistic itself, based on information about the moments of the distribution of the test statistic under the null hypothesis. The reason for using an F statistic (rather than a chi-squared statistic, i.e. a likelihood ratio test) in the first place is that the test statistic is essentially of the form (deviance/scale parameter), i.e. estimated model mean square/error mean square; the denominator degrees of freedom describe the uncertainty in the estimate of the scale parameter. For large denominator degrees of freedom this uncertainty becomes small, and the F-test results converge to those of the likelihood ratio test (similarly, t-tests converge to Z-tests). (I don't know offhand how the F tests that lme/lmer implement incorporate the classical concern of "which error term to divide by" ...)
Even in balanced cases where the denominator is a combination of two components of variance, there is an issue. Consider for example the kiwishade data in the DAAG package, which has a block/plot/subplot 3/4/4 structure. The appropriate mean square for comparing treatments is, according to the conventional wisdom, the plot mean square. The relevant mean square is then 20.93 on 6 degrees of freedom. If one could argue that the plot component is zero -- that the apparent plot component is just a result of chance -- the error mean square reduces to 13.96 on 36 df.

The conventional wisdom makes sense in this case. But suppose that the estimate of the plot component of variance is so small that the two mean squares are quite similar. The conventional wisdom still gives 6 df for the variance estimate, even though it is a linear combination of two variances, one of which is estimated on 36 df and accounts for most of the mean square. So the use of 6 df is conservative. The problem is that some kind of Bayesian prior is required in order to achieve a compromise between the 6 df and the 36 df.

Some will want to test whether the plot component is "significantly" > 0. If there is any reason at all to suspect that the plot component, although small, ought to be greater than 0, this is anti-conservative. In a field-trial plot/subplot context, there is every reason to expect a plot component > 0, unless plots have been chosen inappropriately.
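[The multistratum analysis described above can be reproduced with a classical aov() fit; this sketch assumes the DAAG package's kiwishade data with its yield, block, plot, and shade variables.]

```r
## Multistratum ANOVA for the kiwishade split-plot design.
## The shade treatment is tested in the block:plot stratum,
## against the plot mean square on 6 df, per the conventional wisdom.
library(DAAG)
summary(aov(yield ~ shade + Error(block/plot), data = kiwishade))
```

The output shows the separate error strata; the subplot (within-plot) residual stratum carries the 36 df mentioned above, which aov() never pools into the treatment test.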
2. In GLMs (not GLMMs) where we don't estimate the scale parameter (e.g. binomial and Poisson), we are not testing deviance difference/estimated scale parameter, but the deviance difference itself (usually the assumed scale parameter is 1.0). In this case, the badness of the likelihood ratio test approximation arises because the deviance difference itself is only approximately chi-squared distributed. There are some papers by Cordeiro from the 1980s on corrections to the chi-squared assumption that, at the level of a crude analogy, work like the Satterthwaite/K-R approaches: find a correction term to adjust the reference distribution based on something we know about its moments. (I don't know whether these approaches have actually been implemented in any stats package ...) There is also some material on this topic in McCullagh and Nelder (which I need to bite the bullet and buy).

3. GLMMs without estimated scale parameters have the problems listed in #2, compounded by the problem that (depending on how they are calculated) the terms in the deviance may not be independent -- we don't know what the 'effective degrees of freedom' are. Someone with sufficiently good math-stats chops might be able to work out an extension of the Cordeiro/Satterthwaite/K-R approaches that would apply in the GLMM case ...

4. If we were to use quasi-likelihood approaches we would be back in a situation with both (all three? deviance !~ chi-squared, error in the estimated scale parameter, unknown "effective denominator df") sources of error.

Historically/culturally, it seems that in the GLM context people mostly ignore, or don't worry about, the fact that the LRT is only asymptotically correct. (Faraway 2006: "For every GLM except the Gaussian, an approximate null distribution must be used whose accuracy may be in doubt particularly for smaller samples.") This is probably what leads to Z-test statistics being quoted in the GLMM case. Perhaps they shouldn't be.
The Z statistics are of course not likelihood ratio statistics; they are commonly known as Wald statistics. They can suffer from severe problems that arise when the SEs (based on single-term Taylor series approximations) change dramatically in the neighbourhood of the relevant point on the likelihood surface. For binomial data this leads to the Hauck-Donner effect (Modern Applied Statistics with S, 4th edn, p. 197), whereby the Z statistics shrink as one of the treatments being compared moves further into the tail of the binomial (proportions closer to 0 or 1). There is a corresponding issue for Poisson data, as the estimate of one of the means being compared moves close to zero; see Maindonald & Braun, Data Analysis & Graphics Using R, 2nd edn p. 267, 3rd edn p. 263.
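[A toy demonstration of the Hauck-Donner effect described above: as one group's success proportion approaches 1, the Wald z statistic shrinks even as the likelihood-ratio evidence grows. The data and the function name hd are invented for illustration; at k = 20 the fit is on the boundary, so glm() will warn about fitted probabilities of 0 or 1.]

```r
## Wald z vs LRT deviance as group B's successes (out of 20) increase.
hd <- function(k) {
  y <- c(rep(0:1, c(10, 10)),          # group A: 10/20 successes
         rep(0:1, c(20 - k, k)))       # group B: k/20 successes
  g <- factor(rep(c("A", "B"), each = 20))
  fit <- glm(y ~ g, family = binomial)
  c(wald.z    = coef(summary(fit))["gB", "z value"],
    lrt.chisq = anova(fit, test = "Chisq")[["Deviance"]][2])
}
sapply(c(15, 18, 20), hd)   # Wald z eventually collapses; LRT keeps growing
```

This is one practical argument for preferring likelihood ratio (or bootstrap) tests over the Wald z values printed by summary().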
In linear models, I use mcmcsamp to get CIs or p-values, but I understand that this function is not yet implemented for glmer models. I was wondering, then, what is the best option for getting either CIs or p-values that would be "valid" for glmer models? I have seen an old post (2008) from Ben Bolker where he mentioned a "glmersim.R" function he had written (and a follow-up post from Douglas Bates where he mentioned that this function could be included in lme4). Would "glmersim.R" allow me to get something similar to mcmcsamp? Are there still plans to allow glmer models to be used with mcmcsamp?
I don't know about plans for mcmcsamp with glmer. I am working to incorporate glmersim into lme4 in the reasonably near future (subject to review/approval by the other lme4 authors ...). This will give a reasonably simple (albeit computationally slow) way to get p-values for terms by doing a 'parametric bootstrap' for specific model comparisons (see <http://glmm.wikidot.com> for examples; I would also hope to include some simple examples in the documentation). In principle I think it could also be used to get confidence intervals, but I would need to think about that some more.
It would be good to be able to experiment with alternative priors, as a check on the robustness of results.
Thanks for your help, Sandra