interpreting significance from lmer results for dummies (like me)
On Behrens-Fisher: see the Wikipedia article "Unsolved problems in statistics" and the references given there, among them Linnik, Jurii (1968), Statistical Problems with Nuisance Parameters, American Mathematical Society, ISBN 0821815709. My reading of the literature is that this is not so much an unsolved problem as one that has, within the p-value paradigm, no satisfactory theoretical solution. I take this to be a problem with the p-value paradigm. For Bayesians there is no problem; all inference is conditional on one or other choice of prior. The various frequentist "solutions" to the Behrens-Fisher problem involve one or other more limited form of conditioning. Others may be able to pronounce more authoritatively. John Maindonald email: john.maindonald at anu.edu.au phone: +61 2 (6125)3473 fax: +61 2 (6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27), Australian National University, Canberra ACT 0200.
On 27 Apr 2008, at 2:06 PM, Andrew Robinson wrote:
Hi John, On Sun, Apr 27, 2008 at 01:47:40PM +1000, John Maindonald wrote:
Hi Andrew - Maybe you'd like to send this to the list. I'd meant to send my reply to the list.
Ok, cc-ing the list.
My understanding is that, in the usual definition, designs are hierarchical when they have a hierarchical error structure, irrespective of whether the fixed effects are hierarchical. From the Wikipedia article on hierarchical linear models: "Multilevel analysis allows variance in outcome variables to be analysed at multiple hierarchical levels, whereas in simple linear and multiple linear regression all effects are modeled to occur at a single level. Thus, HLM is appropriate for use with nested data." This is different from the database notion of a hierarchical data structure.
Sure, I agree. But I don't think that this definition excludes designs with crossed random effects from being hierarchical.
The Behrens-Fisher type issue (really only an issue when there is one small relevant df, in which case the bounds for the p-value can be very wide) makes p-values, in general, somewhat fraught in the multi-level modeling context. To get unique p-values, one has to make some assumption about bounds on the relevant relative variances. In part for this sort of reason, I do not really care whether what comes from mcmcsamp() closely resembles one or other choice of p-value.
Can you point me to a citation for that issue? Cheers Andrew
Cheers - John. On 27 Apr 2008, at 12:54 PM, Andrew Robinson wrote:
Thanks John - and I have interpolated replies. On Sun, Apr 27, 2008 at 12:04:52PM +1000, John Maindonald wrote:
I have interpolated a few comments. On 26 Apr 2008, at 5:59 PM, Andrew Robinson wrote:
Hi Mark, On Fri, Apr 25, 2008 at 11:53:24PM -0400, Mark Kimpel wrote:
I am a bioinformatician, with my strongest background in molecular biology. I have been trying to learn about mixed-effects models to improve the analysis of my experiments, which certainly contain random effects. I will admit to being totally lost in the discussions regarding the lack of p-value reporting in the current versions of lmer. Furthermore, I suspect those that need to publish in non-statistical journals will face reviewers who are equally in the dark. Where can I find a biologist-level explanation of the current controversy,
I'll take a stab.

1) The traditional, Fisher-style test of a null hypothesis is based on computing the probability of observing a test statistic as extreme as or more extreme than the one actually observed, assuming that the null hypothesis is true. This probability is called the p-value. If the p-value is less than some cut-off, e.g. 0.01, then the null hypothesis is rejected.

2) In order to compute that p-value, we need to know the cumulative distribution function of the test statistic when the null hypothesis is true. In simple cases this is easy: for example, we use the t-distribution for the comparison of two normal means (with assumed equal variances, etc.).

3) In (many) hierarchical models, the cumulative distribution function of the test statistic when the null hypothesis is true is simply not known. So, we can't compute the p-value.
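The "easy" case in point 2, where the null reference distribution is known exactly, can be sketched with simulated data. A minimal illustration (in Python rather than R; group sizes and means are made up):

```python
# Point 2's "easy" case: a two-sample t-test with assumed equal variances,
# where the null reference distribution of the statistic is known exactly
# (Student's t with n1 + n2 - 2 df).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=12)   # hypothetical group 1
y = rng.normal(0.5, 1.0, size=15)   # hypothetical group 2

n1, n2 = len(x), len(y)
# Pooled variance estimate
sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
t = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

# Two-sided p-value from the known reference distribution
p = 2 * stats.t.sf(abs(t), df)

# Agrees with the library routine
t_lib, p_lib = stats.ttest_ind(x, y, equal_var=True)
```

The point of the thread is that for general mixed models no analogue of this known t reference distribution is available.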
It can, though, be simulated. If the assumptions are seriously wrong, however, all bets are off. In particular, the assumption of normality of effects seems to me to be a serious issue much more often than in most models with a simple independent and identically distributed error structure, just because there are commonly many fewer independent values at the level that matters for the intended generalization(s).
Agreed. In fact, in a follow-up to the original questioner I also mentioned the parametric bootstrap. That doesn't help with the assumptions, however.
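The parametric bootstrap just mentioned can be sketched in miniature. A hypothetical Python illustration, with a deliberately simple null model (common mean and variance) standing in for a fitted mixed model; all numbers are made up:

```python
# Parametric bootstrap of a null reference distribution: fit the null
# model, simulate many datasets from it, recompute the statistic on each,
# and locate the observed statistic in that simulated distribution.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=10)   # hypothetical group 1
y = rng.normal(1.0, 1.0, size=10)   # hypothetical group 2

def statistic(a, b):
    return abs(a.mean() - b.mean())

obs = statistic(x, y)

# Null model: common mean and common variance
pooled = np.concatenate([x, y])
mu, sd = pooled.mean(), pooled.std(ddof=1)

# Simulate the statistic's distribution under the fitted null model
B = 2000
null_stats = np.array([
    statistic(rng.normal(mu, sd, len(x)), rng.normal(mu, sd, len(y)))
    for _ in range(B)
])

# Bootstrap p-value: proportion of simulated statistics at least as extreme
p_boot = (1 + np.sum(null_stats >= obs)) / (B + 1)
```

The same recipe carries over to a mixed model: simulate from the null fit, refit with lmer, and recompute the statistic each time. As noted, this does not rescue the analysis if the distributional assumptions themselves are wrong.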
Moreover one can get plausible information about the posterior distribution of the parameters from mcmcsamp(), even a Bayesian equivalent of a p-value.
I'm not sure about the reliability of those p-values yet.
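The "Bayesian equivalent of a p-value" mentioned above can be sketched as a posterior tail probability. A hypothetical Python illustration, with plain normal draws standing in for mcmcsamp() output:

```python
# Posterior tail probability as a Bayesian analogue of a p-value: given
# posterior draws of a fixed-effect coefficient, report the posterior
# probability that the effect has the opposite sign.
import numpy as np

rng = np.random.default_rng(3)
# Stand-in for MCMC output: draws from a hypothetical N(0.8, 0.4) posterior
posterior_draws = rng.normal(loc=0.8, scale=0.4, size=10_000)

p_tail = np.mean(posterior_draws <= 0)   # P(effect <= 0 | data)
p_two = 2 * min(p_tail, 1 - p_tail)      # a two-sided analogue
```

Unlike the frequentist quantities under discussion, this probability is conditional on the prior, which is the point made earlier about the Bayesian treatment of Behrens-Fisher.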
3a) in a limited range of hierarchical models that have historically dominated analysis of variance, e.g. split-plot designs, the reference distribution is known (it's F).
Or known more or less well.
Yes.
3b) Numerous experts have (quite reasonably) built up a bulwark of intuitive knowledge about the analysis of such designs.

3c) The intuition does not necessarily pertain to the analysis of an arbitrary hierarchical design, which might be unbalanced and have crossed random effects. That is, the intuition might be applied, but inappropriately.
If it has crossed random effects, it is not hierarchical. This is where the approximations are most likely to break down.
I think that it's still hierarchical with crossed effects. We still have observations nested within levels of random effects.
4) In any case, the distribution that is intuitively or otherwise assumed is the F, because it works in the cases mentioned in 3a. All that remains is to define the degrees of freedom. The numerator degrees of freedom are obvious, but the denominator degrees of freedom are not known.

4a) Numerous other packages supply approximations to the denominator degrees of freedom, e.g. Satterthwaite and Kenward-Roger (KR, which is related). These have been subjected to a modest degree of scrutiny by simulation.

5) However, it is not clear that the reference distribution is really F at all, and therefore it is not clear that correcting the denominator degrees of freedom is what is needed. Confusion reigns on how the p-values should be computed, and because of this confusion, Doug Bates declines to provide p-values.
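The Satterthwaite approximation in 4a amounts to an effective degrees of freedom for a linear combination of independent variance estimates. A minimal Python sketch, with made-up numbers:

```python
# Satterthwaite (Welch-Satterthwaite) effective degrees of freedom for a
# linear combination sum(c_i * s2_i) of independent variance estimates
# s2_i, each with df_i degrees of freedom.  All numbers are hypothetical.
s2 = [4.0, 9.0]   # hypothetical variance estimates
df = [5, 10]      # their degrees of freedom
c  = [1.0, 1.0]   # coefficients of the linear combination

num = sum(ci * si for ci, si in zip(c, s2)) ** 2
den = sum((ci * si) ** 2 / di for ci, si, di in zip(c, s2, df))
df_satt = num / den   # falls between min(df) and sum(df)
```

The approximated df is then plugged into the assumed F (or t) reference distribution, which is exactly the step that point 5 questions.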
Simulation can provide a definitive answer, if one believes the model. I do not think that Doug has any fundamental objection to giving the KR approximation. He has, reasonably, had other priorities.
Yes, these things are true.
Nor has anyone, so far, offered an algorithm that uses the lmer output structures to implement the KR approximation.
Doug and I are looking at it, but it's slow going because he's busy and I'm - well - slow.
how can I learn how to properly judge significance from my lmer results,
There are numerous approximations, but no way to properly judge significance as far as I am aware. Try the R-wiki for algorithms, and be conservative.
As indicated above, there are other mechanisms, safer in general than using an approximation. Users have to work at them though.
I think that those are still approximations, but of a different order.
Note also that, when tests have a numerator that is a linear combination of variances, there can be issues of a Behrens-Fisher type (see, e.g., the Wikipedia Behrens-Fisher article), where it is impossible to derive a p-value that is valid independent of the relative magnitudes of the unknown variances. The Behrens-Fisher approximation, though, is usually a reasonable compromise.
I didn't know about that; thanks.
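The dependence of the p-value on the unknown variances, and the Welch-type compromise, can be sketched as follows (a Python illustration with simulated data; sample sizes and variances are made up):

```python
# With unequal, unknown variances the appropriate denominator df -- and
# hence the p-value -- depends on the unknown variance ratio.  Welch's
# approximation plugs the sample variances into a Satterthwaite-type df
# as a compromise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=8)    # hypothetical small-variance group
y = rng.normal(1.0, 3.0, size=30)   # hypothetical large-variance group

v1 = x.var(ddof=1) / len(x)
v2 = y.var(ddof=1) / len(y)
t = (x.mean() - y.mean()) / np.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
p_welch = 2 * stats.t.sf(abs(t), df_welch)

# Bracketing the unknown df shows the range of p-values it could produce
p_conservative = 2 * stats.t.sf(abs(t), min(len(x), len(y)) - 1)
p_pooled_df    = 2 * stats.t.sf(abs(t), len(x) + len(y) - 2)
```

The gap between the conservative and pooled-df p-values is the "bounds can be very wide" point made earlier in the thread; the Welch value sits between them.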
http://wiki.r-project.org/rwiki/doku.php Or, use lme, report the p-values computed therein, and be aware that they are not necessarily telling you exactly what you want to know.
and what peer-reviewed references can I steer reviewers towards?
Not sure about that one. I'm working on some simulations with Doug but it's slow going, mainly because I'm chronically disorganised.
I understand, from other threads, that some believe a paradigm shift away from p-values may be necessary, but it is not clear to me what paradigm will replace this entrenched view. I can appreciate that there may be conflicting opinions about the best equations/algorithms for determining significance, but is there any agreement on the goal we are heading towards?
The conflict is not about p-values per se, but about the way that they are calculated. I would bet that the joint goal is to find an algorithm that provides robust, reasonable inference in a sufficiently wide variety of cases that its implementation proves to be worthwhile.
There is an argument about p-values per se. One should, arguably, be examining the profile likelihood, or the whole of the posterior distribution.
"arguably" ... that is an argument! :) Andrew -- Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 http://www.ms.unimelb.edu.au/~andrewpr http://blogs.mbs.edu/fishing-in-the-bay/
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models