same old question - lme4 and p-values
I agree with Lorenz Gygax. I'll come back to p-values below.

Confidence intervals (CIs) make, for me, a lot more sense than p-values. The reality, though, is that users will interpret CIs as some kind of probability statement. For all practical purposes, a CI is just the Bayesian credible interval that one gets with some suitable "non-informative" prior. Why not, then, be specific about the prior and go with the Bayesian credible interval? (There is an issue whether such a prior can always be found. Am I right in judging this of no practical consequence?)

There are cases where the prior is informative in a sense that breaks the nexus between the CI and a realistic Bayesian credible interval. A similar issue arises for a p-value; the probability of the evidence given innocence (or freedom from some rare disease) (this is H0) is dramatically different from the probability of innocence given the evidence, and it may be the difference between 1/100000 and 1/2. Where the Bayesian credible interval and the CI are dramatically different, a p-value or CI can only mislead. In the way that p-values are commonly taught, it may take considerable strength of will to avoid confusion between P(A | H0) and P(H0 | A)!

For intervals for variances, the prior can matter a lot, if a smallish number of independent pieces of information is used to estimate the variance and/or those pieces of information have widely varying weights. I guess that emphasizes how insecure inference about variances can be. It is much worse than the common forms of CI indicate.

If one is to take abs(t) > 2 as indicating significance, this is, under iid Normal assumptions, a p-value of 0.1 for 5 degrees of freedom, and 0.18 for 2 degrees of freedom. One has to ask members of the relevant scientific community whether they are comfortable with that, given also that those p-values are likely to be more than otherwise suspect because of the small number of degrees of freedom.
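[Editor's note: the quoted tail probabilities can be checked numerically. A stdlib-only Python sketch follows; in R the same numbers come from 2 * pt(2, df, lower.tail = FALSE).]

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, n=10_000):
    """P(|T| > t) via Simpson's rule on [0, t]; the density is symmetric,
    so the two-sided p-value is 1 - 2 * P(0 < T < t)."""
    if n % 2:
        n += 1
    h = t / n
    s = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = s * h / 3          # P(0 < T < t)
    return 1 - 2 * central       # mass in the two tails

print(round(two_sided_p(2, 5), 3))  # 0.102
print(round(two_sided_p(2, 2), 3))  # 0.184
```

So abs(t) > 2 corresponds to p of about 0.10 at 5 df and 0.18 at 2 df, as stated above.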
Or are we discussing experiments where we always have at least 10 degrees of freedom? If not, and there is an insistence on making claims of "significance", maybe we want abs(t) > 2.5 or abs(t) > 3.

I do not see any cogent reason to be concerned that the distribution of the Bayesian p-value may, under H0, be far from uniform on (0,1). This, if it is an issue, is an especial issue for intervals for variances.

Why not then, for models fitted using lmer, a Bayesian HPD interval, given that Douglas has made it so easy to calculate these? This seems to me more than otherwise pertinent if the emphasis is on effect size. None of these various measures is more than a very crude summary of what has been achieved. Maybe plots of posterior density estimates might be given for key parameters, ideally with some indication of sensitivity to the prior (this would need more than mcmcsamp()).

In any case, publish the data, so that the sceptical reader can make his/her own checks, and/or use it in the design of future experiments, and/or so that it can be used as a teaching resource.

John Maindonald
email: john.maindonald at anu.edu.au
phone: +61 2 (6125)3473
fax: +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
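[Editor's note: for readers unfamiliar with HPD intervals — from an MCMC sample one takes the shortest interval containing the desired posterior mass. A stdlib-only Python sketch of that computation follows, using synthetic draws rather than lme4 output; in R, coda's HPDinterval() applied to mcmcsamp() results is the actual tool.]

```python
import math
import random

def hpd_interval(draws, prob=0.95):
    """Empirical HPD: the shortest interval covering `prob` of the draws.
    Meaningful only for a (roughly) unimodal posterior."""
    xs = sorted(draws)
    n = len(xs)
    m = int(math.ceil(prob * n))            # draws the interval must contain
    i = min(range(n - m + 1), key=lambda j: xs[j + m - 1] - xs[j])
    return xs[i], xs[i + m - 1]

# Illustration with a synthetic, right-skewed "posterior"
# (e.g. draws for a variance-like parameter):
random.seed(1)
draws = [random.lognormvariate(0, 0.5) for _ in range(20_000)]
lo, hi = hpd_interval(draws)
# For a skewed posterior the HPD interval is shifted toward the mode,
# and is never wider than the equal-tailed credible interval.
```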
On 6 Apr 2008, at 12:13 AM, Martin Maechler wrote:
"Jon" == Jonathan Baron <baron at psych.upenn.edu> on Sat, 5 Apr 2008 07:21:19 -0400 writes:
Jon> On 04/05/08 12:10, Reinhold Kliegl wrote: [...]
In perspective, I think the p-value problem will simply go away.
Jon> I'm not sure what you mean here. If you mean to
Jon> replace them with confidence intervals, I have no
Jon> problem with that. But, as a journal editor, I am
Jon> afraid that I will continue to insist on some sort of
Jon> evidence that effects are real. This can be done in
Jon> many ways. But too many authors submit articles in
Jon> which the claimed effects can result from random
Jon> variation, either in subjects ("participants*") or
Jon> items, and they don't correctly reject such alternative
Jon> explanations of a difference in means.
Jon> I have noticed a kind of split among those who comment
Jon> on this issue. On the one side are those who are
Jon> familiar with fields such as epidemiology or economics
Jon> (excluding experimental economics), where the claim is
Jon> often made that "the null hypothesis is always false
Jon> anyway, so why bother rejecting it?" These are the
Jon> ones interested in effect sizes, variance accounted
Jon> for, etc. They are correct for this kind of research,
Jon> but there are other kinds of research.
Jon> On the other side, are those from (e.g.) experimental
Jon> psychology, where the name of the game is to design
Jon> experiments that are so well controlled that the null
Jon> hypothesis will be true if the effect of interest is
Jon> absent. As a member of this group, when I read people
Jon> from the first group, I find it very discouraging. It
Jon> is almost as if they are saying that what I work so
Jon> hard to try to do is impossible.
Jon> To get a little specific, although I found Gelman and
Jon> Hill's book very helpful on many points (and it does
Jon> not deny the existence of people like me), it is
Jon> written largely for members of the first group. By
Jon> contrast, Baayen's book is written for people like me,
Jon> as is the Baayen, Davidson, and Bates article, "Mixed
Jon> effects modeling with crossed random effects for
Jon> subjects and items."
Jon> I'm afraid we do need significance tests, or confidence
Jon> intervals, or something.
I agree even though I'm very deeply inside the camp of statisticians
who know that all models are wrong but some are useful, and
hence I do not "believe" any P-values (or exact confidence /
credibility intervals).
For those who need ``something like a P-value'', yesterday I heard
Lorenz Gygax (also a subscriber here) propose
to report the "credibility of 0", possibly "2-sided", as a
pseudo-P value; i.e., basically that would be
2 * k/n, for an ordered MCMC sample b_(1) <= b_(2) <= ... <= b_(n), where
k := min{ k' : b_(k') > 0 }.
The reasoning would be the following:
Use the 1-to-1 correspondence between confidence intervals and
testing, pretending that the credibility intervals are confidence
intervals; consequently, you just need to look at the
confidence level at which 0 falls exactly on the border of the
credibility interval.
Yesterday after the talk, I found that a good idea.
Just now, it seems a bit doubtful, since under the null
hypothesis, I don't think such a pseudo P-value would be uniform
in [0,1].
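[Editor's note: the pseudo-P above amounts to twice the smaller tail mass of the posterior sample at 0. A Python sketch, with a hypothetical vector of MCMC draws standing in for mcmcsamp() output:]

```python
def pseudo_p(draws):
    """Two-sided 'credibility of 0': twice the smaller fraction of
    MCMC draws on either side of zero (the Gygax pseudo-P value).
    Equals 2 * k/n above, k counting draws on the minority side."""
    n = len(draws)
    below = sum(b < 0 for b in draws) / n
    above = sum(b > 0 for b in draws) / n
    return 2 * min(below, above)

# e.g. if 2% of the posterior draws for a coefficient fall below zero:
draws = [-1.0] * 20 + [1.0] * 980
print(pseudo_p(draws))  # 0.04
```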
Martin
Jon> * On "participants" vs. "subjects" see:
Jon> http://www.psychologicalscience.org/observer/getArticle.cfm?id=1549
Jon> -- Jonathan Baron, Professor of Psychology, University
Jon> of Pennsylvania Home page:
Jon> http://www.sas.upenn.edu/~baron Editor: Judgment and
Jon> Decision Making (http://journal.sjdm.org)
Jon> _______________________________________________
Jon> R-sig-mixed-models at r-project.org mailing list
Jon> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models