same old question - lme4 and p-values
On Sat, Apr 5, 2008 at 5:10 AM, Reinhold Kliegl
<reinhold.kliegl at gmail.com> wrote:
Here is a section that worked in Kliegl, Risse, & Laubrock (2007, J
Exp Psychol:Human Perception and Performance, 33, 1250-1251).
"Analysis
Inferential statistics are based on a linear mixed-effects model
(lme) specifying participants and items as crossed random effects.
This analysis takes into account differences between participants and
differences between items in a single sweep and has been shown to
suffer substantially less loss of statistical power in unbalanced
designs than traditional ANOVAs over participants (F1) and items (F2;
see Baayen, in press, Pinheiro & Bates, 2000; Quené & van den Bergh,
2004, for simulations).
We used the lmer program (lme4 package; Bates & Sarkar, 2006) in
the R system for statistical computing (R Development Core Team, 2006)
and report regression coefficients (b; absolute effect size in ms),
standard errors (SE), and p-values for an upper-bound n of denominator
degrees of freedom computed as n of observations minus n of fixed
effects. As these p-values are potentially anti-conservative, we
generated confidence intervals from the posterior distribution of
parameter estimates with Markov Chain Monte Carlo methods, using the
mcmcsamp program in the lme4 package with default specifications
(e.g., n=1000 samples; locally uniform priors for fixed effects;
locally non-informative priors for random effects). Both procedures
yielded the same results.
Finally, we also computed post-hoc power statistics for the
preview and lexical status main effects and for the interaction effect
on first fixation durations (with effect sizes similar to those
reported earlier, e.g., Kliegl, 2007), and using lme estimates of
between-participant, between-item, and residual variances (Gelman &
Hill, in press). For the observed proportion of random loss of items,
power estimates based on 1000 simulations each were around .85 for
word n and n+2 and .59 for word n+1 (due to the higher skipping
rate)." (page 1251)
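
As a rough illustration of the upper-bound df computation described in the quoted passage, here is a minimal Python sketch (it is not lme4 code and not the code used for the article). It assumes df is taken as n of observations minus n of fixed effects and obtains a two-sided p-value from the t distribution by numerical integration. The function names and the example numbers are hypothetical, chosen only for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    log_norm -= 0.5 * math.log(df * math.pi)
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_value, df, steps=2000):
    """Two-sided p-value, integrating the density on [0, |t|] by Simpson's rule."""
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_mass)


def upper_bound_p(b, se, n_obs, n_fixed):
    """t ratio, upper-bound df (n_obs - n_fixed), and the resulting p-value."""
    df = n_obs - n_fixed
    t_value = b / se
    return t_value, df, two_sided_p(t_value, df)


# Hypothetical numbers, not taken from the article.
t_value, df, p = upper_bound_p(b=12.0, se=5.0, n_obs=5000, n_fixed=8)
print(f"t = {t_value:.2f}, df = {df}, p = {p:.4f}")
```

Because this df is an upper bound, the p-value it gives is a lower bound on the "true" one, which is why the passage calls it potentially anti-conservative.
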
Power statistics were included in response to a reviewer request. I am
not much in favor of post-hoc power statistics; but note that here
they are restricted to the use of estimates of random effects. For
reviewers, we also included traditional F1- and F2-ANOVA tables; they
are not part of the article. In other articles, it has also been
acceptable to report coefficients, their standard errors, and their
ratio, and to say that coefficients larger than 2 SE are interpreted
as significant (e.g., Kliegl, 2007, J Exp Psychol: General, 136,
530-537), that is, it is possible to leave out p-values completely.
Corrections and improvements of the above sentences are highly welcome
for future articles. In the long run, I think the p-value problem will
simply go away.
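
The 2 SE criterion mentioned above can be sketched in a few lines of Python (again, not lme4 code). The sketch assumes a table of coefficients and standard errors is already available; the effect names and values below are hypothetical and serve only to show the ratio and the cutoff.

```python
def summarize(coefs, threshold=2.0):
    """Return (name, b, se, ratio, flagged) rows for a dict of (b, se) pairs.

    A coefficient is flagged when |b / se| exceeds the threshold (2 SE by default).
    """
    rows = []
    for name, (b, se) in coefs.items():
        ratio = b / se
        rows.append((name, b, se, ratio, abs(ratio) > threshold))
    return rows


# Hypothetical estimates in ms, not taken from any article.
estimates = {"effect A": (15.0, 4.0), "effect B": (-3.0, 2.5)}
for name, b, se, ratio, flagged in summarize(estimates):
    label = "larger than 2 SE" if flagged else "within 2 SE"
    print(f"{name}: b = {b:.1f}, SE = {se:.1f}, b/SE = {ratio:.2f} ({label})")
```

Reporting b, SE, and their ratio in this way needs no degrees of freedom at all, which is what makes it possible to leave out p-values.
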
Best
Reinhold
PS: Would it be useful to have a site where peer-reviewed articles
using lme4 for statistical inference are listed and, possibly,
retrievable versions are provided?
Thanks for the suggestion, Reinhold. I would be delighted to provide a page on http://lme4.r-forge.r-project.org/ to list such references. May I ask for a volunteer to maintain such a listing? I am rather overextended at present trying to get lme4_1.0-0 out and writing a book about what it does. All that is required is to obtain an R-forge login, decide how to organize the pages and then update the pages as new references are submitted.