
lmer and p-values

18 messages · Iker Vaquero Alba, Ben Bolker, John Maindonald +3 more

#
Iker Vaquero Alba <karraspito at ...> writes:
When you do anova() in this context you are doing a likelihood ratio
test, which is equivalent to doing an F test with 1 numerator df and
a very large (infinite) denominator df.  
  As Pinheiro and Bates 2000 point out, this is dangerous/anticonservative
if your data set is small, for some value of "small".
   Guessing an appropriate denominator df, or using mcmcsamp(), or parametric
bootstrapping, or something, will be necessary if you want a more
reliable p-value.
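The parametric-bootstrap idea is easy to sketch. Below is a toy Python illustration (in R one would simulate() new responses from the null lmer fit and refit both models); the plain normal-error regression, the sample size, and all names here are invented for the example, not taken from the thread.

```python
# Toy illustration of a parametric-bootstrap LRT p-value.  A plain
# normal-error regression stands in for the mixed model; in R one would
# simulate() from the null lmer fit and refit both models instead.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

def lrt_stat(y, x):
    """2*(logLik(full) - logLik(null)) for y ~ 1 versus y ~ 1 + x,
    via the profiled normal log-likelihood: n * log(RSS_null/RSS_full)."""
    n = len(y)
    rss_null = np.sum((y - y.mean()) ** 2)        # intercept-only model
    coef = np.polyfit(x, y, 1)                    # intercept + slope
    rss_full = np.sum((y - np.polyval(coef, x)) ** 2)
    return n * np.log(rss_null / rss_full)

n = 12                                            # deliberately small sample
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 0.8 * x + rng.normal(size=n)            # invented data
obs = lrt_stat(y, x)

p_chisq = chi2.sf(obs, df=1)                      # naive chi-square(1) p-value

# Parametric bootstrap: simulate responses under the fitted null model,
# recompute the statistic, and take the empirical tail probability.
mu, sd = y.mean(), y.std(ddof=1)
null_stats = np.array([lrt_stat(rng.normal(mu, sd, n), x)
                       for _ in range(2000)])
p_boot = (null_stats >= obs).mean()
```

The bootstrap p-value replaces the chi-square(1) approximation with the empirical null distribution of the statistic, which is exactly where the "infinite denominator df" optimism bites in small samples.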
#
On 03/28/2011 01:04 PM, Iker Vaquero Alba wrote:
Why are you simplifying the model in the first place?  (That is a real
question, with only a tinge of prescriptiveness.) Among the active
contributors to this list and other R lists, I would say that the most
widespread philosophy is that one should *not* do backwards elimination
of (apparently) superfluous/non-significant terms in the model.  (See
myriad posts by Frank Harrell and others.)

  If you do insist on eliminating terms, then the LRT (anova()) p-values
are no more or less reliable for the purposes of elimination than they
are for the purposes of hypothesis testing.
#
A slightly more accommodating position is that some selection 
may be acceptable if it makes little difference to the magnitudes of
parameter estimates and to the interpretations that can be placed
upon them.  [Since writing this, I notice that Ben has now posted a
message that makes broadly similar follow-up points.]

The usual interpretations of p-values assume, among other things, 
a known model.  This assumption is invalidated if there has been
some element of backward elimination or other element of variable
selection.  Following variable selection, the p-value is no longer, 
strictly, a valid p-value.

Elimination of a term with a p-value greater than, say, 0.15 or 0.2 is,
however, likely to make little difference to estimates of other terms
in the model.  Thus, it may be a reasonable way to proceed.  For
this purpose, even an anti-conservative (smaller than it should be)
p-value will usually suffice.

Nowadays it is of course relatively easy to do a simulation that will 
check the effect of a particular variable elimination/selection strategy.  
If there is some use of variable elimination/selection, and anything of 
consequence hangs on the results, this should surely be standard 
practice. 
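Such a check is easy to prototype. Here is a toy Python sketch (variable names and sizes are invented for the illustration): neither predictor has any real effect, yet keeping whichever one looks more significant inflates the apparent type I error rate above the nominal 5%.

```python
# Toy simulation of a selection strategy's effect on p-values: neither
# predictor has any real effect, but we keep whichever looks "more
# significant" and record its reported p-value.  All names are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 30, 2000
selected_p = []
for _ in range(reps):
    x1, x2, y = rng.normal(size=(3, n))   # pure noise: no true effects
    _, p1 = stats.pearsonr(x1, y)
    _, p2 = stats.pearsonr(x2, y)
    selected_p.append(min(p1, p2))        # keep the "better" predictor

# Without selection a true-null p-value rejects 5% of the time at 0.05;
# after selection the rate is close to 1 - 0.95**2, i.e. nearly 10%.
rate = np.mean(np.array(selected_p) < 0.05)
```

The same scheme extends to any elimination rule: simulate data in which the dropped terms are truly null, apply the rule, and see how the surviving p-values behave.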

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 29/03/2011, at 8:18 AM, Ben Bolker wrote:

#
On 03/28/2011 06:15 PM, John Maindonald wrote:

Note that naive likelihood ratio tests of random effects are likely to
be conservative (in the simplest case, true p-values are twice the
nominal value) because of boundary issues and those of fixed effects are
probably anticonservative because of finite-size effects (see Pinheiro
and Bates 2000 for examples of both cases).
Ben
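The factor-of-two correction for the random-effects case is simple to apply by hand. A hypothetical Python snippet (the statistic value is made up): for a test of a single variance component on its boundary, the null distribution of the LRT statistic is a 50:50 mixture of a point mass at zero and chi-square(1), so the naive chi-square(1) p-value is halved.

```python
# Boundary correction for testing one variance component against zero:
# the null distribution of the LRT statistic is a 50:50 mixture of a
# point mass at 0 and chi-square(1), so the naive chi-square(1) p-value
# is halved.  The statistic value below is made up for illustration.
from scipy.stats import chi2

lrt = 2.71
p_naive = chi2.sf(lrt, df=1)     # roughly 0.10: conservative
p_mixture = 0.5 * p_naive        # roughly 0.05: boundary-corrected
```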
#
On 11-03-29 07:35 AM, Manuel Spínola wrote:
Hmm.  What's the motivation for your question?

  The p-value gives you the probability of the observed pattern, or a
more extreme one, having occurred if the null hypothesis were true.
  The effect size (defined in various ways) tells you something about
the strength of the observed pattern.
   Statistical and subject-area (in your case, biological) significance
are complementary. A highly statistically significant but biologically
trivial effect is a curiosity; a biologically important but
statistically insignificant effect means you need more/better data.

  I don't know if that answers your question.
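The definition in the first sentence can be made concrete with a small simulation. In this toy Python sketch (sample size, effect, and seed are arbitrary choices), the two-sided one-sample t-test p-value is recovered, up to Monte Carlo error, as the fraction of null-simulated statistics at least as extreme as the observed one.

```python
# The p-value definition made concrete: the probability, under the null,
# of a statistic at least as extreme as the one observed.  Sample size,
# effect, and seed below are arbitrary choices for the illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = rng.normal(0.5, 1.0, size=20)               # invented sample
t_obs, p_formula = stats.ttest_1samp(y, popmean=0.0)

# Monte Carlo version of "observed pattern, or a more extreme one":
# simulate from the null (mean 0) and count equally extreme statistics.
t_null = np.array([stats.ttest_1samp(rng.normal(0.0, 1.0, 20), 0.0)[0]
                   for _ in range(4000)])
p_sim = np.mean(np.abs(t_null) >= abs(t_obs))   # close to p_formula
```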
#
On Tue, Mar 29, 2011 at 8:45 AM, Manuel Spínola <mspinola10 at gmail.com> wrote:
This topic (and this web page) has been discussed at length on
this list recently. Check out the archives.

I like to think of p-values and hypothesis testing as a more scientific
variant of trial by jury, where the theory to be proved ("as charged")
is found guilty by establishing that the theories inconsistent with it
(the null hypotheses) are unlikely to be true given the observed data.
Conversely, if a null hypothesis cannot be ruled out "beyond a
reasonable doubt", then the theory to be tested "could not have been at
the scene of the crime." Note that, just as in a jury trial, this does
not prove that the theory in question is true with absolute certainty.

In practice one usually entertains several possible models or theories
and selects the one that seems to explain the data best by eliminating
most of the variance in the observations. More precisely, a good model
is one where the residual is negligible and looks like "noise."

Dominick
#
On 03/29/2011 04:44 PM, Manuel Spínola wrote:
A couple of points:

  * p-values certainly have their problems, but despite their problems
they answer a need.  Fisher/Neyman/Pearson were pretty smart guys, and
the question that p-values answer ("how likely is it that I would see a
pattern this strong, or stronger, if there were really nothing
happening?") is one that we often want to ask.  It's also nice to have a
concise, general statement of the strength of an effect, even if it has
flaws (arguably we could all be quoting log-likelihood differences, or
standardized regression coefficients, instead).
  * Notice how often the quotes that you posted below say "overuse", or
"undue", or "too much emphasis" (rather than "never" or "forbidden").
Yes, if I had to choose between a p-value and a confidence interval I
would take the confidence interval every time -- but then I have to
decide what kind of confidence interval I want, and if I decide to use
frequentist confidence intervals I am back in the soup again, both with
interpretation and with the difficulties (in the mixed model context) of
computing them appropriately.
  * I wouldn't object if everyone decided to go Bayesian, but that does
have its own can of worms (deciding on priors, computational issues
[e.g. judging convergence if using MCMC], etc.).  Again, if I had to choose
between frequentist *only* or Bayesian *only* I would probably choose
Bayesian. The hybrid-Bayesian approaches (e.g. mcmcsamp, post-estimation
MCMC in AD Model Builder) choose flat priors on the (perhaps arbitrarily
chosen) current scale of the parameters, glossing over details that are
sometimes important.  (The same goes for the pseudo-Bayesian
interpretation of AIC.)

  I agree that the relations among scientific theory and statistical
practices are tough. From Crome 1997:

18.  Use statistical procedures from a range of schools and strictly
adhere to their respective methods and interpretation. For example, do a
Fisherian significance test properly and interpret it properly. Then set
up a formal Neyman-Pearson test and interpret it formally (this means
setting up both Type I and II error rates beforehand, among other
things). Then do an estimation procedure. Then switch hats and do a
Bayesian analysis. Take the results of all four, noting their different
behavior, and come to your conclusion. Good analysis and interpretation
are as important as the fieldwork, so allot adequate time and resources
to both.  ....

Crome, Francis H. J. 1997. Researching tropical forest fragmentation:
Shall we keep on doing what we're doing? In Tropical forest remnants:
ecology, management, and conservation of fragmented communities, ed. W.
F Laurance and R. O Bierregard, 485-501. Chicago, IL: University of
Chicago Press.

  (There is more here that's worth reading.)
#
From: John Maindonald
I'm afraid that all too often the reason models are chosen on
"statistical grounds" is the lack of "scientific grounds".  Sort of
a catch-22, I guess...  Even when "scientific grounds" exist,
what exactly constitutes them, and how do we know they're not
another rabbit (or ozone) hole?

Andy
#
On Tue, Mar 29, 2011 at 7:44 PM, Liaw, Andy <andy_liaw at merck.com> wrote:
Yes, this is particularly so when studying social systems
or any rapidly evolving system (like the financial markets).
In this situation the statistical picture is often just a
snapshot that should probably be labeled (conditioned)
by the time of observation and the context.

In view of this complexity I'm tempted to view p-values
and hypothesis testing (when used in this context) as
a communication protocol that helps statisticians to
reach a consensus, and not as a tool that reveals
timeless truths.

Dominick