[Fwd: Re: Wald F tests]

6 messages · Murray Jorgensen, Ben Bolker, Douglas Bates

#
But ... LRTs are not recommended (they are anticonservative) for
comparing fixed effects of LMMs, and hence (presumably) for
GLMMs, unless the sample size (number of blocks / "residual" total
sample size) is large, no?

I just got through telling readers of
a forthcoming TREE (Trends in Ecology and Evolution) article
that they should use Wald Z, chi^2, t, or F (depending on
whether testing a single or multiple parameters, and whether
there is overdispersion or not), in preference to LRTs,
for testing fixed effects ... ?  Or do you consider LRT
better than Wald in this case (in which case as far as
we know _nothing_ works very well for GLMMs, and I might
just start to cry ...)  Or perhaps I have to get busy
running some simulations ...

  Where would _you_ go to find advice on inference
(as opposed to estimation) on estimated GLMM parameters?

  cheers
   Ben Bolker
Douglas Bates wrote:
5 days later
#
On Tue, Oct 7, 2008 at 4:51 PM, Ben Bolker <bolker at ufl.edu> wrote:

My reasoning, based on my experiences with nonlinear regression models
and other nonlinear models, is that a test that involves fitting the
alternative model and the null model then comparing the quality of the
fit will give more realistic results than a test that only involves
fitting the alternative model and using that fit to extrapolate to
what the null model fit should be like.

We will always use approximations in statistics but as we get more
powerful computing facilities some of the approximations that we
needed to use in the past can be avoided.  I view Wald tests as an
approximation to the quantity that we want to use to compare models,
which is some measure of the comparative fit.  The likelihood ratio or
the change in the deviance seems to be a reasonable way of comparing
the fits of two nested models.  There may be problems with calibrating
that quantity (i.e. converting it to a p-value) in which case we may
want to use a bootstrap or some other simulation-based method like
MCMC.  However, I don't think this difficulty would cause me to say
that it is better to use an approximation to the model fit under the
null hypothesis than to go ahead and fit it.
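The parametric-bootstrap calibration described above can be sketched outside the mixed-model setting. The following is not lme4 code, just a toy Gaussian example with made-up data showing the mechanics: simulate from the fitted null model, refit both models each time, and compare the observed likelihood ratio to the simulated reference distribution rather than to the asymptotic chi-square.

```python
import numpy as np

rng = np.random.default_rng(0)

def lrt_stat(y, x):
    """2 * (loglik_alt - loglik_null) for Gaussian y ~ 1 versus y ~ 1 + x."""
    n = len(y)
    rss0 = np.sum((y - y.mean()) ** 2)            # null: intercept only
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss1 = np.sum((y - X @ beta) ** 2)            # alternative: intercept + slope
    return n * np.log(rss0 / rss1)

# made-up data
n = 30
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
obs = lrt_stat(y, x)

# parametric bootstrap: simulate from the *fitted null* model, refit both
# models each time, and use the simulated statistics as the reference
# distribution instead of the asymptotic chi-square
mu0 = y.mean()
sd0 = np.sqrt(np.sum((y - mu0) ** 2) / n)         # ML estimate under the null
ref = np.array([lrt_stat(mu0 + sd0 * rng.normal(size=n), x)
                for _ in range(999)])
p_boot = (1 + np.sum(ref >= obs)) / (1 + len(ref))
print(p_boot)
```

The same recipe carries over to [g]lmer fits, with the caveat that each refit is then a full mixed-model optimization.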

I'm not sure.  As I once said to Martin, my research involves far too
much "re" and far too little "search".  Probably because of laziness I
tend to try to reason things out instead of conducting literature
reviews.
#
Doug's response makes perfect sense to me.

  However, from the on-the-ground, what-do-I-say-about-my-data-now
point of view, it seems that this is really an empirical question.
I would guess (wildly) that both the LRT and the Wald test would
converge asymptotically on the right answer.  **For classical ML
problems**, I have the feeling (unsupported by evidence!) that
LRT converges faster/is less wrong at any given value of N than
Wald tests (which, as you say, represent a second level of
approximation). I have no idea if this is true for GLMMs.
Really the only reason that I spoke against LRTs was that it
is well known (as shown e.g. in PB2000) that they are dicey for
LMMs, while the situation for Wald tests is relatively unknown.
In the absence of data, which is stronger: our prior belief that Wald
tests are bad because they're less reliable than LRT in some other
contexts, or our optimism that Wald tests aren't bad because they
haven't been shown to be so?

  If it really hasn't been done (and while I'm far from omniscient
I did *try* to review the literature on this topic, and have yet
to find an answer, or to have anyone on this list provide
an answer), I guess it's time to crank up
the old simulation engine and have a look ...
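A minimal version of such a simulation, in plain Python for an ordinary Gaussian regression rather than a GLMM (made-up data, illustrative only), already shows both tests running hot at small N when the asymptotic chi-square / normal reference distributions are used:

```python
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(1)
n, reps = 10, 2000                      # deliberately small n
crit_lrt = chi2.ppf(0.95, df=1)         # asymptotic LRT cutoff
crit_wald = norm.ppf(0.975)             # asymptotic Wald z cutoff

x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
xtx_inv_11 = np.linalg.inv(X.T @ X)[1, 1]

rej_lrt = rej_wald = 0
for _ in range(reps):
    y = rng.normal(size=n)              # null model is true: slope = 0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss1 = np.sum((y - X @ beta) ** 2)
    rss0 = np.sum((y - y.mean()) ** 2)
    lrt = n * np.log(rss0 / rss1)
    se = np.sqrt((rss1 / n) * xtx_inv_11)   # Wald SE from the ML variance
    rej_lrt += lrt > crit_lrt
    rej_wald += abs(beta[1]) / se > crit_wald

# at n = 10 both observed type-I error rates exceed the nominal 0.05,
# and in this setup the Wald z test is the more anticonservative of the two
print(rej_lrt / reps, rej_wald / reps)
```

Whether the ordering carries over to GLMMs is exactly the open empirical question; swapping the data-generating model for a simulated GLMM and the refits for glmer calls is the obvious next step.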

  For what it's worth, the results of the (possibly misguided)
inference survey so far are:

don't test hypotheses: 5
LRTs: 5
F/Wald tests: 7
bootstrap: 4
mcmcsamp: 8
randomization of null hypothesis: 5
AIC: 4
  + 2 write-ins:
  1 for "consilience of approaches"
  1 for BIC

  (out of 26 respondents)

  In hindsight, I would have liked to take mcmcsamp off the table
(or put it in a separate category) since I am really most interested in
finding out/telling researchers what to do NOW.

  cheers
    Ben Bolker
Douglas Bates wrote:
#
Not that Doug needs my support, but his endorsement of the likelihood 
ratio as the right thing to be looking at, regardless of any calibration 
difficulties, strikes a chord with me. There is a famous Tukey quote that 
I can perhaps bend into service here:

"Far better an approximate answer to the right question, than the exact 
answer to the wrong question, which can always be made precise."

In this context I take the "right question" to be interpretation of the 
likelihood ratio and the "wrong question" to be the local properties of 
the fitted "larger" model.

Murray Jorgensen
Douglas Bates wrote:
#
Somewhat off-topic, but relevant to the larger question:
is there a good way to hack profile confidence limits for
[g]lmer fits?  (Nothing obvious springs to the eye ...) Has
anyone tried it?

  cheers
    Ben Bolker
Murray Jorgensen wrote:
#
On Mon, Oct 13, 2008 at 3:36 PM, Ben Bolker <bolker at ufl.edu> wrote:
For the fixed effects parameters or for the parameters which I write
as theta and which determine the relative covariance of the random
effects?

In lmer the log-likelihood is optimized as a function of theta only so
you can't profile with respect to the fixed-effects parameters
directly.

You could do it indirectly by changing the offset.  For definiteness,
suppose that you want to profile with respect to the intercept
coefficient; then you move the intercept column from the X matrix to
the offset.  Changing the coefficient corresponds to rescaling that
offset column, after which you reoptimize the model.

That is by no means a complete description of an algorithm but I hope
it gives the flavor of the calculation.
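The flavor of that calculation can be prototyped quickly. The sketch below is not lmer code — it profiles the slope of a plain Gaussian linear model with made-up data, and the grid and 3.84 cutoff are illustrative choices — but the mechanics are the ones described above: pin the target coefficient via the offset, reoptimize everything else, and record the deviance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 2.0 + 0.8 * x + rng.normal(size=n)  # made-up data
X = np.column_stack([np.ones(n), x])

def deviance_at(b1):
    """Profile deviance for the slope: pin it via the offset, refit the rest."""
    offset = b1 * X[:, 1]               # slope column moved into the offset
    Xr = X[:, [0]]                      # remaining columns are re-optimized
    beta, *_ = np.linalg.lstsq(Xr, y - offset, rcond=None)
    rss = np.sum((y - offset - Xr @ beta) ** 2)
    return n * np.log(rss)              # Gaussian deviance, up to a constant

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
d_min = deviance_at(beta_hat[1])        # profile minimum is the full-model fit

# scan a grid; the 95% profile interval keeps the values whose deviance
# lies within qchisq(0.95, 1) = 3.84 of the minimum
grid = np.linspace(beta_hat[1] - 1, beta_hat[1] + 1, 201)
prof = np.array([deviance_at(b) for b in grid]) - d_min
inside = grid[prof <= 3.84]
print(inside.min(), inside.max())       # profile confidence limits for the slope
```

For a [g]lmer fit, deviance_at would instead fix the coefficient through the model's offset argument and rerun the full optimization over theta (and the other fixed effects) at each grid point.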