
glmmPQL simplification

On 11-01-23 02:12 PM, Iker Vaquero Alba wrote:
There are a few distinctions that you might be confusing here:

  The main distinction is between marginal tests of parameters
(typically Wald tests), as they show up in summary(), and nested model
comparisons (either F or likelihood ratio tests, 'test="Chisq"'), as
done by anova() or drop1().

    For continuous covariates, or for factors with only two levels, in
linear models (i.e. from lm()), at the top level of marginality (i.e.
without higher-level interactions present), these are identical.  In
most other cases (glm(), lmer(), ...), they are not: when you have a
choice, the model comparison approach is usually better (although less
convenient), for two reasons: (a) the approximation involved in the
test may be more accurate (LRT rather than Wald) and (b) it is easier
to be precise about the actual hypothesis you're testing (avoiding the
continuing confusion about marginality, "type III SS", etc.).  The
main advantages of the marginal tests are convenience, and
applicability in some cases where the other (generally preferred)
tests such as the LRT cannot be applied.
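
As a toy illustration of that difference (simulated data, not the
poster's): fit a logistic glm() and compare the Wald p-value reported
by summary() with the LRT p-value from drop1().

```r
## Toy data, purely to compare the two tests for the same term.
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 * d$x))
fit <- glm(y ~ x, family = binomial, data = d)

## Marginal Wald test, as reported by summary():
coef(summary(fit))["x", c("Estimate", "Pr(>|z|)")]

## Nested-model (likelihood ratio) comparison of the same term:
drop1(fit, test = "Chisq")
## The two p-values are usually similar but not identical; they can
## diverge noticeably in small samples.
```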

  For factors (categorical predictors) with more than two levels, some
sort of model comparison approach is generally necessary, because the
hypothesis you want to test (stated in terms of parameter values) is
that several parameters are jointly equal to zero.  anova() and drop1()
can do this; alternatively, the wald.test() function in the aod package
can do a multi-parameter equivalent of the Wald tests typically encoded
in summary().
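
For instance, with a made-up three-level factor, the joint test looks
like this (the aod::wald.test() call is sketched in a comment, since
aod may not be installed):

```r
## Made-up data with a three-level factor.
set.seed(2)
d <- data.frame(f = factor(rep(c("a", "b", "c"), each = 30)))
d$y <- rnorm(90, mean = c(0, 0.5, 1)[as.numeric(d$f)])
fit <- lm(y ~ f, data = d)

## summary(fit) shows one t-test per dummy coefficient (fb, fc), but
## the hypothesis "f has no effect" is that both are zero jointly:
anova(fit)              ## F test of the whole factor
drop1(fit, test = "F")  ## same comparison, drop1() style

## Multi-parameter Wald equivalent (uncomment if aod is installed;
## Terms = 2:3 picks out the two dummy coefficients):
## aod::wald.test(Sigma = vcov(fit), b = coef(fit), Terms = 2:3)
```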

  The other issue is the dreaded degrees-of-freedom problem. The general
question is whether one can do a finite-sample correction of the test
statistic, and if so whether there exists a sensible framework and a
sensible procedure for estimating the effective amount of information in
the data set (and in some cases also the effective difference in amount
of information used by two different models).

  For glmmPQL, summary() shows Wald t-tests based on the degrees of
freedom estimated in lme() via a "between-within" method. It's not clear
how coherent this is (John Maindonald or Brian Ripley might comment),
although it should be a *reasonable* order-of-magnitude estimate of the
effects of different parameters.
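
For concreteness, here is the standard example from ?glmmPQL (MASS's
bacteria data), showing where those Wald t-tests and their
between-within df live:

```r
## Standard example from ?glmmPQL (MASS); requires MASS and nlme.
library(MASS)
data(bacteria)

fit <- glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID,
               family = binomial, data = bacteria, verbose = FALSE)

## summary(fit)$tTable holds the Wald t-tests, with denominator df
## taken from lme()'s "between-within" rule:
summary(fit)$tTable
```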
   Likelihood ratio tests are not possible because there is no
likelihood defined for a model fitted via quasi-likelihood, although
*in principle* one might be able to come up with a "quasi-" test
analogous to that used by anova() for models fitted with
quasi-likelihood in glm().
   Because it depends on the likelihood, AIC in the strict sense is also
not available for glmmPQL. For the simpler case of GLMs fitted with
quasi-likelihood, researchers have proposed a quasi-AIC (QAIC) that may
(?) have similar properties, but it hasn't been implemented for glmmPQL
(and would take some careful thinking to see how it extended to this
slightly different case).
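
As a rough sketch of what QAIC looks like for the simpler quasi-Poisson
glm() case (simulated data; the "+1 for estimating c.hat" penalty
convention varies between authors), and again, nothing like this exists
for glmmPQL:

```r
## QAIC sketch for a quasi-likelihood GLM (NOT glmmPQL).
## QAIC = -2 * logLik(corresponding ML fit) / c.hat + 2 * k,
## where c.hat is the estimated overdispersion.
set.seed(3)
d <- data.frame(x = rnorm(50))
d$y <- rpois(50, exp(1 + 0.3 * d$x)) * sample(1:2, 50, replace = TRUE)

fit.qp <- glm(y ~ x, family = quasipoisson, data = d)
fit.p  <- glm(y ~ x, family = poisson,      data = d)  ## same point
                                                       ## estimates,
                                                       ## real logLik

c.hat <- summary(fit.qp)$dispersion   ## estimated overdispersion
k     <- length(coef(fit.p)) + 1      ## +1 for estimating c.hat
QAIC  <- as.numeric(-2 * logLik(fit.p)) / c.hat + 2 * k
QAIC
```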

  I personally think it would be a reasonable philosophical stance to
say of GLMM fits via PQL, "this is an approximation to a well-defined
quantity (likelihood), here is my estimate in this case", but that's not
what has been done.

  Bottom line: I think you are stuck with marginal tests in this case.
Please don't disregard all the very important warnings given by others
about whether stepwise regression is appropriate, and whether you can
reasonably expect to do the analysis you are attempting (estimate 15
parameters from a data set that has a total of 48 observations, and
based on the reported df from your fit, may have far fewer 'effective
degrees of freedom').

  good luck,
    Ben Bolker
(iii) Lastly, if you're trying to do model simplification, the p-values
are already there in the summary table you posted. Why not remove the
least significant term (starting with sexM:weatherpc1, not
weatherpc1:tlength) and look at the new summary table, rather than
using that anova() test?
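
To sketch that workflow with update() (a toy lm() with made-up data
standing in for your model; sex, weatherpc1 and tlength are just the
names from your summary table):

```r
## Toy stand-in for the poster's model; variable names are borrowed
## from the thread, the data are invented.
set.seed(4)
d <- data.frame(sex        = factor(rep(c("F", "M"), 25)),
                weatherpc1 = rnorm(50),
                tlength    = rnorm(50))
d$resp <- rnorm(50, mean = 0.4 * d$weatherpc1)

fit <- lm(resp ~ sex * weatherpc1 + tlength, data = d)

## Drop the least significant term from summary(fit), then refit and
## inspect the new table:
fit2 <- update(fit, . ~ . - sex:weatherpc1)
summary(fit2)
```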