
mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

5 messages · Martijn Wieling, Simon Wood

Dear useRs,

I am using mgcv version 1.7-16. When I create a model with a few
non-linear terms and a random intercept for (in my case) country using
s(Country,bs="re"), the representative line in my model (i.e.
approximate significance of smooth terms) for the random intercept
reads:
                 edf  Ref.df      F  p-value
s(Country)    36.127  58.551  0.644    0.982

Can I interpret this as there being no support for a random intercept
for country? However, when I compare the simpler model to the model
including the random intercept, the latter appears to be a significant
improvement.
Model 1: ....
Model 2: .... + s(BirthNation, bs="re")
  Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
1    789.44     416.54
2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***

I hope somebody can help me decide how to proceed in these
situations. Do I include the random intercept or not?

I also have a related question. When I used to create a mixed-effects
regression model using lmer and included e.g. an interaction in the
fixed-effects structure, I would test whether the inclusion of this
interaction was warranted using anova(lmer1,lmer2). It would then show
me that I invested 1 additional df and the resulting (possibly
significant) improvement in the fit of my model.

This approach does not seem to work when using gam. In this case an
apparent investment of 1 degree of freedom for the interaction might
result in an actual decrease in the total degrees of freedom invested
by the model (caused by a decrease in the edfs of the splines in the
model with the interaction). How would I then proceed in determining
whether the model including the interaction term is better?

With kind regards,
Martijn Wieling

--
*******************************************
Martijn Wieling
http://www.martijnwieling.nl
wieling at gmail.com
+31(0)614108622
*******************************************
University of Groningen
http://www.rug.nl/staff/m.b.wieling
Dear Martijn,

Thanks for the offline code and data: very helpful.

The answer to this is something of a 'can of worms'. Start with the 
p-value inconsistency. The problem here really is that neither test is 
well justified in the case of s(...,bs="re") terms (and, not having 
realised the extent of the problem, it isn't flagged properly).

In the case of the p-value from `summary', the p-value is computed as if 
the random effect were any other smooth. However, the theory on which the 
p-values for smooths rest does not hold for "re" terms (basically the 
usual notion of smoothing bias is meaningless in the "re" case, and "re" 
terms cannot usually be well approximated by truncated eigen 
approximations). The upshot is that you can get bias toward accepting 
the null. I'll revert to doing something more sensible for "re" terms 
for the next release, but it still won't be great, I guess.

The p-value from the comparison of models via 'anova' is equally suspect 
for "re" terms. Basically, this test is justified as a rough 
approximation in the case of usual smooth models, by the fact that we 
can approximate the model smooths by unpenalized reduced rank eigen 
approximations having degrees of freedom set to the effective degrees of 
freedom of the smooths. Again, however, such reduced rank approximations 
are generally not available for "re" terms, and I don't know if there is 
then a decent justification for the test in this case.

'AIC' might then be seen as the answer for model selection, but Greven 
and Kneib (2010, Biometrika), show that this is biased towards selecting 
the larger random effects model in this case (they provide a correction, 
but I'm not sure how easy it is to apply here).

You are left with a couple of sensible possibilities that are easy to 
use, if it's not clear from the estimates that the term is zero. Both 
involve using gam(...,method="REML") or gam(...,method="ML").

1. use gam.vcomp to get a confidence interval for the "re" variance 
component. If this is bounded well away from zero, then the result is 
clear.
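A minimal sketch of option 1, assuming placeholder names Y, A, X and dat 
(not from this thread):

```r
## Sketch of option 1: refit with REML, then inspect the variance
## component for the random intercept. Y, A, X and dat are placeholders.
library(mgcv)
m2 <- gam(Y ~ A + s(X, bs = "re"), data = dat, method = "REML")
## gam.vcomp reports the estimated standard deviations of the penalized
## terms, with (by default) 95% confidence intervals
gam.vcomp(m2)
```

If the interval for s(X) is bounded well away from zero, the random 
intercept is clearly supported.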

2. Run a GLRT based on twice the difference in the ML/REML scores 
reported for the 2 models (c.f. chi-squared on 1 df for your case). This 
suffers from the usual problem of using a GLRT to test a variance 
component for equality to zero. (AIC based on this marginal likelihood 
doesn't fix the problem either --- see Greven and Kneib, again.)
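A hedged sketch of option 2, again with placeholder names, assuming both 
models are fitted with method="ML" (ML scores, unlike REML scores, are 
comparable between models with different fixed effects):

```r
## Sketch of option 2: a GLRT from the reported ML scores. For a gam
## fitted with method = "ML", object$gcv.ubre holds the (negative log)
## ML score, so twice the score difference is the LR statistic.
library(mgcv)
m1 <- gam(Y ~ A,                   data = dat, method = "ML")
m2 <- gam(Y ~ A + s(X, bs = "re"), data = dat, method = "ML")
lr <- 2 * (m1$gcv.ubre - m2$gcv.ubre)   ## should be >= 0
pchisq(lr, df = 1, lower.tail = FALSE)  ## chisq reference on 1 df
```

As noted above, this reference distribution is suspect at the boundary 
(variance equal to zero), so treat the p-value as rough.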

The second issue, that adding a fixed effect can reduce the EDF, while 
improving the fit, is less of a problem, I think. If I'm happy to select 
the degree of smoothness of a model by GCV, REML or whatever, then I 
should also be happy to accept that the model with the fewer degrees of 
freedom, but more variables, is better than the one with more degrees of 
freedom and fewer variables. (The converse, that I would ever reject the 
better-fitting, less complex model, is obviously perverse.)

You can get similar effects in ordinary linear modelling: adding an 
important predictor gives such an improvement in fit that you can drop 
polynomial dependencies on other predictors, so a model with more 
degrees of freedom but fewer variables does worse than one with fewer 
degrees of freedom and more variables... the issue is just a bit more 
prominent when fitting GAMs because part of model selection is 
integrated with fitting in this case.

best,
Simon
On 08/05/12 15:01, Martijn Wieling wrote:

Dear Simon,

Thanks for your concise reply, this is very helpful.

With respect to my second question, however, I was not entirely clear
- or perhaps I'm misunderstanding your answer. What I meant is:
suppose I have a model with a random effect s(X, bs="re"). Now I want
to test if a certain (fixed-effect) predictor A improves the model.

I therefore compare:
m1 = gam(Y ~ s(X,bs="re"), data=dat)
m2 = gam(Y ~ A + s(X,bs="re"), data=dat)

What I didn't make explicit before is that A does not reach significance
in the model summary of m2 (e.g., p = 0.2). Comparing the models
m1 and m2 shows that m1 is the more complex model (as adding A
decreases the edfs invested in the random-effect spline by more than 1),
and m1 is not significantly better than m2. Now my question is: should
I keep m2, even though A is not significant itself? Or should I ignore
the result of anova(m1,m2) anyway, given that this comparison is not
suitable for models including random effects (as you argue
regarding my first question)?

If that is the case and the anova is not usable to compare m1 and m2
due to the random-effect parameter, note that the same situation can
occur without random effects when a non-linearity such as
s(Longitude,Latitude) is included. What is then appropriate: keep m1
(which is more complex), or use m2 (which has a less complex
non-linearity, but includes an additional non-significant fixed-effect
factor)?

With kind regards,
Martijn

On Fri, May 11, 2012 at 5:43 PM, Simon Wood <s.wood at bath.ac.uk> wrote:
Martijn,

I don't think there is one right answer to this. If you look at things 
in the way that one would usually view a smooth model then m2 is both 
simpler (lower EDF) and fits better, so is simply a better model (if the 
simpler model fits better then why would you not use it?).

But of course `simpler' depends on whether you view the random effect as 
counting for one parameter, or for its `effective degrees of freedom'. 
If it's the former then you should probably fit models using 
method="ML" and compare via a GLRT using the ML score, or simply 
drop the fixed effect if its p-value according to anova(m2) is too high.

I would not use anova(m1,m2) in this case, because of the difficulty in 
interpreting the random effects as being equivalent to un-penalized 
effects with rank equal to the random effect edfs.

best,
Simon
On 11/05/12 17:50, Martijn Wieling wrote:

Having looked at this further, I've made some changes in mgcv_1.7-17 to 
the p-value computations for terms that can be penalized to zero during 
fitting (e.g. s(x,bs="re"), s(x,m=1) etc).

The Wald statistic based p-values from summary.gam and anova.gam (i.e. 
what you get from e.g. anova(a) where a is a fitted gam object) are 
quite well founded for smooth terms that are non-zero under full 
penalization (e.g. a cubic spline is a straight line under full 
penalization). For such smooths, an extension of Nychka's (1988) result 
on CIs for splines gives a well-founded distributional result on which 
to base a Wald statistic. However, the Nychka result requires the 
smoothing bias to be substantially less than the smoothing estimator 
variance, and this will often not be the case if smoothing can actually 
penalize a term to zero (to understand why, see argument in appendix of 
Marra & Wood, 2012, Scandinavian Journal of Statistics, 39,53-74).

Simulation testing shows that this theoretical concern has serious 
practical consequences. So for terms that can be penalized to zero, 
alternative approximations have to be used, and these are now 
implemented in mgcv_1.7-17 (see ?summary.gam).
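As a minimal illustration on simulated data (the model and the exact 
p-values here are invented for demonstration, not from the thread):

```r
## Illustration: a model containing a term that full penalization can
## shrink to zero (a random intercept); summary() in mgcv >= 1.7-17
## applies the alternative approximation to such terms (?summary.gam).
library(mgcv)
set.seed(1)
n <- 200
fac <- factor(sample(1:20, n, replace = TRUE))  ## grouping factor
x <- runif(n)
y <- sin(2 * pi * x) + 0.5 * rnorm(20)[fac] + 0.3 * rnorm(n)
b <- gam(y ~ s(x) + s(fac, bs = "re"), method = "REML")
summary(b)  ## the s(fac) p-value uses the revised computation
```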

The approximate test performed by anova(a,b) (a and b are fitted "gam" 
objects) is less well founded. It is a reasonable approximation when 
each smooth term in the models could in principle be well approximated 
by an unpenalized term of rank approximately equal to the edf of the 
smooth term, but otherwise the p-values produced are likely to be much 
too small. In particular simulation testing suggests that the test is 
not to be trusted with s(...,bs="re") terms, and can be poor if the 
models being compared involve any terms that can be penalized to zero 
during fitting. (Although the mechanisms are a little different, this is 
similar to the problem we would have if the models were viewed as 
regular mixed models and we tried to use a GLRT to test variance 
components for equality to zero).

These issues are now documented in ?anova.gam and ?summary.gam...

Simon
On 08/05/12 15:01, Martijn Wieling wrote: