mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Having looked at this further, I've made some changes in mgcv_1.7-17 to 
the p-value computations for terms that can be penalized to zero during 
fitting (e.g. s(x,bs="re") or s(x,m=1)).

The Wald statistic based p-values from summary.gam and anova.gam (i.e. 
what you get from e.g. anova(a) where a is a fitted gam object) are 
quite well founded for smooth terms that are non-zero under full 
penalization (e.g. a cubic spline is a straight line under full 
penalization). For such smooths, an extension of Nychka's (1988) result 
on CIs for splines gives a sound distributional result on which 
to base a Wald statistic. However, the Nychka result requires the 
smoothing bias to be substantially less than the smoothing estimator 
variance, and this will often not be the case if smoothing can actually 
penalize a term to zero (to understand why, see the argument in the 
appendix of Marra & Wood, 2012, Scandinavian Journal of Statistics, 39, 
53-74).

Simulation testing shows that this theoretical concern has serious 
practical consequences. So for terms that can be penalized to zero, 
alternative approximations have to be used, and these are now 
implemented in mgcv_1.7-17 (see ?summary.gam).
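
As a rough illustration of where these single-model p-values come from, 
here is a minimal sketch on simulated data. The data frame and the 
variable names (x, g, y) are made up for this example, and the 
simulation settings are arbitrary.

```r
library(mgcv)

set.seed(1)
n <- 200
g_id <- sample(1:10, n, replace = TRUE)   # hypothetical group labels
dat <- data.frame(x = runif(n), g = factor(g_id))

# Simulated response: smooth effect of x plus a random intercept per group
b_g <- rnorm(10, sd = 0.5)
dat$y <- sin(2 * pi * dat$x) + b_g[g_id] + rnorm(n, sd = 0.3)

# s(g, bs = "re") is a term that can be penalized to zero during fitting
a <- gam(y ~ s(x) + s(g, bs = "re"), data = dat, method = "REML")

summary(a)$s.table   # per-term edf, Ref.df, test statistic and p-value
anova(a)             # the same tests, via the single-model anova method
```

The row for s(g) is the test relevant to deciding whether the random 
intercept is needed.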

The approximate test performed by anova(a,b) (a and b are fitted "gam" 
objects) is less well founded. It is a reasonable approximation when 
each smooth term in the models could in principle be well approximated 
by an unpenalized term of rank approximately equal to the edf of the 
smooth term, but otherwise the p-values produced are likely to be much 
too small. In particular, simulation testing suggests that the test is 
not to be trusted with s(...,bs="re") terms, and can be poor if the 
models being compared involve any terms that can be penalized to zero 
during fitting. (Although the mechanisms are a little different, this is 
similar to the problem we would have if the models were viewed as 
regular mixed models and we tried to use a generalized likelihood ratio 
test (GLRT) to test variance components for equality to zero.)
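
For concreteness, the sketch below shows the two routes side by side for 
a random intercept, again on made-up simulated data (the names x, g, y, 
m0 and m1 are assumptions for the example only). It shows how each 
p-value is obtained, not how they compare in general; given the caveats 
above, the two-model comparison should not be relied on here.

```r
library(mgcv)

set.seed(2)
n <- 200
g_id <- sample(1:10, n, replace = TRUE)   # hypothetical group labels
dat <- data.frame(x = runif(n), g = factor(g_id))
b_g <- rnorm(10, sd = 0.3)
dat$y <- sin(2 * pi * dat$x) + b_g[g_id] + rnorm(n, sd = 0.3)

m0 <- gam(y ~ s(x), data = dat, method = "REML")
m1 <- gam(y ~ s(x) + s(g, bs = "re"), data = dat, method = "REML")

# Single-model test for the random effect term (see ?summary.gam)
p_summary <- summary(m1)$s.table["s(g)", "p-value"]

# Two-model approximate comparison (see ?anova.gam); not to be trusted
# when the extra term is s(..., bs = "re")
cmp <- anova(m0, m1, test = "F")

p_summary
cmp
```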

These issues are now documented in ?anova.gam and ?summary.gam.

Simon
