significant terms in spline model using GAM
Hi. I'm using gam() to fit a spline model for a data set that has two predictor variables (say A and B). The results indicate that the higher-order interaction terms are significant. The R^2 jumps from 0.5 to 0.9 when I change the maximum order for the interaction from 10 to 15 (i.e. (AB)^10 to (AB)^15).
- This is perhaps not the best way of thinking about the interaction terms: there are certainly no terms like (AB)^10 or (AB)^15 in the basis produced by s(A,B,k=10) or s(A,B,k=15).
Is there any way of finding out which of the terms in the model are really "significant" so that I could drop some of the terms from the model?
The default model selection used by gam() is GCV, a mean square error criterion, and I'm not sure how useful it is to mix model selection by hypothesis testing with GCV model selection. I think that your results indicate that, in GCV terms, your original choice of k=10 was too restrictive.

If you want to do model selection by hypothesis testing, you can: s(A,B,k=10,fx=TRUE) is nested within s(A,B,k=15,fx=TRUE), for example. However, the process is not automated. You would have to construct F-ratios (or deviance differences) yourself from the response data and the fitted values.

best,
Simon
_____________________________________________________________________
Simon Wood simon at stats.gla.ac.uk www.stats.gla.ac.uk/~simon/
Department of Statistics, University of Glasgow, Glasgow, G12 8QQ
Direct telephone: (0)141 330 4530 Fax: (0)141 330 4814
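
For readers following the thread, the hand-built F-ratio described in the reply could look roughly like the sketch below. It is only an illustration under stated assumptions: the data are simulated (the original data set is not available), the names dat, m10, m15, rss10, rss15, F_stat and p_value are made up for the example, the response is assumed Gaussian, and the residual degrees of freedom are taken as n minus the number of coefficients, which is appropriate only because fx=TRUE makes both fits unpenalized.

```r
library(mgcv)

# Simulated stand-in data (assumption: not the poster's data).
set.seed(1)
n <- 400
dat <- data.frame(A = runif(n), B = runif(n))
dat$y <- sin(2 * pi * dat$A) * cos(2 * pi * dat$B) + rnorm(n, sd = 0.3)

# fx = TRUE gives unpenalized regression splines with fixed degrees of
# freedom, so the k = 10 model is nested within the k = 15 model.
m10 <- gam(y ~ s(A, B, k = 10, fx = TRUE), data = dat)
m15 <- gam(y ~ s(A, B, k = 15, fx = TRUE), data = dat)

# Residual sums of squares from the response data and the fitted values.
rss10 <- sum((dat$y - fitted(m10))^2)
rss15 <- sum((dat$y - fitted(m15))^2)

# Residual degrees of freedom: n minus the number of coefficients.
df10 <- n - length(coef(m10))
df15 <- n - length(coef(m15))

# Standard F-ratio for comparing nested linear models.
F_stat  <- ((rss10 - rss15) / (df10 - df15)) / (rss15 / df15)
p_value <- pf(F_stat, df10 - df15, df15, lower.tail = FALSE)

cat("F =", F_stat, "on", df10 - df15, "and", df15, "df; p =", p_value, "\n")
```

A small p-value here would suggest that the extra basis functions in the k=15 fit explain more than noise. Note that this test only makes sense for the fx=TRUE fits; with the default penalized fits the effective degrees of freedom are chosen by GCV and the two models are not nested in the same simple way.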