
bootstrapping gam models with multiple explanatory terms

2 messages · Basil Iannone, Gavin Simpson

Dear Basil,

You are describing a non-parametric bootstrap procedure. There is
nothing wrong with what you are doing, but the approach you are taking
to extract/plot the smooth terms is a little too simplistic and hence
you've hit a brick wall.

To bootstrap multiple terms, I would have my training data in a data
frame, say `train`, and draw a bootstrap sample of row indices, say
`k.star`, from that. I would fit my model in the loop like this:

boot.mod <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"),
                data = train[k.star, , drop = FALSE])

Notice that I am not fiddling with the model representation, just the
data object used to fit the model.

Then I would use `predict(...., type = "terms")`, not `type = "link"`
(the default). `type = "terms"` returns a matrix of contributions of
terms in the model. In the above model you'd have a matrix with 2
columns, one for the smooth on `x` and one on `z`. These are the centred
smooth functions. To relate this to the values you were producing with
`predict(....)` in your example note that

predterms <- predict(boot.mod, newdata, type = "terms")
pred <- attr(predterms, "constant") + rowSums(predterms)

gives object `pred`, which should be equivalent to
`predict(boot.mod, newdata)` on the link scale. The constant is the
model intercept, which is stored as the "constant" attribute of the
returned matrix.
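
As a quick check of that identity, here is a minimal sketch on simulated
data. The simulated `train`, the grid in `newdata`, and the seed are
assumptions for illustration only; the model is Gaussian with an identity
link, so the link scale and the response scale coincide.

```r
library(mgcv)

set.seed(42)

# Simulated stand-in for the real training data (assumption)
n <- 200
train <- data.frame(x = runif(n), z = runif(n))
train$y <- sin(2 * pi * train$x) + 2 * (train$z - 0.5)^2 + rnorm(n, sd = 0.3)

# One bootstrap sample of row indices, then the model fit from the text
k.star <- sample(nrow(train), replace = TRUE)
boot.mod <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"),
                data = train[k.star, , drop = FALSE])

# Hypothetical prediction grid; it must contain every covariate in the model
newdata <- data.frame(x = seq(0.05, 0.95, length.out = 50),
                      z = seq(0.05, 0.95, length.out = 50))

# Per-term contributions (centred smooths) plus the intercept attribute
predterms <- predict(boot.mod, newdata, type = "terms")
pred <- attr(predterms, "constant") + rowSums(predterms)

# Compare with the ordinary link-scale prediction on the same grid
same <- all.equal(as.numeric(pred),
                  as.numeric(predict(boot.mod, newdata, type = "link")))
print(colnames(predterms))
print(same)
```

The columns of `predterms` are named after the model terms, which makes it
easy to pick out a single smooth by name.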

You could then plot each column of `predterms` against the relevant
column from `newdata` to give the bootstrap smooth for each term. I
would do that with `lines()` to add them to the plot.
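
Putting the steps together, the whole bootstrap loop might look like the
sketch below. The simulated `train`, the number of resamples `n.boot`, the
grid in `newdata`, and the storage matrices `boot.x` and `boot.z` are all
hypothetical names and values chosen for illustration, not part of the
original code.

```r
library(mgcv)

set.seed(1)

# Simulated stand-in for the real training data (assumption)
n <- 200
train <- data.frame(x = runif(n), z = runif(n))
train$y <- sin(2 * pi * train$x) + 2 * (train$z - 0.5)^2 + rnorm(n, sd = 0.3)

# Grid over the observed range of each covariate (assumption)
newdata <- data.frame(x = seq(min(train$x), max(train$x), length.out = 100),
                      z = seq(min(train$z), max(train$z), length.out = 100))

n.boot <- 50
boot.x <- matrix(NA_real_, nrow = nrow(newdata), ncol = n.boot)
boot.z <- matrix(NA_real_, nrow = nrow(newdata), ncol = n.boot)

for (b in seq_len(n.boot)) {
  # Resample rows with replacement; only the data change, not the formula
  k.star <- sample(nrow(train), replace = TRUE)
  boot.mod <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"),
                  data = train[k.star, , drop = FALSE])

  # One column of contributions per smooth term
  predterms <- predict(boot.mod, newdata, type = "terms")
  boot.x[, b] <- predterms[, "s(x)"]
  boot.z[, b] <- predterms[, "s(z)"]
}

# Empty plot for the smooth of x, then one line per bootstrap fit
plot(newdata$x, boot.x[, 1], type = "n", ylim = range(boot.x),
     xlab = "x", ylab = "s(x)")
for (b in seq_len(n.boot)) {
  lines(newdata$x, boot.x[, b], col = "grey")
}
```

The same plotting step applied to `boot.z` and `newdata$z` gives the
bootstrap smooths for the second term.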

In addition, note that you could sample from the posterior distribution
of the model to generate something similar; the splines are associated
with parameters, and the posterior distribution of these parameters is
approximately multivariate normal. You can take random draws from that
distribution to examine the variation in the shapes that could be taken
by the fitted splines given the uncertainty in fitting. Simon Wood has
an example of this in his book "Generalized Additive Models: An
Introduction with R" and I used this in a blog post recently:

http://wp.me/pZRQ9-2j
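
A minimal sketch of the posterior simulation idea follows. It is not a
reproduction of the book example or the blog post; the simulated `train`,
the grid `newdata`, the number of draws `n.sim`, and the use of
`MASS::mvrnorm()` for the multivariate normal draws are assumptions made
for illustration.

```r
library(mgcv)
library(MASS)   # for mvrnorm()

set.seed(1)

# Simulated stand-in for the real training data (assumption)
n <- 200
train <- data.frame(x = runif(n), z = runif(n))
train$y <- sin(2 * pi * train$x) + 2 * (train$z - 0.5)^2 + rnorm(n, sd = 0.3)

mod <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"), data = train)

# Grid for x; z is held at its mean because only s(x) is examined here
newdata <- data.frame(x = seq(min(train$x), max(train$x), length.out = 100),
                      z = mean(train$z))

# Matrix mapping coefficients to the linear predictor at newdata
Xp <- predict(mod, newdata, type = "lpmatrix")

# Draw coefficient vectors from the approximate posterior:
# mean = estimated coefficients, covariance = vcov() of the fitted gam
n.sim <- 50
betas <- mvrnorm(n.sim, mu = coef(mod), Sigma = vcov(mod))

# Keep only the columns belonging to s(x) and evaluate each draw
idx <- grep("s(x)", colnames(Xp), fixed = TRUE)
sims <- Xp[, idx] %*% t(betas[, idx])

# Simulated smooths in grey, fitted smooth on top
matplot(newdata$x, sims, type = "l", lty = 1, col = "grey",
        xlab = "x", ylab = "s(x)")
lines(newdata$x, Xp[, idx] %*% coef(mod)[idx], lwd = 2)
```

Unlike the bootstrap, this needs only one model fit, but it conditions on
the estimated smoothing parameters.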

Simon's book also has an example of a parametric bootstrap where the
resampling is done from the model residuals to create new data from the
fitted model and then fit a new model to the new data.
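
The residual-resampling idea can be sketched roughly as below. This is my
own illustration rather than the example from the book, and it assumes a
Gaussian model with an identity link so that response residuals can simply
be added back to the fitted values. The simulated `train`, `n.boot`, and
the object names ending in `.star` are hypothetical.

```r
library(mgcv)

set.seed(7)

# Simulated stand-in for the real training data (assumption)
n <- 200
train <- data.frame(x = runif(n), z = runif(n))
train$y <- sin(2 * pi * train$x) + 2 * (train$z - 0.5)^2 + rnorm(n, sd = 0.3)

# Fit once to the observed data and keep fitted values and residuals
mod <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"), data = train)
fit <- fitted(mod)
res <- residuals(mod, type = "response")

newdata <- data.frame(x = seq(min(train$x), max(train$x), length.out = 100),
                      z = mean(train$z))

n.boot <- 50
boot.x <- matrix(NA_real_, nrow = nrow(newdata), ncol = n.boot)

for (b in seq_len(n.boot)) {
  # New response = fitted values + residuals resampled with replacement
  train.star <- train
  train.star$y <- fit + sample(res, replace = TRUE)

  # Refit the same model to the new response
  mod.star <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"), data = train.star)
  boot.x[, b] <- predict(mod.star, newdata, type = "terms")[, "s(x)"]
}

# One grey line per refit for the smooth of x
matplot(newdata$x, boot.x, type = "l", lty = 1, col = "grey",
        xlab = "x", ylab = "s(x)")
```

Here the covariate values stay fixed across resamples; only the response
changes.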

HTH

G
On Thu, 2012-05-03 at 20:36 -0500, Basil Iannone wrote: