mgcv: relative risk from GAM with distributed lag

Hi Simon,

Thanks for that example. It's been very useful!  It turned out that I
created the prediction data for lag incorrectly. I've now managed to
get perspective plots on the relative risk scale.

I've now moved on to trying to get a plot of overall RR (y-axis) for a
range of temperature values (x-axis), where RR for each temperature
value is summed over lag periods 0-6 (see bottom plot page 2230 in
Gasparrini's paper: https://doi.org/10.1002/sim.3940). This now seems
straighforward with predict.gam() and type = "terms" in the way you
suggested, but as acknowledged, that doesn't give me credible
intervals. I've just been told that I MUST present results with
confidence/ credible intervals (which is fair enough).

For the last two days I've tried making the plot of overall RR with
CIs, using the lpmatrix, as you suggested (looked it up in your book,
and also an earlier post of yours with an example:
https://stat.ethz.ch/pipermail/r-help/2012-May/314387.html), but I
don't even understand how the book example can be applied to my
problem, and I can't get the example in the online post to work. I
don't really understand what an lpmatrix is, (other than that it seems
to give me a value for each smooth term for each basis function?) or
what the code in the book example does, so I have no idea what I am
doing wroing, or what good results are supposed to look like. (I'm
sorry, my background is in public health rather than statistics).

I am hoping that I can ask for your (or anyone's)  help one final
time. (Once I have the CI's, my modelling will be finished, and I will
be out of everyone's hair with questions about GAMs... at least for a
while!).

Here's my code with reproducible data to get a plot of overall RR
(y-axis) vs temperature (x-axis) where overall RR for each temperature
value is obtained by summing RR over lag periods 0-6. But I have
absolutely no idea how to use predict.gam with type = lpmatrix to get
from this curve to one with CIs.

library(mgcv)

# simulate data
set.seed(3) # make reproducible example
simdat <- gamSim(1,400)
g <- exp(simdat$f/5)
simdat$y <- rnbinom(g,size=3,mu=g)  # negative binomial response var
simdat$time <- 1:400  # create time series
names(simdat) <- c("deaths", "temp", "humidity", "rain", "x3", "f",
"f0", "f1", "f2", "f3", "time")

# lag function based on  Simon Wood's book (2017, p.349)
lagard <- function(x,n.lag=7) {
n <- length(x); X <- matrix(NA,n,n.lag)
for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
X
}

# create lagged variables as 7-column matrices
dat <- list(lag=matrix(0:6,nrow(simdat),7,byrow=TRUE),
deaths=simdat$deaths, time = simdat$time)
dat$temp <- lagard(simdat$temp)
dat$rain <- lagard(simdat$rain)
dat$humidity <- lagard(simdat$humidity)

mod <- gam(deaths~s(time, k=70) + te(temp, lag, k=c(12, 4)) +
te(humidity, lag, k=c(12, 4)) + te(rain, lag, k=c(12, 4)), data = dat,
family = nb, method = 'REML', select = TRUE)

# create prediction data
m1 <- 40 # number of points to predict across the range of each weather variable
m2 <- 7 # number of lag periods
n <- m1*m2 # total number of prediction points

pred_temp <- seq(min(dat$temp, na.rm = T), max(dat$temp, na.rm = T),
length = m1)
pred_lag <- 0:6
pred_rain <- seq(min(dat$rain, na.rm = T), max(dat$rain, na.rm = T),
length = m1)
pred_humidity <- seq(min(dat$humidity, na.rm = T), max(dat$humidity,
na.rm = T), length = m1)
pred_time <- seq(min(dat$time, na.rm = T), max(dat$time, na.rm = T),
length = m1)

pd <- data.frame(temp=rep(pred_temp,m2),lag=rep(pred_lag,each=m1),rain=rep(pred_rain,m2),humidity=rep(pred_humidity,m2),time=rep(pred_time,m2))

#  prediction data with reference temperature set to median
temperature, all else equal to pd
pred_temp0 <- median(dat$temp, na.rm=T)
pd0 <- data.frame(temp=rep(pred_temp0,n),lag=rep(pred_lag,each=m1),rain=rep(pred_rain,m2),humidity=rep(pred_humidity,m2),time=rep(pred_time,m2))

# make predictions
predictions <- predict.gam(mod, newdata=pd, type = "terms")
predictions0 <- predict.gam(mod, newdata=pd0, type = "terms")

# calculate RR (relative to reference temperature set above)
diff <- predictions[,2] - predictions0[,2]
rr <- as.vector(exp(diff))

# convert rr to matrix required for z argument for persp() plot
rr_mat <- matrix(rr, m1, m2)
persp(x=pred_temp,y=pred_lag,z=rr_mat,theta=30,phi=30,ticktype =
"detailed", col="blue",zlab="RR")

# create plots of overall RR (y-axis, summed over lags 0-6) vs
temperature (x-axis)
# convert predictions to matrix to allow summing over lags
pmat <- matrix(predictions[,2],m1,m2)
pmat0 <- matrix(predictions0[,2],m1,m2)

# sum predictions over lags 0-6 for each temperature
psummed <- rowSums(pmat)
psummed0 <-rowSums(pmat0)

# subtract the predictions for the reference temperature from the
predictions for each temperature
pref <- psummed-psummed0

# exponentiating a difference on log-scale gives RR
rr_overall <- exp(pref)
plot(pred_temp, rr_overall, type = "l")

# now create CIs for overall RR plot
Xp <- predict(a1a_high_heat,newdata=pd,type="lpmatrix")

# and how to procede now?????? So close to having results, and yet....

mgcv: relative risk from GAM with distributed lag

Thread (9 messages)