gam (mgcv), multiple imputation, f-stats/p-values, and summary(gam)
?summary.gam ## The Help page Since the Help page is presumably not enough, why don't you look at the code?? R is open source. summary.gam ## at the prompt -- Bert
On Wed, May 8, 2013 at 7:12 AM, Andrew Crane-Droesch <andrewcd at gmail.com> wrote:
Dear All,
I'm using gam for a project that involves multiple imputation, and it has
led me to a question about how f-statistics/p-values work in gam.
Specifically, how do the values in summary(gam) get generated? As is made
clear by the dumb example below, I'm manipul;ating gam objects to reflect
the MI procedures that I'm using. I don't trust the f-statistics/p-values
that I'm getting, but I'd like to know how to further manipulate these
objects to get trustworthy values. Part of that invovles knowing how the
output in summary(gam) gets generated, and from what elements of a gam
object.
Here is the example:
library(mgcv)
par(mfrow=c(2,2))
impute <- function (a, a.impute){
ifelse (is.na(a), a.impute, a)
}
#make some dumb fake data
x1.knots = c(-1,-.5,0,.5,1)
x2.knots = c(-1,-.5,0,.5,1)
K=5
x1 = rnorm(100)
x2 = rnorm(100)
y = .05*exp(x1)-.5*x1 + .1*x2 + x1*x2 + rnorm(100)
#some of it is missing
x1[81:100] = NA
x2[71:90] = NA
#do a few dumb imputations, and fit models to the partially-imputed data
x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))
m1 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots, x2.imp
= x2.knots))
plot(m1,plot.type="contour",scheme=2,main="Imp 1")
x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))
m2 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots, x2.imp
= x2.knots))
plot(m2,plot.type="contour",scheme=2,main="Imp 2")
x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))
m3 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots, x2.imp
= x2.knots))
plot(m3,plot.type="contour",scheme=2,main="Imp 3")
results = list(m1,m2,m3)
#Combine the results according to rubin's rules
reps=3
bhat=results[[1]]$coeff
for (i in 2:reps){
bhat=bhat+results[[i]]$coeff
}
bhat = bhat/reps
W=results[[1]]$Vp
for (i in 2:reps){
W = W+results[[i]]$Vp
}
W = W/reps
B= (results[[1]]$coeff-bhat) %*% t(results[[1]]$coeff-bhat)
for (i in 2:reps){
B = B+(results[[i]]$coeff-bhat) %*% t(results[[i]]$coeff-bhat)
}
B=B/(reps-1)
Vb = W+(1+1/reps)*B
#Put the results into a convenient gam object
MI = results[[1]]
MI$coefficients=bhat
MI$Vp = Vb
#and summarize
summary(MI)
plot(MI,plot.type="contour",scheme=2,main="MI")
When I do something similar with non-fake data I get wacky f-statistics and
p-values. For example, F could be >100 and p could equal 1. This probably
has something to do with degrees of freedom.
My easy question: what goes on under the hood with gam to generate these
values? What parts of a gam object are called up?
My harder question: how might one construct principled analogs of these
statistics in an MI context, when degrees of freedom will vary across
models, depending on the imputed data? Has anybody thought about this, or
do I have have some serious pencil-and-paper ahead of me?
Thanks,
Andrew
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm