MICE for Cox model
I encountered this problem about 18 months ago. I contacted Prof. Fox
and Dr. Malewski (the R package maintainers for mice) but they referred
me to Prof. van Buuren. I wrote to Prof. van Buuren but am unable to
find his reply (if he did reply).
Here are the functions I used at that time, if you want to take it with
lots of salt. Let me know if you find anything fishy with it.
coxph.mids <- function (formula, data, ...) {
call <- match.call()
if (!is.mids(data)) stop("The data must have class mids")
analyses <- as.list(1:data$m)
for (i in 1:data$m) {
data.i <- complete(data, i)
analyses[[i]] <- coxph(formula, data = data.i, ...)
}
object <- list(call = call, call1 = data$call,
nmis = data$nmis, analyses = analyses)
oldClass(object) <- if (.SV4.) "mira" else c("mira", "coxph")
return(object)
}
And in the function 'pool', the small sample adjustment requires
residual degrees of freedom (i.e. dfc). For a cox model, I believe that
this is simply the number of events minus the regression coefficients.
There is support for this from middle of page 149 of the book by Parmer
& Machin (ISBN 0471936405). Please correct me if I am wrong.
Here is the slightly modified version of pool :
pool <- function (object, method = "smallsample") {
call <- match.call()
if (!is.mira(object)) stop("The object must have class 'mira'")
if ((m <- length(object$analyses)) < 2)
stop("At least two imputations are needed for pooling.\n")
analyses <- object$analyses
k <- length(coef(analyses[[1]]))
names <- names(coef(analyses[[1]]))
qhat <- matrix(NA, nrow = m, ncol = k, dimnames = list(1:m, names))
u <- array(NA, dim = c(m, k, k),
dimnames = list(1:m, names, names))
for (i in 1:m) {
fit <- analyses[[i]]
qhat[i, ] <- coef(fit)
u[i, , ] <- vcov(fit)
}
qbar <- apply(qhat, 2, mean)
ubar <- apply(u, c(2, 3), mean)
e <- qhat - matrix(qbar, nrow = m, ncol = k, byrow = TRUE)
b <- (t(e) %*% e)/(m - 1)
t <- ubar + (1 + 1/m) * b
r <- (1 + 1/m) * diag(b/ubar)
f <- (1 + 1/m) * diag(b/t)
df <- (m - 1) * (1 + 1/r)2
if (method == "smallsample") {
if( any( class(fit) == "coxph" ) ){
### this loop is the hack for survival analysis ###
status <- fit$y[ , 2]
n.events <- sum(status == max(status))
p <- length( coefficients( fit ) )
dfc <- n.events - p
} else {
dfc <- fit$df.residual
}
df <- dfc/((1 - (f/(m + 1)))/(1 - f) + dfc/df)
}
names(r) <- names(df) <- names(f) <- names
fit <- list(call = call, call1 = object$call, call2 = object$call1,
nmis = object$nmis, m = m, qhat = qhat, u = u,
qbar = qbar, ubar = ubar, b = b, t = t, r = r, df = df,
f = f)
oldClass(fit) <- if (.SV4.) "mipo" else c("mipo", oldClass(object))
return(fit)
}
print.miro only gives the coefficients. Often I need the standard errors
as well since I want to test if each regression coefficient from
multiple imputation is zero or not. Since the function summary.mipo does
not exist, can I suggest the following
summary.mipo <- function(object){
if (!is.null(object$call1)){
cat("Call: ")
dput(object$call1)
}
est <- object$qbar
se <- sqrt(diag(object$t))
tval <- est/se
df <- object$df
pval <- 2 * pt(abs(tval), df, lower.tail = FALSE)
coefmat <- cbind(est, se, tval, pval)
colnames(coefmat) <- c("Estimate", "Std. Error",
"t value", "Pr(>|t|)")
cat("\nCoefficients:\n")
printCoefmat( coefmat, P.values=T, has.Pvalue=T, signif.legend=T )
cat("\nFraction of information about the coefficients
missing due to nonresponse:", "\n")
print(object$f)
ans <- list( coefficients=coefmat, df=df,
call=object$call1, fracinfo.miss=object$f )
invisible( ans )
}
Hope this helps.
Regards, Adai
Inman, Brant A. M.D. wrote:
R-helpers: I have a dataset that has 168 subjects and 12 variables. Some of the variables have missing data and I want to use the multiple imputation capabilities of the "mice" package to address the missing data. Given that mice only supports linear models and generalized linear models (via the lm.mids and glm.mids functions) and that I need to fit Cox models, I followed the previous suggestion of John Fox and have created my own function "cox.mids" to use coxph to fit models to the imputed datasets. (http://tolstoy.newcastle.edu.au/R/help/06/03/22295.html) The function I created is: ------------ cox.mids <- function (formula, data, ...) { call <- match.call() if (!is.mids(data)) stop("The data must have class mids") analyses <- as.list(1:data$m) for (i in 1:data$m) { data.i <- complete(data, i) analyses[[i]] <- coxph(formula, data = data.i, ...) } object <- list(call = call, call1 = data$call, nmis = data$nmis, analyses = analyses) oldClass(object) <- c("mira", "coxph") return(object) } ------------ The problem that I encounter occurs when I try to use the "pool" function to pool the fitted models into one general model. Here is some code that reproduces the error using the pbc dataset. ------------ d <- pbc[,c('time','status','age','sex','hepmeg','platelet', 'trt', 'trig')] d[d==-9] <- NA d[,c(4,5,7)] <- lapply(d[,c(4,5,7)], FUN=as.factor) str(d) imp <- mice(d, m=10, maxit=10, diagnostics=T, seed=500, defaultImputationMethod=c('norm', 'logreg', 'polyreg')) fit <- cox.mids(Surv(time,status) ~ age + sex + hepmeg + platelet + trt + trig, imp) pool(fit) ------------ I have looked at the "pool" function but cannot figure out what I have done wrong. Would really appreciate any help with this. Thanks, Brant Inman Mayo Clinic
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.