
Time-dependent coefficients in a Cox model with categorical variants

2 messages · Terry Therneau, Jeff Newmiller

#
First, as others have said, please obey the mailing list rules and turn off HTML; not everyone uses an HTML email client.

Here is your code, formatted and with line numbers added.  I also fixed one error: "y" should be "status".

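1. fit0 <- coxph(Surv(futime, status) ~ x1 + x2 + x3, data = data0)  # Cox fit; needs library(survival)
2. p <- log(predict(fit0, newdata = data1, type = "expected"))  # log expected events, per row of data1
3. lp <- predict(fit0, newdata = data1, type = "lp")  # linear predictor, per row of data1
4. logbase <- p - lp  # baseline part of the log expected count
5. fit1 <- glm(status ~ offset(p), family = poisson, data = data1)  # intercept 0 <=> observed = expected
6. fit2 <- glm(status ~ lp + offset(logbase), family = poisson, data = data1)  # lp coefficient near 1 <=> good slope
7. group <- cut(lp, c(-Inf, quantile(lp, (1:9) / 10), Inf))  # 10 risk groups from the deciles of lp
8. fit3 <- glm(status ~ -1 + group + offset(p), family = poisson, data = data1)  # one log(O/E) per group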

The key idea of the paper you referenced is that the counterpart to the Hosmer-Lemeshow test (which is wrong if used directly on a Cox model) is to use the predicted values from the Cox model as input to a Poisson regression. That means adding the expected number of events from the Cox model as a fixed term in the Poisson model, and as with any other Poisson model, that term enters as offset(log(expected)).
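A short derivation of why the intercept of fit1 measures overall calibration. This is my own sketch, not taken from the paper: write O_i for status and E_i for the Cox expected count on row i of data1, so p_i = log E_i, and set the Poisson score for the intercept to zero.

```latex
\begin{aligned}
O_i &\sim \operatorname{Poisson}(\mu_i), \qquad \log \mu_i = \beta_0 + \log E_i \\
\frac{\partial \ell}{\partial \beta_0} &= \sum_i \left( O_i - e^{\beta_0} E_i \right) = 0
\quad\Longrightarrow\quad e^{\hat\beta_0} = \frac{\sum_i O_i}{\sum_i E_i}
\end{aligned}
```

So the fitted intercept is zero exactly when the total observed events equal the total expected, and exp(intercept) is the overall observed/expected ratio.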

The presence of time-dependent covariates does not, per se, change this, since the expected number of events is computed in the same way for time-fixed and time-varying covariates. In practice it does matter, at least philosophically. Lines 1, 2, and 5 do this just fine.

If data1 is not the same as data0 (a new study, say), then the test for intercept = 0 in fit1 is a test of overall calibration. Models like the one on line 8 try to partition out where any differences actually lie.
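Using the same notation (O_i = status, E_i = Cox expected count, p_i = log E_i), a sketch of what lines 6 and 8 estimate. The interpretation is mine, obtained by the same score-equation argument applied per parameter. On line 6, the values (beta_0, beta_1) = (0, 1) reproduce the Cox prediction, so beta_1 acts as a calibration slope. On line 8, each group coefficient is that risk group's log observed/expected ratio:

```latex
\begin{aligned}
\text{line 6:}\quad & \log \mu_i = \beta_0 + \beta_1\,\mathrm{lp}_i + (p_i - \mathrm{lp}_i),
\qquad (\beta_0, \beta_1) = (0, 1) \;\Rightarrow\; \log \mu_i = p_i \\
\text{line 8:}\quad & \log \mu_i = \sum_{g=1}^{10} \beta_g\, \mathbf{1}\{i \in g\} + p_i
\quad\Longrightarrow\quad e^{\hat\beta_g} = \frac{\sum_{i \in g} O_i}{\sum_{i \in g} E_i}
\end{aligned}
```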

The time-dependent covariates part lies in the fact that a single subject may be represented by multiple lines in data0 and/or data1.  Do you want to collapse that person into a single row before the glm fits?  If subject "Jones" is represented by 15 lines in the data and "Smith" by 2, it does seem a bit unfair to give Jones 15 observations in the glm fit.  But full discussion of this is as much philosophy as statistics, and is perhaps best done over a beer.
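A minimal base-R sketch of the collapse step, assuming data1 has a subject identifier column. Here that column is called `id`, which is hypothetical because the original code has none. A small made-up data set stands in for the real predictions. Observed events and expected counts are both additive over a subject's rows, so collapsing is just a sum:

```r
# Hypothetical counting-process data: one row per (subject, interval).
# 'expected' stands in for predict(fit0, newdata = data1, type = "expected");
# these numbers are made up for illustration only.
data1 <- data.frame(
  id       = c("Jones", "Jones", "Jones", "Smith", "Smith", "Lee"),
  status   = c(0, 0, 1, 0, 1, 0),
  expected = c(0.10, 0.25, 0.40, 0.30, 0.55, 0.20)
)

# Sum observed events and expected counts within each subject;
# rowsum() returns one row per id, with ids sorted.
per_subject <- as.data.frame(
  rowsum(as.matrix(data1[, c("status", "expected")]), group = data1$id)
)

# Calibration-in-the-large (as in line 5) on row-level and collapsed data.
fit1_rows    <- glm(status ~ offset(log(expected)), family = poisson,
                    data = data1)
fit1_subject <- glm(status ~ offset(log(expected)), family = poisson,
                    data = per_subject)

# Both intercepts equal log(total observed / total expected).
coef(fit1_rows)
coef(fit1_subject)
```

For the intercept-only model of line 5, the estimate (and its standard error) is the same either way, because exp(intercept) is total observed over total expected. The choice starts to matter for models such as lines 6 and 8, where lp can differ between a subject's rows.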

Terry T.
#
Offlist... for your information...

It is unfair to suggest that the mailing list participants are at fault for using old software. Even if the participants use email programs that can handle HTML, any email that goes through the list has its formatting stripped, which leaves it damaged to some degree. It might not seem that way, because sometimes you CAN see the formatting, but that only happens when you are listed in the "To" or "Cc" fields; the rest of the list saw a stripped version regardless of how good their mail program was. Just look at the archives to confirm this. The net result is that the other participants see a more or less damaged version of the discussion and code whenever HTML is used on the list.