Skip to content
Prev 307149 / 398506 Next

Expected number of events, Andersen-Gill model fit via coxph in package survival

Hello,

I am interested in producing the expected number of events, in a
recurring events setting. I am using the Andersen-Gill model, as fit
by the function "coxph" in the package "survival."

I need to produce expected numbers of events for a cohort,
cumulatively, at several fixed times. My ultimate goal is: To fit an
AG model to a reference sample, then use that fitted model to generate
expected numbers of events for a new cohort; then, comparing the
expected vs. the observed numbers of events would give us some idea of
whether the new cohort differs from the reference one.
Grambsch, it seems that the function "survexp" is what I need. But
using it I am not able to obtain expected numbers of events that match
reasonably well the observed numbers *even for the same reference
population.* So, I think I am misunderstanding something quite badly.

Below is an example that illustrates the situation. At the end I
include the sessionInfo().

Thank you!

Omar.


############################

# Example of unexpected behavior in computing estimated number of events
# in using package "survival" for fitting the Andersen-Gill model

require(survival)

head(bladder2)  # this is the data, in interval format

# Fit Andersen-Gill model
cphfit = coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2)

# Choose some arbitrary time horizons
t.horiz = seq(min(bladder2$start),max(bladder2$stop),length=6)

# Compute the cohort expected survival
s = survexp(~1,data=bladder2,ratetable=cphfit,times=t.horiz)

# This are the expected survival values:
s$surv

# We are interested in the rate of events
e.r = as.vector( 1 - s$surv )

# How does this compare to the actual number of events, cumulative at
# each time horizon?

observed = numeric(length(t.horiz))

for (i in 1:length(t.horiz)){

        observed[i] = sum(bladder2$event[bladder2$stop <= t.horiz[i]])

}

print(observed)

# We would like to compute expected numbers of events that approximately
# match these observed values.

# We should multiply the expected survival rates by the number of individuals.

# Now, one would think that this is the number of at-risk individuals:
s$n.risk

# But that is actually the total number of rows in the data. In any case,
# these numbers do not match:

rbind(expected = s$n.risk*e.r,observed=observed)

# What if we multiply by the number of individuals?

rbind(expected = length(unique(bladder2$id))*e.r,observed=observed)

# This does not work either! The required factor seems to be about 133, but
# I don't see an explanation for that.

# In this example, multiplying by 133.182 gives a good match between observed
# and expected values, but in other examples even the shape of the curves
# are different.

# Multiplying by a number of individuals at risk at each time point
# (number of individuals
# for which there is a time interval containing the time horizon) does
# not work either.

#####################
R version 2.15.1 (2012-06-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets
methods   base

other attached packages:
[1] survival_2.36-14

loaded via a namespace (and not attached):
[1] tools_2.15.1