survSplit: further exploration and related topics
To Danardonos concern of splitting time for records with delayed entry: This can fairly easily be accomodated, by simply splitting time in small intervals of time since entry into the study, and then compute the value of the other timescales for each of these e.g.: current.age <- time.from.entry + age.at.entry but the cut on the other timescales will not be exactly where you may want them to be, but this should be of minor importance in real life. This approach will also make it clearer what really is going on. The effect of each of the timescales can then be modelled using the usual regression tools available in R. David Clayton har written an R-function that does it correctly, you can find it in: http://biostat.ku.dk/~bxc/Lexis/ along with its .Rd file. It is also included a Lexis-package which is a first shot at an epidemiology package for R available at http://biostat.ku.dk/~bxc/SPE/library/, but built under 1.9. I recently tried to build it under 2.0, but it crashed for me and I was advised to wait till 2.1 before I had another go at it. A little further exploration of what goes on in survSplit gives:
data(aml) aml$id <- letters[1:dim(aml)[1]] aml3<-survSplit(aml,cut=c(5,10,50),end="time",start="start",
+ event="status",episode="i")
# Sort the rows sensibly
aml3 <- aml3[order(aml3$id,aml3$start),]
aml3$expand <- 1:(dim(aml3)[1])
aml3[aml3$id=="k",c("id","expand","start","time","status","x")]
id expand start time status x 11 k 30 0 5 0 Maintained 34 k 31 5 10 0 Maintained 57 k 32 10 50 0 Maintained 80 k 33 50 161 0 Maintained
# All records are taken as if entry were at time 0: aml4<-survSplit(aml3,cut=c(9,12,40),end="time",start="entry",
+ event="status",episode="i",id="id2")
aml4 <- aml4[order(aml4$id,aml4$expand,aml4$start),]
aml4[aml4$id=="k",c("id","expand","start","entry","time","status","x")]
id expand start entry time status x
30 k 30 0 0 5 0 Maintained
31 k 31 5 0 9 0 Maintained
94 k 31 5 9 10 0 Maintained
32 k 32 10 0 9 0 Maintained
95 k 32 10 9 12 0 Maintained
158 k 32 10 12 40 0 Maintained
221 k 32 10 40 50 0 Maintained
33 k 33 50 0 9 0 Maintained
96 k 33 50 9 12 0 Maintained
159 k 33 50 12 40 0 Maintained
222 k 33 50 40 161 0 Maintained
# entry time is is "start", but it seems that it is more or less # assumed that entry is at the first split point given. aml5<-survSplit(aml3,cut=c(9,12,40),end="time",start="start",
+ event="status",episode="i",id="id2")
aml5 <- aml5[order(aml5$id,aml5$expand,aml5$start),]
aml5[aml5$id=="k",c("id","expand","start","time","status","x")]
id expand start time status x 30 k 30 0 5 0 Maintained 31 k 31 5 9 0 Maintained 94 k 31 9 10 0 Maintained 95 k 32 9 12 0 Maintained 158 k 32 12 40 0 Maintained 221 k 32 40 50 0 Maintained 96 k 33 9 12 0 Maintained 159 k 33 12 40 0 Maintained 222 k 33 40 161 0 Maintained
It appears that the intention has been to support counting process input, but not quite succeeded. Bendix Carstensen ---------------------- Bendix Carstensen Senior Statistician Steno Diabetes Center Niels Steensens Vej 2 DK-2820 Gentofte Denmark tel: +45 44 43 87 38 mob: +45 30 75 87 38 fax: +45 44 43 07 06 bxc at steno.dk www.biostat.ku.dk/~bxc ----------------------
-----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Thomas Lumley Sent: Monday, November 08, 2004 4:28 PM To: Danardono Cc: R-help at stat.math.ethz.ch Subject: Re: [R] survSplit On Mon, 8 Nov 2004, Danardono wrote:
I am just realized that survival has the facility to do
survival time
splitting survSplit after read some postings about time dependency in the list. Is it survSplit only for the survival data input
(time,status) and not for
the 'counting process' input (start,stop,status)? I take one example modified from the survSplit help:
data(aml)
aml3<-survSplit(aml,cut=c(5,10,50),end="time",start="start",event="st
atus",episode="i",id="id")
coxph(Surv(time,status)~x,data=aml)
coef exp(coef) se(coef) z p
xNonmaintained 0.916 2.5 0.512 1.79 0.074
Likelihood ratio test=3.38 on 1 df, p=0.0658 n= 23
#the same
coxph(Surv(time,status)~x,data=aml3)
coef exp(coef) se(coef) z p
xNonmaintained 0.916 2.5 0.512 1.79 0.074
Likelihood ratio test=3.38 on 1 df, p=0.0658 n= 63
This should NOT be the same, and on my computer is not. You should have to use the counting-process syntax. I get
coxph(Surv(time,status)~x,data=aml3)
Call:
coxph(formula = Surv(time, status) ~ x, data = aml3)
coef exp(coef) se(coef) z p
xNonmaintained 1.07 2.92 0.514 2.08 0.037
Likelihood ratio test=4.61 on 1 df, p=0.0317 n= 63
coxph(Surv(start,time,status)~x,data=aml3)
Call:
coxph(formula = Surv(start, time, status) ~ x, data = aml3)
coef exp(coef) se(coef) z p
xNonmaintained 0.916 2.5 0.512 1.79 0.074
Likelihood ratio test=3.38 on 1 df, p=0.0658 n= 63
BUT If I split aml3 further:
aml4<-survSplit(aml3,cut=c(9,12,40),end="time",start="start",event="s
tatus",episode="i",id="id2")
#not the same!
coxph(Surv(start,time,status)~x,data=aml4)
coef exp(coef) se(coef) z p xNonmaintained 1.05 2.85 0.515 2.03 0.042 Likelihood ratio test=4.38 on 1 df, p=0.0363 n= 105
Hmm. I suspect this is a <= vs < bug of some sort in handling start times. -thomas
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html