Skip to content

survSplit: further exploration and related topics

3 messages · BXC (Bendix Carstensen), Thomas Lumley, Danardono

#
To Danardonos concern of splitting time for records with delayed entry:

This can fairly easily be accomodated, by simply splitting time in small
intervals of time since entry into the study, and then compute the value
of the other timescales for each of these e.g.:

current.age <- time.from.entry + age.at.entry

but the cut on the other timescales will not be exactly where you may
want 
them to be, but this should be of minor importance in real life.

This approach will also make it clearer what really is going on.

The effect of each of the timescales can then be modelled using the
usual
regression tools available in R.

David Clayton har written an R-function that does it correctly, you can
find it in:
http://biostat.ku.dk/~bxc/Lexis/ along with its .Rd file.

It is also included a Lexis-package which is a first shot at an
epidemiology package 
for R available at http://biostat.ku.dk/~bxc/SPE/library/, but built
under 1.9.
I recently tried to build it under 2.0, but it crashed for me and I was
advised to
wait till 2.1 before I had another go at it.

A little further exploration of what goes on in survSplit gives:
+                     event="status",episode="i")
id expand start time status          x
11  k     30     0    5      0 Maintained
34  k     31     5   10      0 Maintained
57  k     32    10   50      0 Maintained
80  k     33    50  161      0 Maintained
+                      event="status",episode="i",id="id2")
aml4[aml4$id=="k",c("id","expand","start","entry","time","status","x")]
    id expand start entry time status          x
30   k     30     0     0    5      0 Maintained
31   k     31     5     0    9      0 Maintained
94   k     31     5     9   10      0 Maintained
32   k     32    10     0    9      0 Maintained
95   k     32    10     9   12      0 Maintained
158  k     32    10    12   40      0 Maintained
221  k     32    10    40   50      0 Maintained
33   k     33    50     0    9      0 Maintained
96   k     33    50     9   12      0 Maintained
159  k     33    50    12   40      0 Maintained
222  k     33    50    40  161      0 Maintained
+                      event="status",episode="i",id="id2")
id expand start time status          x
30   k     30     0    5      0 Maintained
31   k     31     5    9      0 Maintained
94   k     31     9   10      0 Maintained
95   k     32     9   12      0 Maintained
158  k     32    12   40      0 Maintained
221  k     32    40   50      0 Maintained
96   k     33     9   12      0 Maintained
159  k     33    12   40      0 Maintained
222  k     33    40  161      0 Maintained
It appears that the intention has been to support counting process
input, 
but not quite succeeded.

Bendix Carstensen
----------------------
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2
DK-2820 Gentofte
Denmark
tel: +45 44 43 87 38
mob: +45 30 75 87 38
fax: +45 44 43 07 06
bxc at steno.dk
www.biostat.ku.dk/~bxc
----------------------
#
On Tue, 9 Nov 2004, BXC (Bendix Carstensen) wrote:

            
There was actually no intention of supporting counting-process input, but 
it turns out to be a one-line change: from
        starttime <- c(starttime, rep(cut, each = n))
to
        starttime<-c(starttime, pmax(starttime, rep(cut,each=n)))

about 20 lines down from the top.  This will be in the next release of 
survival, but not until after R 2.0.1


 	-thomas
1 day later
#
While waiting for R.2.0.1 or 2.1, for you  who need function for this 
survival-splitting business, as I do,   this 'survcut' function below 
might be helpful.
It is not an elegant nor efficient function but it works, at least for 
some examples below.

data(aml)
m1<-coxph(Surv(time,status)~x,data=aml)

#unfortunately, start time must be created first
#but not a big deal I believe
aml$t0<-0

#then try :

d1<-survcut(cut=c(5,10,50),c("t0","time","status"),data=aml)
coxph(Surv(t0,time,status)~x,data=d1)
d2<-survcut(cut=c(9,12,40),c("t0","time","status"),data=d1)
coxph(Surv(t0,time,status)~x,data=d2)
d3<-survcut(cut=c(9,12,40),c("t0","time","status"),data=d2)
coxph(Surv(t0,time,status)~x,data=d3)

# splitting at the risk times
# useful for coxph with time-dependent covariate
d4<-survcut(cut=unique(aml$time[aml$status==1]),c("t0","time","status"),data=d2)
coxph(Surv(t0,time,status)~x,data=d4)

# "random" splitting
dr<-survcut(cut=runif(rpois(1,4),0,100),c("t0","time","status"),data=d1)
dim(dr)
coxph(Surv(t0,time,status)~x,data=dr)


# "per unit time" splitting
d5<-survcut(cut=0:161,c("t0","time","status"),data=d4)
coxph(Surv(t0,time,status)~x,data=d5)


#### the code --------------------------------------------------
survcut<-function (cuts, surv = c("t0", "t1", "event"), data,
   id = NULL, addv = TRUE, sq = FALSE)
# cuts: vector of timepoints to cut at
# surv: a Surv like input
# id :  (optional) variable name, if the data has an id variable
#advv :  (optional) include other variables in the output?
#sq: (optional) sequence of splitting , may be similar with episode in 
survSplit
{
   #cutting one counting process survival line
    cutting <- function(x, a, s) {
        x <- as.numeric(x)
        tmp <- sort(c(x[2:3], a[(x[2] < a) & (x[3] > a)]))
        n <- length(tmp)
        idx <- rep(x[1], n - 1)
        t0 <- tmp[1:(n - 1)]
        t1 <- tmp[2:n]
        event <- rep(0, n - 1)
        event[n - 1] <- x[4]
        if (s)
            data.frame(idx, t0, t1, event, s = 1:(n - 1))
        else data.frame(idx, t0, t1, event)
    }

  #Note that this database-joint-like function is similar to 'merge'
  #I made this before I knew 'merge' is available in R
  #merge can substitute this function, I believe
  #x: the main data (all rows will be selected)
  #variable: variable name (character vector) from refdat
  #key: key variable has to be exist in x and refdat
  #refdat: reference data, must have unique 'key' id
  addvars<-function (x, variable, key = "id", refdat)
  {
    nama <- names(refdat)
    if (is.numeric(variable)) {
        if (!prod(variable %in% 1:length(nama)))
            stop("variable does not exist")
        newname <- names(refdat)[variable]
    }
    else {
        if (!prod(variable %in% nama))
            stop("variable does not exist")
        newname <- variable
    }
    if (!(length(unique(refdat[, key])) == length(refdat[, key])))
        stop("key must be unique")
    newvar <- refdat[, variable, drop = FALSE]
    newvar <- newvar[match(x[, key], refdat[, key]), , drop = FALSE]
    x <- cbind(x, newvar)
    return(x)
}

    vars <- names(data)
    if (sum(surv %in% vars) != 3)
        stop("one or more surv variables do not exist")
    idx <- 1:NROW(data)
    data<- cbind(idx, data)
    t0 <- data[, surv[1]]
    t1 <- data[, surv[2]]
    event <- data[, surv[3]]
    x <- data.frame(idx, t0, t1, event)
    out <- by(x, list(1:NROW(x)), cutting, a = cuts, s = sq)
    out <- do.call("rbind", out)
    rownames(out) <- 1:NROW(out)
    names(out)[2:4] <- surv
    if (addv)
        out <- addvars(out, setdiff(names(data), c(surv, "idx")),
            key = "idx", refdat = data)
    out <- out[, -1]
    if (!is.null(id)) {
        slct <- c(id, setdiff(names(out), id))
        out <- subset(out, select = slct)
    }
    out
}

-----------------------------
/Danar