Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time... I don't find a solution to use the "survfit" function (package: survival) for a defined pattern of covariates with a Cox-Model including a time dependent interaction term. Somehow the definition of my "newdata" argument seems to be erroneous. I already googled the problem, found many persons having the same or a similar problem, but still no solution. I want to stress that my time-dependent covariate does not depend on the failure of an individual (in this case it doesn't seem sensible to predict a survivor function for an individual). Rather one of my effects declines with time (time-dependent coefficient). For illustration, I use the example of John Fox's paper "Cox Proportional - Hazards Regression for Survival Data". http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf Do you know any help? See code below Thanks very much in advance J?rgen Biedermann #---------------------------------------- #Code Rossi <- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", header=T) Rossi.2 <- fold(Rossi, time='week', event='arrest', cov=11:62, cov.names='employed') # see below for the fold function from John Fox # modeling an interaction with time (Page 14) mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~ fin + age + age:stop + prio, data=Rossi.2) mod.allison.5 # Attempt to get the survivor function of a person with age=30, fin=0 and prio=5 newdata.1 <- data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0) fit <- survfit(mod.allison.5,newdata.1,id="Id") Error message: >Fehler in model.frame.default(data = newdata.1, id = "Id", formula = Surv(start, : Variablenl?ngen sind unterschiedlich (gefunden f?r '(id)') --> failure, length of variables are different. #----------------------------------------------------------------- fold <- function(data, time, event, cov, cov.names=paste('covariate', '.', 1:ncovs, sep=""), suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){ vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)]) xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag) all.cov <- unlist(cov) if (!is.list(cov)) cov <- list(cov) ncovs <- length(cov) nrow <- nrow(data) ncol <- ncol(data) ncov <- length(cov[[1]]) nobs <- nrow*ncov if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1) stop(paste( "all elements of cov must be of the same length and \n", "cov.times must have one more entry than each element of cov.")) var.names <- names(data) subjects <- rownames(data) omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov keep.cols <- (1:ncol)[-omit.cols] nkeep <- length(keep.cols) if (is.numeric(event)) event <- var.names[event] times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T) else data[, cov.times] new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep) rownames <- rep("", nobs) colnames(new.data) <- c('start', 'stop', paste(event, suffix, sep=""), var.names[-omit.cols], cov.names) end.row <- 0 for (i in 1:nrow){ start.row <- end.row + 1 end.row <- end.row + ncov start <- times[i, 1:ncov] stop <- times[i, 2:(ncov+1)] event.time <- ifelse (stop == data[i, time] & data[i, event] == 1, 1, 0) keep <- matrix(unlist(data[i, -omit.cols]), ncov, nkeep, byrow=T) select <- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs), 1, all) rows <- start.row:end.row cov.mat <- xlag(matrix(unlist(data[i, all.cov]), nrow=length(rows)), lag) new.data[rows[select], ] <- cbind(start, stop, event.time, keep, cov.mat)[select,] rownames[rows] <- paste(subjects[i], '.', seq(along=rows), sep="") } row.names(new.data) <- rownames as.data.frame(new.data[new.data[, 1] != Inf & apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ]) } #-----------------------------------------------------------------
How to specify "newdata" in a Cox-Modell with a time dependent interaction term?
7 messages · John Fox, Jürgen Biedermann, Terry Therneau
Dear J?rgen, All the values of your Id variable are equal to 1; how can that make sense? Removing the argument Id=id may get you what you want. By the way, the document to which you refer was an appendix to a 2002 book that has been superseded by Fox and Weisberg, An R Companion to Applied Regression, Second Edition (Sage, 2011). Appendices for that book are available at <http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix.html>. I hope this helps, John ------------------------------------------------ John Fox Sen. William McMaster Prof. of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/ On Sat, 16 Jun 2012 15:04:48 +0200
J?rgen Biedermann <juergen.biedermann at googlemail.com> wrote:
Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time... I don't find a solution to use the "survfit" function (package: survival) for a defined pattern of covariates with a Cox-Model including a time dependent interaction term. Somehow the definition of my "newdata" argument seems to be erroneous. I already googled the problem, found many persons having the same or a similar problem, but still no solution. I want to stress that my time-dependent covariate does not depend on the failure of an individual (in this case it doesn't seem sensible to predict a survivor function for an individual). Rather one of my effects declines with time (time-dependent coefficient). For illustration, I use the example of John Fox's paper "Cox Proportional - Hazards Regression for Survival Data". http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf Do you know any help? See code below Thanks very much in advance J?rgen Biedermann #---------------------------------------- #Code Rossi <- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", header=T) Rossi.2 <- fold(Rossi, time='week', event='arrest', cov=11:62, cov.names='employed') # see below for the fold function from John Fox # modeling an interaction with time (Page 14) mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~ fin + age + age:stop + prio, data=Rossi.2) mod.allison.5 # Attempt to get the survivor function of a person with age=30, fin=0 and prio=5 newdata.1 <- data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0) fit <- survfit(mod.allison.5,newdata.1,id="Id") Error message:
>Fehler in model.frame.default(data = newdata.1, id = "Id", formula =
Surv(start, :
Variablenl?ngen sind unterschiedlich (gefunden f?r '(id)')
--> failure, length of variables are different.
#-----------------------------------------------------------------
fold <- function(data, time, event, cov,
cov.names=paste('covariate', '.', 1:ncovs, sep=""),
suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){
vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
all.cov <- unlist(cov)
if (!is.list(cov)) cov <- list(cov)
ncovs <- length(cov)
nrow <- nrow(data)
ncol <- ncol(data)
ncov <- length(cov[[1]])
nobs <- nrow*ncov
if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1)
stop(paste(
"all elements of cov must be of the same length and \n",
"cov.times must have one more entry than each element of
cov."))
var.names <- names(data)
subjects <- rownames(data)
omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov
keep.cols <- (1:ncol)[-omit.cols]
nkeep <- length(keep.cols)
if (is.numeric(event)) event <- var.names[event]
times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T)
else data[, cov.times]
new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep)
rownames <- rep("", nobs)
colnames(new.data) <- c('start', 'stop', paste(event, suffix, sep=""),
var.names[-omit.cols], cov.names)
end.row <- 0
for (i in 1:nrow){
start.row <- end.row + 1
end.row <- end.row + ncov
start <- times[i, 1:ncov]
stop <- times[i, 2:(ncov+1)]
event.time <- ifelse (stop == data[i, time] & data[i, event] ==
1, 1, 0)
keep <- matrix(unlist(data[i, -omit.cols]), ncov, nkeep, byrow=T)
select <- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs),
1, all)
rows <- start.row:end.row
cov.mat <- xlag(matrix(unlist(data[i, all.cov]),
nrow=length(rows)), lag)
new.data[rows[select], ] <-
cbind(start, stop, event.time, keep, cov.mat)[select,]
rownames[rows] <- paste(subjects[i], '.', seq(along=rows), sep="")
}
row.names(new.data) <- rownames
as.data.frame(new.data[new.data[, 1] != Inf &
apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ])
}
#-----------------------------------------------------------------
Dear John, Thank you very much for the quick answer. I will use the mentioned citation of your book in the future. However: The Id argument is necessary for the function to know that the different rows belong to one individual. Or how the Help-File says: id: optional variable name of subject identifiers. If this is present, then each group of rows with the same subject id represents the covariate path through time of a single subject, and the result will contain one curve per subject. If the coxph fit had strata then that must also be specified in newdata. If missing, then each individual row of newdata is presumed to represent a distinct subject and there will be nrow(newdata) times the number of strata curves in the result (one for each strata/subject combination). I tried your proposal anyway, but, as expected, fifty different survival curves appeared. So, there is still no solution. Best regards J?rgen
----------------------------------- J?rgen Biedermann Bergmannstra?e 3 10961 Berlin-Kreuzberg Mobil: +49 176 247 54 354 Home: +49 30 940 53 044 e-mail: juergen.biedermann at gmail.com --------- Korrespondenz ---------- Betreff: Re: How to specify "newdata" in a Cox-Modell with a time dependent interaction term? Von: John Fox <jfox at mcmaster.ca> An: J?rgen Biedermann <juergen.biedermann at googlemail.com> Datum: 16.06.2012 15:55 > Dear J?rgen, > > All the values of your Id variable are equal to 1; how can that make sense? Removing the argument Id=id may get you what you want. > > By the way, the document to which you refer was an appendix to a 2002 book that has been superseded by Fox and Weisberg, An R Companion to Applied Regression, Second Edition (Sage, 2011). Appendices for that book are available at<http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix.html>. > > I hope this helps, > John > > ------------------------------------------------ > John Fox > Sen. William McMaster Prof. of Social Statistics > Department of Sociology > McMaster University > Hamilton, Ontario, Canada > http://socserv.mcmaster.ca/jfox/ > > On Sat, 16 Jun 2012 15:04:48 +0200 > J?rgen Biedermann<juergen.biedermann at googlemail.com> wrote: >> Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time... >> >> I don't find a solution to use the "survfit" function (package: survival) for a defined pattern of covariates with a Cox-Model including a time dependent interaction term. Somehow the definition of my "newdata" argument seems to be erroneous. >> I already googled the problem, found many persons having the same or a similar problem, but still no solution. >> I want to stress that my time-dependent covariate does not depend on the failure of an individual (in this case it doesn't seem sensible to predict a survivor function for an individual). Rather one of my effects declines with time (time-dependent coefficient). >> >> For illustration, I use the example of John Fox's paper "Cox Proportional - Hazards Regression for Survival Data". >> http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf >> >> Do you know any help? See code below >> >> Thanks very much in advance >> J?rgen Biedermann >> >> #---------------------------------------- >> #Code >> >> Rossi<- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", header=T) >> >> Rossi.2<- fold(Rossi, time='week', >> event='arrest', cov=11:62, cov.names='employed') >> >> # see below for the fold function from John Fox >> >> # modeling an interaction with time (Page 14) >> >> mod.allison.5<- coxph(Surv(start, stop, arrest.time) ~ >> fin + age + age:stop + prio, >> data=Rossi.2) >> mod.allison.5 >> >> # Attempt to get the survivor function of a person with age=30, fin=0 and prio=5 >> >> newdata.1<- data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0) >> fit<- survfit(mod.allison.5,newdata.1,id="Id") >> >> Error message: >> >> >Fehler in model.frame.default(data = newdata.1, id = "Id", formula = >> Surv(start, : >> Variablenl?ngen sind unterschiedlich (gefunden f?r '(id)') >> >> --> failure, length of variables are different. >> >> #----------------------------------------------------------------- >> fold<- function(data, time, event, cov, >> cov.names=paste('covariate', '.', 1:ncovs, sep=""), >> suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){ >> vlag<- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)]) >> xlag<- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag) >> all.cov<- unlist(cov) >> if (!is.list(cov)) cov<- list(cov) >> ncovs<- length(cov) >> nrow<- nrow(data) >> ncol<- ncol(data) >> ncov<- length(cov[[1]]) >> nobs<- nrow*ncov >> if (length(unique(c(sapply(cov, length), length(cov.times)-1)))> 1) >> stop(paste( >> "all elements of cov must be of the same length and \n", >> "cov.times must have one more entry than each element of >> cov.")) >> var.names<- names(data) >> subjects<- rownames(data) >> omit.cols<- if (!common.times) c(all.cov, cov.times) else all.cov >> keep.cols<- (1:ncol)[-omit.cols] >> nkeep<- length(keep.cols) >> if (is.numeric(event)) event<- var.names[event] >> times<- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T) >> else data[, cov.times] >> new.data<- matrix(Inf, nobs, 3 + ncovs + nkeep) >> rownames<- rep("", nobs) >> colnames(new.data)<- c('start', 'stop', paste(event, suffix, sep=""), >> var.names[-omit.cols], cov.names) >> end.row<- 0 >> for (i in 1:nrow){ >> start.row<- end.row + 1 >> end.row<- end.row + ncov >> start<- times[i, 1:ncov] >> stop<- times[i, 2:(ncov+1)] >> event.time<- ifelse (stop == data[i, time]& data[i, event] == >> 1, 1, 0) >> keep<- matrix(unlist(data[i, -omit.cols]), ncov, nkeep, byrow=T) >> select<- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs), >> 1, all) >> rows<- start.row:end.row >> cov.mat<- xlag(matrix(unlist(data[i, all.cov]), >> nrow=length(rows)), lag) >> new.data[rows[select], ]<- >> cbind(start, stop, event.time, keep, cov.mat)[select,] >> rownames[rows]<- paste(subjects[i], '.', seq(along=rows), sep="") >> } >> row.names(new.data)<- rownames >> as.data.frame(new.data[new.data[, 1] != Inf& >> apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ]) >> } >> #----------------------------------------------------------------- >>
Dear Jurgen,
fit <- survfit(mod.allison.5, newdata.1, individual=TRUE) fit
Call: survfit(formula = mod.allison.5, newdata = newdata.1, individual = TRUE) records n.max n.start events median 0.95LCL 0.95UCL 19809 432 432 114 NA NA NA plot(fit) # shows one survival curve See ?survfit.coxph. Best, John On Sat, 16 Jun 2012 16:23:41 +0200
J?rgen Biedermann <juergen.biedermann at googlemail.com> wrote:
Dear John, Thank you very much for the quick answer. I will use the mentioned citation of your book in the future. However: The Id argument is necessary for the function to know that the different rows belong to one individual. Or how the Help-File says: id: optional variable name of subject identifiers. If this is present, then each group of rows with the same subject id represents the covariate path through time of a single subject, and the result will contain one curve per subject. If the coxph fit had strata then that must also be specified in newdata. If missing, then each individual row of newdata is presumed to represent a distinct subject and there will be nrow(newdata) times the number of strata curves in the result (one for each strata/subject combination). I tried your proposal anyway, but, as expected, fifty different survival curves appeared. So, there is still no solution. Best regards J?rgen -- ----------------------------------- J?rgen Biedermann Bergmannstra?e 3 10961 Berlin-Kreuzberg Mobil: +49 176 247 54 354 Home: +49 30 940 53 044 e-mail: juergen.biedermann at gmail.com --------- Korrespondenz ---------- Betreff: Re: How to specify "newdata" in a Cox-Modell with a time dependent interaction term? Von: John Fox <jfox at mcmaster.ca> An: J?rgen Biedermann <juergen.biedermann at googlemail.com> Datum: 16.06.2012 15:55
Dear J?rgen, All the values of your Id variable are equal to 1; how can that make sense? Removing the argument Id=id may get you what you want. By the way, the document to which you refer was an appendix to a 2002 book that has been superseded by Fox and Weisberg, An R Companion to Applied Regression, Second Edition (Sage, 2011). Appendices for that book are available at<http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix.html>. I hope this helps, John ------------------------------------------------ John Fox Sen. William McMaster Prof. of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/ On Sat, 16 Jun 2012 15:04:48 +0200 J?rgen Biedermann<juergen.biedermann at googlemail.com> wrote:
Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time... I don't find a solution to use the "survfit" function (package: survival) for a defined pattern of covariates with a Cox-Model including a time dependent interaction term. Somehow the definition of my "newdata" argument seems to be erroneous. I already googled the problem, found many persons having the same or a similar problem, but still no solution. I want to stress that my time-dependent covariate does not depend on the failure of an individual (in this case it doesn't seem sensible to predict a survivor function for an individual). Rather one of my effects declines with time (time-dependent coefficient). For illustration, I use the example of John Fox's paper "Cox Proportional - Hazards Regression for Survival Data". http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf Do you know any help? See code below Thanks very much in advance J?rgen Biedermann #---------------------------------------- #Code Rossi<- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", header=T) Rossi.2<- fold(Rossi, time='week', event='arrest', cov=11:62, cov.names='employed') # see below for the fold function from John Fox # modeling an interaction with time (Page 14) mod.allison.5<- coxph(Surv(start, stop, arrest.time) ~ fin + age + age:stop + prio, data=Rossi.2) mod.allison.5 # Attempt to get the survivor function of a person with age=30, fin=0 and prio=5 newdata.1<- data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0) fit<- survfit(mod.allison.5,newdata.1,id="Id") Error message:
>Fehler in model.frame.default(data = newdata.1, id = "Id", formula =
Surv(start, :
Variablenl?ngen sind unterschiedlich (gefunden f?r '(id)')
--> failure, length of variables are different.
#-----------------------------------------------------------------
fold<- function(data, time, event, cov,
cov.names=paste('covariate', '.', 1:ncovs, sep=""),
suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){
vlag<- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
xlag<- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
all.cov<- unlist(cov)
if (!is.list(cov)) cov<- list(cov)
ncovs<- length(cov)
nrow<- nrow(data)
ncol<- ncol(data)
ncov<- length(cov[[1]])
nobs<- nrow*ncov
if (length(unique(c(sapply(cov, length), length(cov.times)-1)))> 1)
stop(paste(
"all elements of cov must be of the same length and \n",
"cov.times must have one more entry than each element of
cov."))
var.names<- names(data)
subjects<- rownames(data)
omit.cols<- if (!common.times) c(all.cov, cov.times) else all.cov
keep.cols<- (1:ncol)[-omit.cols]
nkeep<- length(keep.cols)
if (is.numeric(event)) event<- var.names[event]
times<- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T)
else data[, cov.times]
new.data<- matrix(Inf, nobs, 3 + ncovs + nkeep)
rownames<- rep("", nobs)
colnames(new.data)<- c('start', 'stop', paste(event, suffix, sep=""),
var.names[-omit.cols], cov.names)
end.row<- 0
for (i in 1:nrow){
start.row<- end.row + 1
end.row<- end.row + ncov
start<- times[i, 1:ncov]
stop<- times[i, 2:(ncov+1)]
event.time<- ifelse (stop == data[i, time]& data[i, event] ==
1, 1, 0)
keep<- matrix(unlist(data[i, -omit.cols]), ncov, nkeep, byrow=T)
select<- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs),
1, all)
rows<- start.row:end.row
cov.mat<- xlag(matrix(unlist(data[i, all.cov]),
nrow=length(rows)), lag)
new.data[rows[select], ]<-
cbind(start, stop, event.time, keep, cov.mat)[select,]
rownames[rows]<- paste(subjects[i], '.', seq(along=rows), sep="")
}
row.names(new.data)<- rownames
as.data.frame(new.data[new.data[, 1] != Inf&
apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ])
}
#-----------------------------------------------------------------
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120617/f05ef9c0/attachment.pl>
1 day later
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120618/fe481a9b/attachment.pl>
8 days later
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120626/faff3efe/attachment.pl>