standard error of survfit.coxph() - R-help

Mon, Jun 30, 2014 6:04 AM

1. The computations "behind the scenes" produce the variance of the cumulative hazard.
This is true for both an ordinary Kaplan-Meier curve and a Cox model.  Transformations to other
scales are done using simple Taylor series (i.e., the delta method).

   H = cumulative hazard = -log(S);  S = survival
   var(H) = var(log(S))  = the starting point
   S = exp(log(S)), so  var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = S^2 var(H)
   log(-log(S)) = log(H), so  var(log(-log(S))) is approx [deriv of log(x)]^2 * var(H) = (1/H^2) var(H)
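The same conversions in base R, as a minimal sketch assuming you already hold the survival estimate S and the standard error of the cumulative hazard, se(H) (for example the `surv` and `std.err` components of a survfit object). The helper name `se_on_scales` is hypothetical, not part of the survival package. Because log(-log(S)) = log(H) and the derivative of log(x) is 1/x, se(log(-log(S))) is approximately se(H)/H = se(H)/|log(S)|.

```r
# Hypothetical helper (not part of the survival package): convert the
# standard error of the cumulative hazard H = -log(S) to other scales
# with the first-order Taylor (delta-method) approximation.
se_on_scales <- function(surv, se_H) {
  H <- -log(surv)                 # cumulative hazard
  data.frame(
    surv      = surv,
    se_H      = se_H,             # se of H, equivalently of log(S)
    se_S      = surv * se_H,      # |d exp(-H)/dH| = S, so se(S) ~ S * se(H)
    se_loglog = se_H / H          # d log(H)/dH = 1/H, so se(log(-log S)) ~ se(H) / H
  )
}

# Illustrative input values only
se_on_scales(surv = c(0.95, 0.90, 0.80), se_H = c(0.02, 0.05, 0.08))
```

Where S = 1 (H = 0) the log(-log) standard error is undefined, so such time points have to be excluded.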

2. At the time it was written, summary.survfit was used only for printing out the survival 
curve at selected times, and the audience for the printout wanted std(S).   True, that was 
20 years ago, but I don't recall anyone ever asking for summary to do anything else.  Your 
request is not a bad idea.
   Note, however, that the primary impact of choosing the log(S), S, or log(-log(S)) scale is on
the confidence intervals, and those intervals, computed on the requested scale, do appear in the summary output.
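To illustrate that the choice of scale mainly changes the intervals, here is a minimal base-R sketch that builds pointwise 95% limits for S on the three scales from the same S and se(H). The helper name `ci_on_scales` is hypothetical; it follows the ideas behind survfit's conf.type = 'plain', 'log' and 'log-log', but it is not the package's own code and it ignores edge cases such as S = 0 or S = 1.

```r
# Hypothetical helper: pointwise confidence limits for S(t) on three scales,
# all built from S and se(H) via the delta method.
ci_on_scales <- function(surv, se_H, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  H <- -log(surv)
  # plain: S +/- z * se(S), with se(S) ~ S * se(H)
  plain <- cbind(surv - z * surv * se_H, surv + z * surv * se_H)
  # log: log(S) +/- z * se(H), back-transformed with exp()
  logci <- cbind(surv * exp(-z * se_H), surv * exp(z * se_H))
  # log-log: log(H) +/- z * se(H)/H, back-transformed with S = exp(-exp(.));
  # a larger log(H) means a smaller S, so the upper eta limit gives the lower S limit
  eta    <- log(H)
  se_eta <- se_H / H
  loglog <- cbind(exp(-exp(eta + z * se_eta)), exp(-exp(eta - z * se_eta)))
  list(plain = plain, log = logci, loglog = loglog)
}

ci_on_scales(surv = 0.9, se_H = 0.05)
```

With S = 0.9 and se(H) = 0.05 the point estimate is the same on every scale, but the limits differ: roughly (0.81, 0.99) on the plain and log scales versus roughly (0.77, 0.96) on the log-log scale.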

Terry T.

On 06/28/2014 05:00 AM, r-help-request at r-project.org wrote:

Date: Fri, 27 Jun 2014 12:39:29 -0700
From: array chip<arrayprofile at yahoo.com>
To: "r-help at r-project.org"  <r-help at r-project.org>
Subject: [R] standard error of survfit.coxph()

Hi, can anyone help me to understand the standard errors printed in the output of survfit.coxph()?

library(survival)

time<-sample(1:15,100,replace=TRUE)

status<-as.numeric(runif(100,0,1)<0.2)
x<-rnorm(100,10,2)

fit<-coxph(Surv(time,status)~x)
### method 1

survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$std.err

             [,1]        [,2]        [,3]        [,4]       [,5]
 [1,] 0.000000000 0.000000000 0.000000000 0.000000000 0.00000000
 [2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
 [3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
 [4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
 [8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
 [9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394
[10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
[11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
[12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413
[13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433
[14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860
[15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111

survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$time
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

### method 2

summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log'),times=10)$std.err

              1          2          3          4          5
[1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532

From the help pages ?survfit.object and ?summary.survfit, I understand that the standard error from method 1 (survfit()) is for the cumulative hazard, -log(survival), while the standard error from method 2 (summary.survfit()) is for the survival itself, regardless of the value chosen for "conf.type" ('log', 'log-log' or 'plain'). This explains why the standard errors differ between method 1 (10th row) and method 2.

My question is how do I get standard error estimates for log(-log(survival))?
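A minimal sketch of both points, assuming the survival package (a recommended package distributed with R) and simulated data built as in the post, with a `set.seed()` call added only for reproducibility. It checks that summary()'s standard error at t = 10 equals S(10) times the survfit() value, then applies the delta method on the log(-log) scale: since log(-log(S)) = log(H) with H = -log(S), se(log(-log(S))) is approximately se(H)/H. The names `idx`, `se_H`, `se_S`, `H_10` and `se_loglog_10` are illustrative, not part of the package.

```r
library(survival)  # recommended package distributed with R

# Simulated data in the style of the post; set.seed() added for reproducibility
set.seed(1)
time   <- sample(1:15, 100, replace = TRUE)
status <- as.numeric(runif(100) < 0.2)
x      <- rnorm(100, 10, 2)
fit    <- coxph(Surv(time, status) ~ x)
sf     <- survfit(fit, newdata = data.frame(x = x[1:5]), conf.type = "log-log")

# Row that summary(sf, times = 10) reports: the last time point at or before t = 10
idx <- max(which(sf$time <= 10))

se_H <- sf$std.err[idx, ]                 # method 1: se of H = -log(S)
se_S <- summary(sf, times = 10)$std.err   # method 2: se of S(10)

# Check the Taylor-series relation se(S) ~ S * se(H)
all.equal(as.vector(se_S), as.vector(sf$surv[idx, ] * se_H))

# se of log(-log(S)) = log(H): the delta method gives se(H) / H
H_10         <- -log(sf$surv[idx, ])
se_loglog_10 <- se_H / H_10
se_loglog_10
```

At time points where S = 1 (such as the first row of the method 1 output, where the standard error is 0), H is 0 and the ratio is undefined, so those rows need to be dropped.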

Thanks!

John