
coxph diagnostics

4 messages · Terry Therneau, Soumitro Dey, Andrews, Chris +1 more

#
That's the primary reason for the plot: so that you can look and think.

The test statistic is based on whether a least-squares (LS) line fit to the plot has zero
slope.  For larger data sets you can sometimes have a "significant" p-value but good
agreement with proportional hazards.  It's much like an example from Lincoln Moses'
beginning statistics book (now out of print, so I am rephrasing from memory).
    "Suppose that you flip a coin 10,000 times and get 5101 heads.  What can you say?
        a. The coin is not perfectly fair (p<.05).  b. But it is darn close to perfect!"
As a referee I would be comfortable using that coin to start a football game.
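The arithmetic behind the coin example can be checked with a short sketch. This is an illustration added here, not something from Moses' book or from R's survival package. It assumes a normal approximation to the binomial and a Wald 95% interval, and the helper name `coin_test` is hypothetical.

```python
import math


def coin_test(heads: int, flips: int, p0: float = 0.5):
    """Hypothetical helper: large-sample (normal-approximation) test of
    H0: P(heads) = p0. Returns the z statistic, the two-sided p-value,
    and a Wald 95% confidence interval for the true P(heads)."""
    p_hat = heads / flips
    # The z statistic uses the standard error under the null hypothesis
    se_null = math.sqrt(p0 * (1 - p0) / flips)
    z = (p_hat - p0) / se_null
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # The Wald interval uses the standard error evaluated at the estimate
    se_hat = math.sqrt(p_hat * (1 - p_hat) / flips)
    ci = (p_hat - 1.96 * se_hat, p_hat + 1.96 * se_hat)
    return z, p_value, ci


z, p, (lo, hi) = coin_test(5101, 10_000)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")  # z = 2.02, two-sided p = 0.043
print(f"95% CI for P(heads): {lo:.4f} to {hi:.4f}")
```

The p-value falls just under .05, while the interval places the probability of heads within about two percentage points of 0.5: statistically detectable, practically negligible.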

The Cox model gives a hazard ratio averaged over time.  When proportional hazards holds,
that value is a complete summary; nothing else is needed.  When it does not hold, the
average may or may not still be useful, depending on how much the hazard ratio changes
over time.

Terry Therneau
On 08/13/2013 05:00 AM, r-help-request at r-project.org wrote:
#
"Based on the plot of Schoenfeld residuals and Terry's explanation is it safe to say that proportional hazards assumption holds despite the significant global p-values?"

No.  I don't want to put words in Terry's mouth, but he seems to be saying that proportional hazards does NOT hold but it may be close enough to be useful.  This is always a problem with goodness-of-fit tests and large datasets.

Chris


-----Original Message-----
From: Soumitro Dey [mailto:soumitrodey1 at gmail.com] 
Sent: Tuesday, August 13, 2013 10:38 AM
To: Terry Therneau
Cc: r-help at r-project.org
Subject: Re: [R] coxph diagnostics

Thank you for your response, Terry.

To put the discussion into perspective, my data set is quite large, with
over 160,000 samples and 38 variables. The event indicator is true for all
samples in this data set (i.e. there is no censoring). The distribution of
event times is zero-inflated (i.e. most events occur at time = 0).

The output of cox.zph looks like this:
                   rho     chisq         p
agency1      -1.05e-02  9.06e+00  2.62e-03
agency2      -5.48e-03  2.47e+00  1.16e-01
agency3      -6.47e-03  3.45e+00  6.34e-02
agency4      -6.86e-03  3.87e+00  4.90e-02
agency5      -5.56e-03  2.54e+00  1.11e-01
agency6      -6.79e-03  3.79e+00  5.16e-02
agency7      -4.78e-03  1.88e+00  1.71e-01
agency8      -1.34e-02  1.48e+01  1.22e-04
agency9      -2.78e-03  6.34e-01  4.26e-01
agency10     -6.15e-03  3.11e+00  7.78e-02
agency11      4.82e-04  1.91e-02  8.90e-01
agency12     -4.38e-03  1.58e+00  2.09e-01
agency13     -1.02e-03  8.54e-02  7.70e-01
agency14     -5.44e-03  2.43e+00  1.19e-01
agency15      1.01e-02  8.41e+00  3.73e-03
agency16     -1.81e-03  2.70e-01  6.04e-01
agency17     -3.14e-03  8.12e-01  3.67e-01
agency18     -6.59e-03  3.57e+00  5.88e-02
agency19      1.60e-03  2.12e-01  6.46e-01
agency20     -1.24e-02  1.27e+01  3.74e-04
agency21     -9.02e-03  6.69e+00  9.68e-03
agency22     -5.84e-03  2.81e+00  9.38e-02
agency23      3.99e-03  1.31e+00  2.52e-01
agency24     -9.18e-03  6.93e+00  8.50e-03
agency25     -4.75e-03  1.86e+00  1.73e-01
category1    -1.31e-02  1.43e+01  1.60e-04
category2     1.34e-04  1.47e-03  9.69e-01
category3     7.61e-03  4.75e+00  2.92e-02
category4    -6.65e-03  3.69e+00  5.48e-02
category5    -7.78e-03  4.97e+00  2.58e-02
category6    -8.64e-03  6.12e+00  1.34e-02
fav_count     1.32e-02  1.46e+01  1.32e-04
fow_count    -1.83e-02  2.50e+01  5.70e-07
fri_count     9.20e-03  6.89e+00  8.67e-03
stat_count    1.01e-02  9.08e+00  2.58e-03
ht            1.37e-02  1.53e+01  9.08e-05
ul            1.36e-02  1.52e+01  9.67e-05
um           -1.12e-02  1.04e+01  1.24e-03
pos          -5.92e-04  2.90e-02  8.65e-01
neg           6.44e-03  3.39e+00  6.56e-02
acti          2.24e-03  4.12e-01  5.21e-01
anat          3.48e-03  9.96e-01  3.18e-01
chemi        -7.82e-03  5.04e+00  2.47e-02
conc          7.04e-05  4.08e-04  9.84e-01
devi         -1.34e-03  1.48e-01  7.01e-01
diso         -3.60e-03  1.06e+00  3.04e-01
gene          1.31e-03  1.41e-01  7.07e-01
geog          4.64e-03  1.78e+00  1.82e-01
livb         -1.19e-02  1.17e+01  6.24e-04
objc          3.87e-03  1.23e+00  2.67e-01
occu          6.06e-04  3.04e-02  8.62e-01
orga         -8.24e-04  5.63e-02  8.12e-01
phen          3.87e-03  1.23e+00  2.68e-01
phys         -1.94e-03  3.12e-01  5.77e-01
proc          2.23e-03  4.11e-01  5.22e-01
GLOBAL              NA  4.20e+02  0.00e+00


The curves from plot.cox.zph look perfectly flat (slope 0) for all
variables, with narrow confidence bands.
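The situation described above (flat-looking plots with narrow bands, yet small p-values with about 160,000 observations) is what Terry's large-sample remark predicts. The sketch below is a simplified illustration of the idea he describes, an LS line fit to a residual-versus-time plot and tested for zero slope; it is not how the survival package computes cox.zph. The helpers `ols_slope_test` and `simulated_residuals` are hypothetical, and the "residuals" are a deterministic stand-in (a drift of 0.05 across the whole time range plus alternating ±1 noise), not real Schoenfeld residuals.

```python
import math


def ols_slope_test(x, y):
    """Hypothetical helper: least-squares fit y = a + b*x, then test b = 0.
    Returns (b, standard error of b, two-sided p-value). Uses a normal
    approximation to the t statistic, which is reasonable for large n."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residual variance estimate with n - 2 degrees of freedom
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_b = math.sqrt(sigma2 / sxx)
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(b / se_b) / math.sqrt(2))
    return b, se_b, p


def simulated_residuals(n, drift):
    """Hypothetical, deterministic stand-in for residuals plotted against time:
    a small linear drift over time in [0, 1) plus alternating +1/-1 'noise'."""
    x = [i / n for i in range(n)]
    y = [drift * xi + (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
    return x, y


for n in (1_000, 160_000):
    x, y = simulated_residuals(n, drift=0.05)
    b, se, p = ols_slope_test(x, y)
    print(f"n={n:>7}: slope={b:.4f} (SE {se:.4f}), p={p:.2g}")
```

The same tiny drift goes unnoticed at n = 1,000 (p well above .05) but is highly "significant" at n = 160,000, even though the fitted line rises by only about 0.05 against noise of size 1.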

I probably should have put these details in the first post, but it would have
been too long. Sorry about that.

Based on the plot of Schoenfeld residuals and Terry's explanation is it
safe to say that proportional hazards assumption holds despite the
significant global p-values?

Thanks!
On Tue, Aug 13, 2013 at 9:16 AM, Terry Therneau <therneau at mayo.edu> wrote:

            
1 day later
#
Dr. Therneau,
Thank you as always for first writing, and then continuing to maintain, the Cox model code in R (and earlier, I believe, in SAS).
 
While your comments concerning non-proportional hazards are helpful, they do not fully address the question, "What alternatives do I have if I assume proportional assumption of coxph does not hold?" The traditional answer would be, I believe, to define strata based on the non-proportional independent variable, so that the hazards are proportional within each stratum, and then run the analysis accounting for the strata. This deals with a variable entered as a "nuisance" parameter, i.e. one that one wants to adjust for but is not interested in drawing inferences about. It does not solve the problem when the non-proportional covariate is one about which one wishes to make inferences, because a stratified model gives no coefficient estimate for the variable used to define the strata.
Could you give some guidance about ways to deal with a non-proportional independent variable about which one does wish to make inferences?
Thank you,
John   


John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.