Skip to content

differences between survival models between STATA and R

2 messages · Javier Palacios Fenech, jthetzel

#
Please, find an example with a data set here.

The data sets are prepared to be read with R (dataR) and STATA (dataSTATA)
directly. The only difference is that STATA treats blanks as NAs.

*in R, by using aftreg:*

dataR<-read.table("clipboard",h=T)
a+b+c+d+g+h+j+k+l+m+factor(pcont),dist="weibull",
data=data.frame(dataR),id=ID)
Call:
aftreg(formula = Surv(sta, sto, S) ~ a + b + c + d + g + h +
    j + k + l + m + factor(pcont), data = data.frame(thern),
    dist = "weibull", id = ID)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
a                  39.416     0.001     1.001     0.002     0.598
b                  55.879     0.002     1.002     0.001     0.002
c                  49.554    -0.001     0.999     0.001     0.251
d                  51.266     0.000     1.000     0.000     0.758
g                  14.701    -0.006     0.994     0.002     0.011
h                  32.358     0.005     1.005     0.001     0.000
j                  51.768    -0.022     0.978     0.001     0.000
k                  53.832    -0.004     0.996     0.001     0.000
l                  18.851     0.016     1.017     0.001     0.000
m                   0.809     0.907     2.478     0.132     0.000
factor(pcont)
               1    0.007     0         1           (reference)
               2    0.272    -0.051     0.950     0.158     0.745
               3    0.201     0.653     1.922     0.155     0.000
               4    0.253     0.181     1.198     0.156     0.246
               5    0.267     0.691     1.996     0.162     0.000

log(scale)                    3.369    29.038     0.225     0.000
log(shape)                    1.925     6.858     0.055     0.000

Events                    190
Total time at risk          3129
Max. log. likelihood      -298.47
LR test statistic         681
Degrees of freedom        14
Overall p-value           0



*in STATA, by using streg:*

 insheet using "dataSTATA.txt"
(18 vars, 4862 obs)

.
. stset ntime, failure(s) id(id)
                id:  id
     failure event:  s != 0 & s < .
obs. time interval:  (ntime[_n-1], ntime]
 exit on or before:  failure
------------------------------------------------------------------------------
     4862  total obs.
        0  exclusions
------------------------------------------------------------------------------
     4862  obs. remaining, representing
      351  subjects
      283  failures in single failure-per-subject data
     4862  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        30

.
. gen f1 = 0
. replace f1 = 1 if pcont==1
(520 real changes made)

.
. gen f2 = 0
. replace f2= 1 if pcont==2
(1267 real changes made)


. gen f3 = 0
. replace f3= 1 if pcont==3
(771 real changes made)


. gen f4 = 0
. replace f4= 1 if pcont==4
(960 real changes made)


. gen f5 = 0
. replace f5= 1 if pcont==5
(1344 real changes made)


. streg  f1 f2 f3 f4 f5 a b c d g h j k l m, dist(weibull)  time nolog

         failure _d:  s
   analysis time _t:  ntime
                 id:  id
note: f5 dropped due to collinearity

Weibull regression -- accelerated failure-time form

No. of subjects =          228                     Number of obs   =
 3129
No. of failures =          190
Time at risk    =         3129
                                                   LR chi2(14)     =
 226.94
Log likelihood  =   -47.541886                     Prob > chi2     =
 0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf.
Interval]
-------------+----------------------------------------------------------------
          f1 |   .7194758   .2271213     3.17   0.002     .2743262
 1.164625
          f2 |   .4011278   .0604573     6.63   0.000     .2826336
.519622
          f3 |   .0142088   .0676573     0.21   0.834    -.1183972
 .1468148
          f4 |   .2984225   .0814102     3.67   0.000     .1388614
 .4579835
           a |   .0030444   .0024399     1.25   0.212    -.0017377
 .0078265
           b |  -.0008451   .0009987    -0.85   0.397    -.0028026
 .0011124
           c |   .0015207   .0009827     1.55   0.122    -.0004055
 .0034468
           d |   .0005143   .0007139     0.72   0.471    -.0008848
 .0019135
           g |   .0024349   .0025024     0.97   0.331    -.0024698
 .0073395
           h |  -.0024076   .0012125    -1.99   0.047    -.0047842
-.0000311
           j |    .013318   .0012565    10.60   0.000     .0108553
 .0157806
           k |    .002104   .0010438     2.02   0.044     .0000581
 .0041498
           l |  -.0017612   .0010071    -1.75   0.080     -.003735
 .0002127
           m |  -.0694492    .090727    -0.77   0.444    -.2472708
 .1083724
       _cons |   1.812124   .2000892     9.06   0.000     1.419956
 2.204291
-------------+----------------------------------------------------------------
       /ln_p |   1.536971   .0756138    20.33   0.000      1.38877
 1.685171
-------------+----------------------------------------------------------------
           p |   4.650481   .3516405                      4.009916
 5.393373
         1/p |   .2150315   .0162593                      .1854127
 .2493818
------------------------------------------------------------------------------

I do an accelerated failure time model with both programs. The problem of
"survreg" is that it cannot handle time-dependent covariates:
https://stat.ethz.ch/pipermail/r-help/2010-July/247216.html



Thanks,
On Mon, Jul 9, 2012 at 10:19 AM, Terry Therneau <therneau at mayo.edu> wrote: