bootstrapped cox regression (rms package)

Quite a few people have had this problem, but since I'm unable to
reproduce it, I'm not exactly sure how to fix it either. A few
references that might be helpful to you:

http://stackoverflow.com/q/12448507/559676
https://github.com/yihui/knitr/issues/413

It is very likely to be a pure LaTeX problem. Letting MiKTeX install
the missing LaTeX packages on the fly might solve the problem.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA
R Users,

I just upgraded R from R-2.15.0 to R-2.15.2 and installed the latest versions of LyX and MiKTeX on Windows 7 Ultimate (64-bit).  Prior to the upgrade, I was using LyX with knitr to generate a document with no problems.  After the upgrade, the same LyX document fails to compile with the following error:

\end{verbatim}
The control sequence at the end of the top line
of your error message was never \def'ed. If you have
misspelled it (e.g., `\hobx'), type `I' and the correct
spelling (e.g., `I\hbox'). Otherwise just continue,
and I'll forget about whatever was undefined.

I have determined that the error is caused when printing the anova results from the anova statement in my R source code, but can't seem to resolve the issue.  Here is an example code chunk that creates the error:

<<NonCP1, fig.width=6, fig.height=4, out.width='.8\\linewidth' ,par=FALSE>>=
#Read in data
y=c( 67, 73, 83, 89, 65, 91, 87, 86, 155, 127, 147, 212, 108, 100, 90, 153, 140, 142, 121, 150, 33, 8, 46, 54 )
temp=as.factor(c(rep(seq(360, 380, 10), each=4), rep(seq(380, 360, -10), each=4)))
coat=as.factor(rep(seq(1, 4), 6))
replicate=as.factor(rep(seq(1, 6), each=4))
#Obtain Factorial/Incorrect Model
o=lm(y~temp*coat)
ano=anova(o)
ano
@

Removing the ano=anova(o) or ano lines in the code chunk allows the document to compile with no problem.  Does anyone else have this problem or did I do something wrong when I migrated to the newer versions?

Thanks in advance for any help!

Sincerely yours,

Mark J. Lamias

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

That is very helpful! Just to continue debugging, can you save the two
versions of the tex files produced from LyX with different versions of
R and do a diff on them? It sounds like something has changed from R
2.15.0 to 2.15.2.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA
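
The diff suggested above can be done in R itself. What follows is only a minimal sketch: the two temporary files and their contents are hypothetical stand-ins for the .tex files that LyX/knitr would produce under R-2.15.0 and R-2.15.2, and the names old_tex and new_tex are placeholders. It lists the lines that occur in only one of the two files, which is usually enough to spot what changed.

```r
# Hypothetical stand-ins for the two .tex files exported from LyX;
# in practice, point old_tex / new_tex at the real files instead.
old_tex <- tempfile(fileext = ".tex")
new_tex <- tempfile(fileext = ".tex")
writeLines(c("\\begin{verbatim}", "Response: y", "\\end{verbatim}"), old_tex)
writeLines(c("\\begin{verbatim}", "Response: y",
             "% line present only in the newer file", "\\end{verbatim}"), new_tex)

old_lines <- readLines(old_tex)
new_lines <- readLines(new_tex)

# Lines that occur in one file but not in the other (order-insensitive)
only_in_new <- setdiff(new_lines, old_lines)
only_in_old <- setdiff(old_lines, new_lines)

print(only_in_new)
print(only_in_old)
```

setdiff() ignores line order and repeated lines, so it is a rough first pass; a dedicated diff tool gives positional context.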
Thanks, Yihui!

Luckily I kept R-2.15.0 and left it untouched (so I can continue to use that
for now).  If it helps any, I was able to go back into LyX and change the
path to point to R-2.15.0, and I also changed the Windows PATH environment
variable to point to the old version.  After doing this, LyX worked fine
with no problem on the code below.  Changing the paths back to the new
version, R-2.15.2, generates the error below.

If anyone else has any idea how to resolve this, either through R or a
LyX/LaTeX fix, I'd be all ears.

Thanks again for your response, Yihui!

Sincerely yours,

Mark J. Lamias
Eric, the output you showed for anova(out) is not correct.  anova.rms does
not produce such output.  Please give us the correct script that obtained
those results and let us know if you are overriding the anova command
somehow.

To your point, make sure that SPSS does not use the bootstrap to obtain a
new point estimate of beta but rather uses the original Cox  beta
coefficients in the test.

Frank

Eric Claus wrote
Hi,
I am trying to convert a colleague from using SPSS to R, but am having
trouble generating a result that is similar enough to a bootstrapped cox
regression analysis that was run in SPSS.  I tried unsuccessfully with
bootcens, but have had some success with the bootcov function in the rms
package, which at least generates confidence intervals similar to what is
observed in SPSS.  However, the p-values associated with each predictor in
the model are not really close in many instances.

Here is the code I am using:

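# Trailing "+" keeps the formula on one logical line when it wraps
formula=Surv(months, recidivate) ~ fac1 + fac2 + fac3 + fac4 + fac5 + fac6 +
  fac7 + fac8
# x=TRUE, y=TRUE store the design matrix and response, which bootcov needs
fit=cph(formula, data=temp, x=TRUE, y=TRUE)
validate(fit, method="boot", B=9999, bw=FALSE, type="residual", sls=0.05,
  aics=0, force=NULL, estimates=TRUE, pr=FALSE)
# coef.reps=TRUE keeps the bootstrap coefficient replicates in out$boot.Coef
out=bootcov(fit, B=9999, pr=FALSE, coef.reps=TRUE, loglik=FALSE)
# Percentile confidence limits for each of the 8 coefficients
for (i in 1:8) {
  print(quantile(out$boot.Coef[,i], c(.025, .975)))
}
anova(out)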

variable  low CI        high CI      p-value
fac1      -8.919692     20.800878    .5917
fac2      -8.683579      3.091100    .6381
fac3      -1.848428      2.193492    .9312
fac4      -0.17575426    0.08333277  .8246
fac5      -3.1488578     0.5166171   .2946
fac6      -0.03621405    0.07241772  .5600
fac7      -0.62847922    0.08566296  .3433
fac8      -0.01553286    0.20909384  .5756

The results from SPSS I am trying to match (or come close to matching) are
the following:
variable  low CI   high CI  p-value
fac1      -8.474   20.020   .456
fac2      -8.206    3.093   .524
fac3      -1.829    2.087   .900
fac4      -.173      .083   .749
fac5      -2.945     .450   .143
fac6      -.035      .070   .306
fac7      -.626      .092   .189
fac8      -.017      .203   .247

Sorry if this is a really basic question.  I have searched for several
hours for an explanation, but cannot find anything that explains why the
p-values would be different despite similar confidence intervals.

Thanks in advance,
Eric

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/bootstrapped-cox-regression-rms-package-tp4651306p4651344.html
Sent from the R help mailing list archive at Nabble.com.
Hi, Yihui,

Attached is an HTML Diff report of the two files.  The left pane contains the R-2.15.0 file.

Thanks.

--Mark
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: DifferencesReport.htm
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121129/b9f0c312/attachment.pl>
Hi Frank,
Below is the actual output from the anova(out) command.  In my first post I
had combined the p-values from an earlier anova(out) run with the confidence
intervals from print(quantile(out$boot.Coef[,i], c(.025, .975))) to
illustrate that the confidence intervals were similar to SPSS while the
p-values were not.

Actual output from anova.rms(out):

 Wald Statistics          Response: Surv(months, recidivate) 

 Factor   Chi-Square d.f. P     
fac1  0.27       1    0.6055
fac2  0.20       1    0.6514
fac3  0.01       1    0.9338
fac4  0.05       1    0.8311
fac5  1.06       1    0.3036
fac6  0.33       1    0.5647
fac7  0.81       1    0.3670
fac8  0.30       1    0.5832
 TOTAL   1.48       8    0.9930

Regarding your second question, it looks like SPSS is using the original
estimate of Cox beta coefficients in the test (i.e. a new point estimate is
not generated for the statistical test)

Thanks again,
Eric
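
As a quick check on the output above: the P column appears consistent with the upper tail of a chi-square distribution with 1 d.f. evaluated at the printed Chi-Square value, i.e. an ordinary Wald test. The sketch below uses only numbers copied from the table (three of the eight rows); small discrepancies are expected because the printed chi-squares are rounded to two decimals.

```r
# Chi-Square and P values as printed by anova(out) in this thread
chisq      <- c(fac1 = 0.27,   fac5 = 1.06,   fac7 = 0.81)
reported_p <- c(fac1 = 0.6055, fac5 = 0.3036, fac7 = 0.3670)

# Upper-tail chi-square probability with 1 degree of freedom
recomputed_p <- pchisq(chisq, df = 1, lower.tail = FALSE)

print(round(cbind(chisq, reported_p, recomputed_p), 4))
```

So these p-values come from a Wald statistic, not from counting bootstrap replicates, whereas the confidence limits printed earlier are percentiles of the replicates.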

Thanks Eric.  It would be good to show your entire script next time, as stated
in the posting guide.

Regarding matching with SPSS, please describe the bootstrapping algorithm
used there.  In rms I do the unconditional bootstrap, i.e., I sample with
replacement from the rows of the raw data.  Also make sure that SPSS ran
a large number of bootstrap replications.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
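
The unconditional bootstrap described above (resampling whole rows with replacement and refitting) can be sketched by hand as follows. This is not the rms implementation: it assumes the survival package and uses its lung data as a stand-in, since coxdata.sav from this thread is not available, and it uses B = 200 rather than 9999 to keep the run short.

```r
library(survival)  # recommended package that ships with R

set.seed(2012)

# Stand-in data (assumption): survival::lung instead of the thread's coxdata.sav
d     <- na.omit(lung[, c("time", "status", "age", "ph.ecog")])
model <- Surv(time, status) ~ age + ph.ecog

# Original fit on the full data
fit <- coxph(model, data = d)

# Unconditional bootstrap: resample rows with replacement, refit each time
B <- 200
boot_coef <- t(replicate(B, {
  rows <- sample(nrow(d), replace = TRUE)
  coef(coxph(model, data = d[rows, ]))
}))

# Percentile limits, analogous to quantile(out$boot.Coef[, i], c(.025, .975))
percentile_ci <- t(apply(boot_coef, 2, quantile, probs = c(0.025, 0.975)))

# Bootstrap estimate of the covariance matrix of the coefficients
boot_cov <- cov(boot_coef)

print(percentile_ci)
print(boot_cov)
```

Each row of boot_coef is one refit, so the same matrix yields both percentile limits and a covariance estimate.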
Hi Frank,
My apologies for not posting the entire script - I have repasted it below.

library(rms)
library(foreign)
temp=read.spss('coxdata.sav', to.data.frame=T)

formula=Surv(months, recidivate) ~ fac1 + fac2 + fac3 + fac4 + fac5 + fac6 +
fac7 + fac8 
fit=cph(formula, data=temp, x=T, y=T) 
val.out=validate(fit, method="boot", B=9999, bw=F, type="residual",
sls=0.05, aics=0,force=NULL, estimates=TRUE, pr=FALSE) 
out=bootcov(fit, B=9999, pr=F, coef.reps=T, loglik=F) 
anova(out) 

 Factor   Chi-Square d.f. P     
fac1  0.27       1    0.6055 
fac2  0.20       1    0.6514 
fac3  0.01       1    0.9338 
fac4  0.05       1    0.8311 
fac5  1.06       1    0.3036 
fac6  0.33       1    0.5647 
fac7  0.81       1    0.3670 
fac8  0.30       1    0.5832 
 TOTAL   1.48       8    0.9930 

for (i in 1:8) {
print(quantile(out$boot.Coef[,i], c(.025, .975)))
}

    2.5%     97.5% 
-9.236751 20.772061 
     2.5%     97.5% 
-8.841030  3.094755 
     2.5%     97.5% 
-1.834436  2.161983 
      2.5%      97.5% 
-0.1800666  0.0871867 
      2.5%      97.5% 
-3.2129636  0.4783566 
       2.5%       97.5% 
-0.04157389  0.07130994 
      2.5%      97.5% 
-0.6415962  0.1001843 
       2.5%       97.5% 
-0.01529467  0.21055259 

Again, the SPSS output I am trying to match is here:
variable low CI high CI p-value 
fac1 -8.474 20.020 .456 
fac2 -8.206 3.093 .524 
fac3 -1.829 2.087 .900 
fac4 -.173 .083 .749 
fac5 -2.945 .450 .143 
fac6 -.035 .070 .306 
fac7 -.626 .092 .189 
fac8 -.017 .203 .247 

In looking through the SPSS syntax, my colleague is using SIMPLE resampling,
which samples with replacement from the original data set.  9999
bootstrap replications are being used, the same number I used in the
bootcov command.  The piece of the SPSS output that is not clear is how
p-values are generated from the distribution of parameter estimates; SPSS
appears to be testing the parameter estimate from the original Cox
regression, but the method of testing that parameter is not clear.

Eric
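
One way similar confidence limits and different p-values can coexist is that a p-value can be built from the same bootstrap replicates in more than one way. The sketch below is purely illustrative: the replicates are simulated, beta_hat is a made-up estimate, and nothing here is known to be what SPSS does. It compares a Wald p-value that uses the bootstrap standard error with a percentile-style p-value based on the share of replicates on either side of zero.

```r
set.seed(1)

beta_hat <- 0.5  # hypothetical original estimate
# Hypothetical, deliberately skewed bootstrap replicates centred on beta_hat
reps <- beta_hat + 0.6 * (rexp(9999, rate = 1) - 1)

# Percentile confidence limits from the replicates
ci <- quantile(reps, c(0.025, 0.975))

# (a) Wald test: original estimate divided by the bootstrap standard error
se_boot <- sd(reps)
p_wald  <- 2 * pnorm(-abs(beta_hat / se_boot))

# (b) Percentile-style test: twice the smaller tail proportion around zero
p_pct <- min(1, 2 * min(mean(reps <= 0), mean(reps >= 0)))

print(ci)
print(c(p_wald = p_wald, p_percentile = p_pct))
```

When the replicates are skewed, the two p-values can differ noticeably even though both are derived from the same resamples.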

It will be crucial to know the details of the test statistic and P-value
calculations from SPSS.  It's also worth running anova on both the bootcov
and the original fits to see if SPSS is ignoring the bootstrap when computing
the covariance matrix.
Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
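
The comparison suggested above can be sketched without rms by computing the Wald tests by hand, once with the model-based covariance matrix and once with a bootstrap covariance matrix, keeping the original coefficients in both cases. The assumptions are the same as for any hand-rolled version: the survival package, its lung data as a stand-in for coxdata.sav, and B = 200 instead of 9999; the helper wald_table is a name invented for this example. In rms the corresponding calls would be anova(fit) and anova(out).

```r
library(survival)  # recommended package that ships with R

set.seed(2012)

# Stand-in data (assumption): survival::lung instead of the thread's coxdata.sav
d     <- na.omit(lung[, c("time", "status", "age", "ph.ecog")])
model <- Surv(time, status) ~ age + ph.ecog
fit   <- coxph(model, data = d)

# Bootstrap replicates of the coefficients (rows resampled with replacement)
boot_coef <- t(replicate(200, {
  rows <- sample(nrow(d), replace = TRUE)
  coef(coxph(model, data = d[rows, ]))
}))

# Hypothetical helper: 1 d.f. Wald tests for coefficients beta with covariance V
wald_table <- function(beta, V) {
  chisq <- beta^2 / diag(V)
  data.frame(chisq = chisq, df = 1,
             p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

wald_model <- wald_table(coef(fit), vcov(fit))       # model-based covariance
wald_boot  <- wald_table(coef(fit), cov(boot_coef))  # bootstrap covariance

print(wald_model)
print(wald_boot)
```

If the SPSS p-values sit closer to the model-based table than to the bootstrap one, that would point to SPSS not using the bootstrap covariance in its test.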