Defining reference category for a cph model summary inside of a "for" loop
Wells, Brian wrote:
Frank, Thanks again, I didn't realize that continuous variables could be manipulated that way inside of the summary function. I realize that my code was kind of confusing. The variables "A"..."F" are all categorical variables. They each have four levels named "1st Quartile" ... "4th Quartile". I tried the code below, with the same result.
print(summary(f, eval(parse(text=paste(i,"='1st Quartile'", sep='')))))
In the output, the reference category is different for each of the variables. Brian
Thanks for clarifying. That approach will NOT provide estimates at the quartiles. For example, a hazard ratio for the "upper quartile category" versus the "lower quartile category" estimates the ratio of hazards when X>Q3 to when X<Q1, where Q1 and Q3 are the outer quartiles. This is a hazard ratio for an unknown mixture of distributions, and it will not transport to another sample with a different mixture. In addition, you will have serious residual confounding with that approach, because it does not adjust for all the information in the continuous predictors. Frank
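As a rough illustration of the point about mixtures, the base R sketch below uses simulated data. The variable names, sample sizes, and distributions are invented for the example and are not taken from this thread. It cuts a continuous predictor into quartile groups in two samples and shows that the "4th Quartile" versus "1st Quartile" contrast corresponds to a different gap in the underlying values in each sample.

```r
# Simulated data only: two hypothetical samples of the same predictor
# with different spreads.
set.seed(1)
x_sample1 <- rnorm(1000, mean = 50, sd = 10)
x_sample2 <- rnorm(1000, mean = 50, sd = 20)

quartile_labels <- c("1st Quartile", "2nd Quartile",
                     "3rd Quartile", "4th Quartile")

# Cut a continuous variable into four groups at its own sample quartiles.
quartile_group <- function(x) {
  cut(x, breaks = quantile(x, probs = 0:4 / 4),
      labels = quartile_labels, include.lowest = TRUE)
}

# Difference in mean x between the top and bottom quartile groups.
outer_gap <- function(x) {
  m <- tapply(x, quartile_group(x), mean)
  unname(m["4th Quartile"] - m["1st Quartile"])
}

gap1 <- outer_gap(x_sample1)
gap2 <- outer_gap(x_sample2)

# The same category labels stand for different contrasts in x.
print(c(sample1 = gap1, sample2 = gap2))
```

The category labels are identical in both samples, but the contrast they encode depends on how the values happen to be distributed within each group.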
-----Original Message-----
From: Frank E Harrell Jr [mailto:f.harrell at vanderbilt.edu]
Sent: Sunday, March 30, 2008 9:14 AM
To: Wells, Brian
Cc: r-help at r-project.org
Subject: Re: [R] Defining reference category for a cph model summary inside of a "for" loop

Wells, Brian wrote:
Dr. Harrell, Thanks for your help. I tried:
print(summary(f,parse(text=paste(i,'="1st Quartile"', sep=''))))
Same result. No error, the reference category simply doesn't change.
That's good, because the default in summary is to compare the outer quartiles for a continuous variable. And as I said before, the string '1st Quartile' has no special meaning to R or Design. First get what you are trying to do working without parse (and note that you'll need eval() with parse). When you want total control over a setting, say getting a hazard ratio comparing the .2 and .8 quantiles, do something like summary(f, age=quantile(age,c(.2,.8),na.rm=TRUE)) Frank
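To make the last suggestion concrete, here is a small base R sketch of what quantile() hands to summary() in that call. The age vector is simulated for illustration. Only the quantile() call itself comes from the message above.

```r
# Simulated ages with a few missing values (illustrative data only).
set.seed(2)
age <- c(round(runif(200, min = 30, max = 80)), NA, NA)

# Two settings: the .2 and .8 sample quantiles, ignoring NAs.
settings <- quantile(age, c(.2, .8), na.rm = TRUE)
print(settings)

# With a fitted model f that includes age, these would be supplied as
#   summary(f, age = settings)
# so that the effect is estimated between the two values.
```

The na.rm=TRUE argument matters here: quantile() stops with an error if the vector contains missing values and na.rm is left at its default of FALSE.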
Brian

-----Original Message-----
From: Frank E Harrell Jr [mailto:f.harrell at vanderbilt.edu]
Sent: Friday, March 28, 2008 8:34 PM
To: Wells, Brian
Cc: r-help at r-project.org
Subject: Re: [R] Defining reference category for a cph model summary inside of a "for" loop

Wells, Brian wrote:
I have the following code.
f <- cph(formula = Surv(TimeToDeath, Dead == "Yes") ~ 1,
         data=single.dat, x=T, y=T, surv=T)
for(i in c('A', 'B', 'C', 'D', 'E', 'F')){
  f <- update(f, as.formula(paste('Surv(TimeToDeath, Dead == "Yes")~', i, sep='')))
  print(summary(f, paste(i,"=1st Quartile", sep='')))
}
There is no error message generated in R, but R ignores the reference category defined with paste in the summary function for the cph model. The output uses the "1st Quartile" as the reference category to calculate hazards for some of the variables defined by i, but not all of them.
Your code is confusing. What is to the right of ~ in a formula is a predictor variable name, not a value. If your variables are named A, B, C, ... you are OK. '1st Quartile' has no special meaning to R or Design, and you can't pass a character string as a second argument to summary and expect it to work. You will need parse(text=paste(...)) to create an appropriate expression. But Design gives you inter-quartile range hazard ratios by default anyway. Beware of getting hazard ratios that are not adjusted for other variables needed in the model. Frank Harrell
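The difference between passing a string and passing a named argument can be seen without fitting any model. The sketch below uses a stand-in function, show_args(), which is hypothetical and only reports the argument names it receives through `...`; it is not part of Design. do.call() with setNames() is shown as an alternative way of building a named argument from a variable. It is not the parse() approach suggested in the reply above.

```r
# Hypothetical stand-in for a function with an (object, ...) signature.
# It returns the names of the arguments that arrive through `...`.
show_args <- function(object, ...) names(list(...))

i <- "A"

# 1. A pasted string is a single unnamed character value.
n_string <- show_args(1, paste(i, "='1st Quartile'", sep = ""))

# 2. eval(parse(text = ...)) evaluates the text as an assignment and
#    passes along only its value, so the argument is still unnamed.
n_eval <- show_args(1, eval(parse(text = paste(i, "='1st Quartile'", sep = ""))))

# 3. do.call() with a named list supplies a genuinely named argument.
n_named <- do.call(show_args, c(list(1), setNames(list("1st Quartile"), i)))

print(n_string)
print(n_eval)
print(n_named)
```

Assuming summary() accepts A='1st Quartile' when typed directly for a categorical predictor, the same pattern applied to the fitted model inside the loop would be do.call(summary, c(list(f), setNames(list('1st Quartile'), i))). That call has not been checked against Design here.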
Any help would be greatly appreciated. Thanks,
Brian J. Wells, MD, MS
Research Associate
Quantitative Health Sciences
Cleveland Clinic
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University