Defining reference category for a cph model summary inside of a "for" loop

6 messages · Frank E Harrell Jr, Wells, Brian

#
I have the following code.
~1,data=single.dat, x=T, y=T, surv=T)
"Yes")~',i,sep='')))
There is no error message generated in R, but R ignores the reference
category defined with paste in the summary function for the cph model.

The output uses the "1st Quartile" as the reference category to
calculate hazards for some of the variables defined by i, but not all of
them.

Any help would be greatly appreciated.

Thanks

Brian J. Wells, MD, MS
Research Associate
Quantitative Health Sciences
Cleveland Clinic
#
Wells, Brian wrote:
Your code is confusing.  What is to the right of ~ in a formula is a 
predictor variable name, not a value.  If your variables are named A, B, 
C, ... you are OK.
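
A minimal sketch of building a formula inside a loop so that the right-hand
side is a predictor name. The response name y and the predictor names A, B, C
are assumptions for illustration; the original model call is not shown in full
in this thread. reformulate() is the base R helper for turning character
strings into a formula.

```r
vars <- c("A", "B", "C")   # assumed predictor names

formulas <- list()
for (i in vars) {
  # reformulate() builds "response ~ term" from character strings, so the
  # loop variable ends up as a variable name on the right of ~.
  formulas[[i]] <- reformulate(i, response = "y")
}

print(formulas$A)             # a formula object with A as the predictor
print(all.vars(formulas$B))   # the variable names the formula refers to
```

Each element of formulas could then be passed to a model-fitting function in
place of a hand-pasted string.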

'1st Quartile' has no special meaning to R or Design, and you can't pass 
a character string as a second argument to summary and expect it to work.

You will need parse(text=paste(...)) to create an appropriate expression.
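
A hedged sketch of the parse(text=paste(...)) idea, using base R only. The
function show_settings() is a hypothetical stand-in for summary(): it just
returns the named settings it receives, so the example runs without Design or
a fitted model. The variable names and the level "1st Quartile" are taken from
the question; that the real call would look like summary(f, A='1st Quartile')
is an assumption. Note that the parsed expression has to be evaluated with
eval() before anything happens.

```r
# Hypothetical stand-in for summary(): returns the named settings it was given.
show_settings <- function(object, ...) list(...)

vars <- c("A", "B", "C")   # assumed predictor names
ref  <- "1st Quartile"     # assumed factor level to use as the reference

via_parse  <- list()
via_docall <- list()
for (i in vars) {
  # Build the text of the call, e.g. show_settings(NULL, A = "1st Quartile"),
  # then parse it into an expression and evaluate it.
  call_text <- paste0("show_settings(NULL, ", i, " = \"", ref, "\")")
  via_parse[[i]] <- eval(parse(text = call_text))

  # Same call without writing code as text: setNames() names the argument.
  via_docall[[i]] <- do.call(show_settings,
                             c(list(NULL), setNames(list(ref), i)))
}

str(via_parse)
```

The do.call() form is shown only as an alternative that avoids quoting
problems in pasted code; both forms pass the loop variable as the argument
name rather than as a character string.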

But Design gives you inter-quartile range hazard ratios by default anyway.

Beware of getting hazard ratios that are not adjusted for other 
variables needed in the model.

Frank Harrell

  
    
#
Dr. Harrell, 
Thanks for your help. 

I tried:
Same result. No error, the reference category simply doesn't change. 

Brian 

-----Original Message-----
From: Frank E Harrell Jr [mailto:f.harrell at vanderbilt.edu] 
Sent: Friday, March 28, 2008 8:34 PM
To: Wells, Brian
Cc: r-help at r-project.org
Subject: Re: [R] Defining reference category for a cph model summary
inside of a "for" loop

  
    
#
Wells, Brian wrote:
That's good, because the default in summary is to compare the outer 
quartiles for a continuous variable.  And as I said before the string 
'1st Quartile' has no special meaning for R or Design.

Get what you are trying to do to work without parse (and you'll need 
eval() with parse) first.  When you want total control over a setting, 
say getting a hazard ratio for the .2 to the .8 quantile, do something like

summary(f, age=quantile(age,c(.2,.8),na.rm=TRUE))
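
To make the call above concrete, here is a small base R sketch of what
quantile() hands to summary. The age values are simulated purely for
illustration, and the fitted model f is not built here, so the summary call
itself is left as a comment.

```r
set.seed(1)
# Simulated ages with one missing value (illustrative data only).
age <- c(rnorm(200, mean = 60, sd = 10), NA)

# The .2 and .8 quantiles; na.rm = TRUE is needed because of the NA.
lims <- quantile(age, c(.2, .8), na.rm = TRUE)
print(lims)

# With a fitted cph model f (not created in this sketch), these two values
# would be supplied as the low and high settings for age:
# summary(f, age = lims)
```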

Frank

  
    
1 day later
#
Frank, 

Thanks again, I didn't realize that continuous variables could be
manipulated that way inside of the summary function. 

I realize that my code was kind of confusing. 

The variables "A"..."F" are all categorical variables. They each have
four levels named "1st Quartile"...."4th Quartile"

I tried the code below with the same result.
In the output, the reference category is different for each of the
variables. 

Brian 
-----Original Message-----
From: Frank E Harrell Jr [mailto:f.harrell at vanderbilt.edu] 
Sent: Sunday, March 30, 2008 9:14 AM
To: Wells, Brian
Cc: r-help at r-project.org
Subject: Re: [R] Defining reference category for a cph model summary
inside of a "for" loop

  
    
#
Wells, Brian wrote:
Thanks for clarifying.  That approach will NOT provide estimates at the 
quartiles.  For example a hazard ratio for the "upper quartile category" 
to the "lower quartile category" will estimate the ratio of hazards when 
X>Q3 to when X<Q1 where outer quartiles are Q1 and Q3.  This represents 
a hazard ratio of an unknown mixture of distributions and will not 
transport to another sample with a different mixture.
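
A rough numerical sketch of the mixture point, under stated assumptions: the
log hazard is taken to be linear in X with an assumed coefficient beta = 2,
and the hazard of a category is approximated by the average of exp(beta * X)
within it (a simplification of what a Cox model fitted to categories would
estimate). The two populations are constructed to share Q1 = 0.25 and
Q3 = 0.75 but differ in their tails.

```r
beta <- 2                               # assumed log hazard ratio per unit of X
p <- (seq_len(100000) - 0.5) / 100000   # fine grid of probabilities

# Two populations with the same quartiles but different tails.
x_unif <- qunif(p)
x_norm <- qnorm(p, mean = 0.5, sd = 0.25 / qnorm(0.75))

# Ratio of average hazards: X above Q3 versus X below Q1.
group_ratio <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  mean(exp(beta * x[x > q[2]])) / mean(exp(beta * x[x < q[1]]))
}

# Hazard ratio comparing X = Q3 with X = Q1 exactly.
point_ratio <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  unname(exp(beta * (q[2] - q[1])))
}

print(c(uniform = group_ratio(x_unif), normal = group_ratio(x_norm)))
print(c(uniform = point_ratio(x_unif), normal = point_ratio(x_norm)))
```

In this sketch the Q1-to-Q3 ratio agrees across the two populations, while the
category-based ratio depends on how X is distributed beyond the quartiles.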

In addition you will have serious residual confounding with that 
approach by not adjusting for all the information in continuous predictors.

Frank