interpreting bootstrap corrected slope [rms package]

11 messages · Frank E Harrell Jr, Adam, David Winsemius

#
Dear List:

Below is the validation output of a fitted ordinal logistic model
using the bootstrap in the rms package.  My interpretation is that
most of the corrected indices indicate little overfitting; however, the
corrected slope seems to indicate that the model is too optimistic.
Given that most of the corrected indices seem reasonable, would it be
appropriate to use this model on future data if the corrected intercept
and slope estimates are used?

          index.orig training   test optimism index.corrected   n
Dxy           0.9932   0.9940 0.9905   0.0035          0.9897 363
R2            0.9291   0.9364 0.9163   0.0202          0.9089 363
Intercept     0.0000   0.0000 0.0233  -0.0233          0.0233 363
Slope         1.0000   1.0000 0.7836   0.2164          0.7836 363
Emax          0.0000   0.0000 0.0582   0.0582          0.0582 363
D             0.9118   0.9190 0.8915   0.0275          0.8844 363
U            -0.0110  -0.0110 0.0124  -0.0234          0.0124 363
Q             0.9228   0.9299 0.8791   0.0508          0.8720 363
B             0.0205   0.0172 0.0239  -0.0067          0.0272 363
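
For readers unfamiliar with the columns, the following is a small base R sketch of how the optimism and index.corrected columns relate to the others under the usual optimism bootstrap: optimism is training minus test, and the corrected index is the original index minus the optimism. The numbers are copied from the table above; the data frame name `val` is made up for illustration. Emax is left out because its printed optimism does not follow the training-minus-test sign pattern in this output.

```r
# Values copied from the validation table above (Emax omitted, see note).
val <- data.frame(
  index.orig = c(0.9932, 0.9291, 0.0000, 1.0000, 0.9118, -0.0110, 0.9228, 0.0205),
  training   = c(0.9940, 0.9364, 0.0000, 1.0000, 0.9190, -0.0110, 0.9299, 0.0172),
  test       = c(0.9905, 0.9163, 0.0233, 0.7836, 0.8915,  0.0124, 0.8791, 0.0239),
  row.names  = c("Dxy", "R2", "Intercept", "Slope", "D", "U", "Q", "B")
)

# Optimism: how much better the bootstrap model looks on its own sample
# (training) than on the original data (test), averaged over resamples.
val$optimism <- val$training - val$test

# Corrected index: apparent performance minus the estimated optimism.
val$index.corrected <- val$index.orig - val$optimism

print(round(val, 4))
```

Small differences in the last digit against the printed table are expected, since the table entries were already rounded to four decimals.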


Any input is much appreciated.

Thanks,
Adam
#
Adam - the very low amount of optimism suggests that you have a large sample
size and that your model was completely pre-specified.  If you did any
feature/variable selection or made any model changes in a way that was not
blinded to Y then you are not using the software correctly.  But you are
right that the slope decrement indicates a bit of overfitting on an absolute
calibration scale.  The harm done by this can be partially interpreted from
the Emax value of 0.05, which indicates that the maximum absolute calibration
error is estimated to be 0.05 on the probability scale.  If your exceedance
probabilities for the middle Y category have a wide range then 0.05 isn't so
bad.
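
To make the link between the corrected intercept/slope and Emax concrete, here is a rough base R sketch. It assumes a logistic calibration model, in which the recalibrated probability is obtained by multiplying the predicted logit by the slope and adding the intercept, and it takes Emax as the largest absolute gap between predicted and recalibrated probabilities. The grid `p_hat` is hypothetical; rms works from the predictions actually observed in the data, so this only approximates what validate() reports.

```r
# Corrected calibration intercept and slope, copied from the table
# in the first message of this thread.
a <- 0.0233
b <- 0.7836

# Hypothetical grid of model-predicted probabilities (an assumption;
# the real calculation uses the observed predictions).
p_hat <- seq(0.01, 0.99, by = 0.001)

# Recalibrated probability: shrink the logit by the slope, shift by the intercept.
p_cal <- plogis(a + b * qlogis(p_hat))

# Absolute calibration error at each predicted probability, and its maximum.
abs_err <- abs(p_hat - p_cal)
emax    <- max(abs_err)

print(round(emax, 4))
print(p_hat[which.max(abs_err)])  # where on the probability scale the gap is largest
```

On this grid the maximum gap comes out at about 0.058, in line with the Emax row of the table. The gap shrinks toward zero near predicted probabilities of 0, 0.5 and 1, which is why a wide spread of predicted probabilities makes a given Emax less worrying.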

Frank
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/interpreting-bootstrap-corrected-slope-rms-package-tp3928314p3928467.html
Sent from the R help mailing list archive at Nabble.com.
#
Dr. Harrell,

Thanks for your response.  The predictor variables I initially included in
the model were based on the x mean plots and whether they exhibited
ordinality and whether they appeared to meet the CR assumptions.  Only 7 of
16 potential variables fit that designation and those are the variables I
initially included.  I then used backward variable selection, which selected
3 significant terms.  Does that seem reasonable?  

Also, are you saying that if the exceedance probabilities for the middle Y
category have a wide range then keeping the model as is would be fine for
future predictions?

Thanks for your time,
Adam

#
That's not reasonable for 2 reasons.  First, selecting variables based on
apparent assumption satisfaction is an unexplored technique.  Second, you
failed to account for variable selection during resampling validation.  You
will need to give the model all CANDIDATE variables and use the bw=TRUE
option for validate() and calibrate() to get the right answer.  You'll have
to specify the stopping rule too.

If there is a wide range of predicted probabilities then an Emax of 0.05 is
less stressful.  But the Emax is meaningless if you didn't repeat all
modeling steps that used Y for each resampling iteration.
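
A minimal sketch of the workflow described above, using simulated data because the original data set is not shown in this thread. The variable names x1 to x6, the effect sizes, and the choice of a p-value stopping rule with sls = 0.05 are all assumptions for illustration; the real analysis would supply all 16 candidate predictors and whatever stopping rule was actually used.

```r
library(rms)

set.seed(123)
n <- 363

# Simulated candidate predictors (hypothetical); only x1 and x2 truly matter.
d <- as.data.frame(matrix(rnorm(n * 6), n, 6))
names(d) <- paste0("x", 1:6)
lp  <- 1.2 * d$x1 - 0.8 * d$x2
d$y <- cut(lp + rlogis(n), breaks = c(-Inf, -1, 1, Inf), labels = FALSE)

# Fit the FULL model with every candidate variable; x=TRUE, y=TRUE keep
# the design matrix and response so the fit can be resampled.
full <- lrm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = d, x = TRUE, y = TRUE)

# bw=TRUE repeats fast backward elimination inside every bootstrap
# resample, so the optimism estimates reflect the selection step.
# rule, sls and type define the stopping rule.
v <- validate(full, B = 200, bw = TRUE,
              rule = "p", sls = 0.05, type = "individual")
print(v)

# calibrate() accepts the same backward-elimination arguments.
cal <- calibrate(full, B = 200, bw = TRUE,
                 rule = "p", sls = 0.05, type = "individual")
```

The printed validation also reports which variables were retained in each resample, which gives a feel for how stable the selected model is.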

Frank
#
I guess I must be misunderstanding the point of checking the ordinality
assumptions prior to fitting a model.  Are you saying that a response
variable that does not behave in an ordinal fashion can still be included in
the initial and final model?

#
You also did stepwise selection that was not accounted for.  Regarding the
proportional odds assumption, if you assessed it correctly, something that is
not operating proportionally would have to be associated with the outcome for
at least one cutoff of Y, so you could say that you are doing reverse
screening that will need to be accounted for in resampling.

Frank
#
Does your point about proportionality also hold for ordinality?  In other
words, if I have several X variables that do not behave in an ordinal
fashion with Y, should I still include them in the full model?  My
understanding or perhaps misunderstanding of the ordinality assumption was
that all X variables included in the model should behave in an ordinal
fashion with Y.  Is that not the case?

#
On Oct 23, 2011, at 7:37 PM, apeer wrote:

            
Why should non-monotonic relationships be discarded? Are you implying  
they are impossible from a scientific perspective?
#
I'm not implying they should be discarded; however, at the same time I'm not
certain I fully understand why we should check the ordinality assumption if
in the end we're going to include predictors with which the response
variable behaves in a non-ordinal fashion.

#
A few issues -

Don't let the overall unimportance of a predictor make you worry about
non-ordinality (e.g., when the plot from plot.xmean.ordinaly spans only a
narrow range on the y-axis).
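
As a rough illustration of the point above, the base R sketch below computes the quantity such a plot is built around, the mean of each X within each level of Y, for one strong predictor and one pure-noise predictor. The data are simulated and the names `x_strong` and `x_noise` are made up; this is not a substitute for the rms plot, which also overlays the means expected under the model.

```r
set.seed(42)
n <- 363

# Hypothetical predictors: one related to Y, one unrelated.
x_strong <- rnorm(n)
x_noise  <- rnorm(n)

# Three-level ordinal response driven only by x_strong.
y <- cut(1.5 * x_strong + rlogis(n), breaks = c(-Inf, -1, 1, Inf), labels = FALSE)

# Mean of each predictor within each level of Y.
m_strong <- tapply(x_strong, y, mean)
m_noise  <- tapply(x_noise,  y, mean)
print(round(rbind(strong = m_strong, noise = m_noise), 3))

# Spread of the stratified means: the vertical range such a plot would cover.
print(c(strong = diff(range(m_strong)), noise = diff(range(m_noise))))
```

For the noise predictor the stratified means may well zigzag, but over such a narrow range that the apparent non-ordinality is just sampling variation around a flat line.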

We frequently have to face the issue of using an imperfect model by fitting
a few variables that don't exactly meet our assumptions, if the majority of
variables do.  One reason for this is that competing methods may fare worse.

Another option is to fit a more flexible model such as the partial
proportional odds model.  I haven't implemented this in my packages. 
Another R package may do the job (but without model validation mechanisms
provided by rms).

Frank
#
One last thing.  At the outset of this discussion I provided the results of a
validation procedure on a model (see below).  As discussed previously, the
model overall seems to fare well, with the exception of the slope.  With
that in mind, is there a way to correct the coefficients of the model to
account for the corrected slope so that future predictions on a new data set
are more accurate?  Or is that not recommended at all?

          index.orig training   test optimism index.corrected   n
Dxy           0.9932   0.9940 0.9905   0.0035          0.9897 363
R2            0.9291   0.9364 0.9163   0.0202          0.9089 363
Intercept     0.0000   0.0000 0.0233  -0.0233          0.0233 363
Slope         1.0000   1.0000 0.7836   0.2164          0.7836 363
Emax          0.0000   0.0000 0.0582   0.0582          0.0582 363
D             0.9118   0.9190 0.8915   0.0275          0.8844 363
U            -0.0110  -0.0110 0.0124  -0.0234          0.0124 363
Q             0.9228   0.9299 0.8791   0.0508          0.8720 363
B             0.0205   0.0172 0.0239  -0.0067          0.0272 363
