Dear List:
Below is the validation output of a fitted ordinal logistic model
using the bootstrap in the rms package. My interpretation is that
most of the corrected indices indicate little overfitting; however, the
slope seems to indicate that the model is too optimistic. Given that
most of the corrected indices seem reasonable, would it be appropriate
to use this model on future data if the corrected intercept and slope
estimates are used?
          index.orig training   test optimism index.corrected   n
Dxy           0.9932   0.9940 0.9905   0.0035          0.9897 363
R2            0.9291   0.9364 0.9163   0.0202          0.9089 363
Intercept     0.0000   0.0000 0.0233  -0.0233          0.0233 363
Slope         1.0000   1.0000 0.7836   0.2164          0.7836 363
Emax          0.0000   0.0000 0.0582   0.0582          0.0582 363
D             0.9118   0.9190 0.8915   0.0275          0.8844 363
U            -0.0110  -0.0110 0.0124  -0.0234          0.0124 363
Q             0.9228   0.9299 0.8791   0.0508          0.8720 363
B             0.0205   0.0172 0.0239  -0.0067          0.0272 363
Any input is much appreciated.
Thanks,
Adam
interpreting bootstrap corrected slope [rms package]
11 messages · Frank E Harrell Jr, Adam, David Winsemius
Adam - the very low amount of optimism suggests that you have a large sample size and that your model was completely pre-specified. If you did any feature/variable selection or made any model changes in a way that was not blinded to Y, then you are not using the software correctly. But you are right: the slope decrement indicates a bit of overfitting on an absolute calibration scale. The harm done by this can be partially gauged from the Emax value of 0.05, which indicates that the maximum absolute calibration error is estimated to be 0.05 on the probability scale. If your exceedance probabilities for the middle Y category have a wide range, then 0.05 isn't so bad.
Frank
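For readers following the numbers, here is a minimal base-R sketch of how the columns of the posted table relate to each other and of what an Emax of about 0.05 means on the probability scale. The values are copied from Adam's table; the variable names are illustrative, and the grid search for Emax is an assumption about how such a figure can arise from the corrected intercept and slope, not a copy of the rms internals.

```r
# Values copied from the validation table in the original post.
val <- data.frame(
  index     = c("Dxy", "R2", "Intercept", "Slope", "D", "U", "Q", "B"),
  orig      = c(0.9932, 0.9291, 0.0000, 1.0000, 0.9118, -0.0110, 0.9228, 0.0205),
  optimism  = c(0.0035, 0.0202, -0.0233, 0.2164, 0.0275, -0.0234, 0.0508, -0.0067),
  corrected = c(0.9897, 0.9089, 0.0233, 0.7836, 0.8844, 0.0124, 0.8720, 0.0272)
)

# index.corrected = index.orig - optimism, up to rounding in the printed table.
val$recomputed <- val$orig - val$optimism
max_gap <- max(abs(val$recomputed - val$corrected))

# Calibration error implied by the corrected intercept and slope:
# a predicted probability p is mapped to plogis(intercept + slope * logit(p)).
p     <- seq(0.001, 0.999, by = 0.001)
recal <- plogis(0.0233 + 0.7836 * qlogis(p))
emax  <- max(abs(p - recal))   # largest absolute gap over the grid

print(val)
print(max_gap)
print(emax)
```

On this grid the largest gap comes out close to the reported Emax of 0.0582, which is the sense in which Frank describes it as a maximum absolute calibration error.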
----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/interpreting-bootstrap-corrected-slope-rms-package-tp3928314p3928467.html Sent from the R help mailing list archive at Nabble.com.
Dr. Harrell,
Thanks for your response. The predictor variables I initially included in the model were based on the x mean plots and whether they exhibited ordinality and whether they appeared to meet the CR assumptions. Only 7 of 16 potential variables fit that designation, and those are the variables I initially included. I then used backward variable selection, which selected 3 significant terms. Does that seem reasonable? Also, are you saying that if the exceedance probabilities for the middle Y category have a wide range, then keeping the model as is would be fine for future predictions?
Thanks for your time,
Adam
That's not reasonable for 2 reasons. First, selecting variables based on apparent assumption satisfaction is an unexplored technique. Second, you failed to account for variable selection during resampling validation. You will need to give the model all CANDIDATE variables and use the bw=TRUE option for validate() and calibrate() to get the right answer. You'll have to specify the stopping rule too. If there is a wide range of predicted probabilities then an Emax of 0.05 is less stressful. But the Emax is meaningless if you didn't repeat all modeling steps that used Y for each resampling iteration. Frank
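As a rough illustration of the advice above, the sketch below fits a full candidate model with `lrm()` and lets `validate()` repeat backward elimination inside each bootstrap resample via `bw=TRUE`. The data are simulated stand-ins, not Adam's, and the stopping rule shown (`rule = "p"`, `sls = 0.05`, `type = "individual"`) is only one possible choice, assumed here for the example.

```r
library(rms)  # provides lrm() and validate()

set.seed(1)
n <- 363
# Simulated stand-in data: 4 candidate predictors, only x1 and x2 matter.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
latent <- 1.5 * d$x1 - 1.0 * d$x2 + rlogis(n)
d$y <- cut(latent, breaks = c(-Inf, -1, 1, Inf), labels = FALSE)  # ordinal 1, 2, 3

# Fit the FULL candidate model. x = TRUE, y = TRUE store the design matrix
# and response so that validate() can refit on bootstrap resamples.
full <- lrm(y ~ x1 + x2 + x3 + x4, data = d, x = TRUE, y = TRUE)

# bw = TRUE reruns backward elimination within every bootstrap resample,
# so the optimism estimate reflects the selection step as well.
val <- validate(full, method = "boot", B = 100, bw = TRUE,
                rule = "p", sls = 0.05, type = "individual")
print(val)
```

Per Frank's message, `calibrate()` takes the same `bw=TRUE` option and stopping rule. Any screening done before this point that looked at Y would still be unaccounted for.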
I guess I must be misunderstanding the point of checking the ordinality assumptions prior to fitting a model. Are you saying that a response variable that does not behave in an ordinal fashion can still be included in the initial and final model?
You also performed stepwise selection that was not accounted for. Regarding the proportional odds assumption: if you assessed it correctly, something that is not operating proportionally would have to be associated with the outcome for at least one cutoff of Y, so you could say that you are doing reverse screening that will need to be accounted for in resampling.
Frank
Does your point about proportionality also hold for ordinality? In other words, if I have several X variables that do not behave in an ordinal fashion with Y, should I still include them in the full model? My understanding or perhaps misunderstanding of the ordinality assumption was that all X variables included in the model should behave in an ordinal fashion with Y. Is that not the case?
On Oct 23, 2011, at 7:37 PM, apeer wrote:
Does your point about proportionality also hold for ordinality? In other words, if I have several X variables that do not behave in an ordinal fashion with Y, should I still include them in the full model? My understanding or perhaps misunderstanding of the ordinality assumption was that all X variables included in the model should behave in an ordinal fashion with Y. Is that not the case?
Why should non-monotonic relationships be discarded? Are you implying they are impossible from a scientific perspective?
David Winsemius, MD West Hartford, CT
I'm not implying they should be discarded; however, at the same time I'm not certain I fully understand why we should check the ordinality assumption if in the end we're going to include predictors with which the response variable behaves in a non-ordinal fashion.
A few issues - Don't let the overall unimportance of a predictor make you worry about non-ordinality (e.g., when the plot.xmean.ordinaly plot has a low range on the y-axis). We frequently have to face the issue of using an imperfect model by fitting a few variables that don't exactly meet our assumptions, if the majority of variables do. One reason for this is that competing methods may fare worse. Another option is to fit a more flexible model such as the partial proportional odds model. I haven't implemented this in my packages. Another R package may do the job (but without the model validation mechanisms provided by rms).
Frank
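To make the "low range on the y-axis" remark concrete, here is a small base-R sketch that computes the mean of X within each level of Y for a strong and a weak predictor. The data are simulated, the helper `xmeans()` is hypothetical, and the sketch covers only the observed means; `plot.xmean.ordinaly` also overlays the means expected under the model's assumptions, which is omitted here.

```r
set.seed(2)
n <- 363
y <- sample(1:3, n, replace = TRUE)          # simulated ordinal response, 3 levels

x_strong <- y + rnorm(n)                     # mean of X rises steadily with Y
x_weak   <- rnorm(n) + 0.05 * (y == 2)       # nearly flat, slight non-monotone bump

# Hypothetical helper: observed mean of a predictor within each level of Y.
xmeans <- function(x, y) tapply(x, y, mean)

m_strong <- xmeans(x_strong, y)
m_weak   <- xmeans(x_weak, y)

# Spread of the level means relative to the predictor's overall SD.
range_strong <- diff(range(m_strong)) / sd(x_strong)
range_weak   <- diff(range(m_weak)) / sd(x_weak)

print(rbind(strong = m_strong, weak = m_weak))
print(c(strong = range_strong, weak = range_weak))
```

The weak predictor's level means may zigzag, but they span only a small fraction of its standard deviation, which is the situation where apparent non-ordinality is mostly noise from an unimportant variable.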
One last thing. At the outset of this discussion I provided the results of a
validation procedure on a model (see below). As discussed previously, the
model overall seems to fare well, with the exception of the slope. With
that in mind, is there a way to correct the coefficients of the model to
account for the corrected slope so that future predictions on a new data set
are more accurate? Or is that not recommended at all?
          index.orig training   test optimism index.corrected   n
Dxy           0.9932   0.9940 0.9905   0.0035          0.9897 363
R2            0.9291   0.9364 0.9163   0.0202          0.9089 363
Intercept     0.0000   0.0000 0.0233  -0.0233          0.0233 363
Slope         1.0000   1.0000 0.7836   0.2164          0.7836 363
Emax          0.0000   0.0000 0.0582   0.0582          0.0582 363
D             0.9118   0.9190 0.8915   0.0275          0.8844 363
U            -0.0110  -0.0110 0.0124  -0.0234          0.0124 363
Q             0.9228   0.9299 0.8791   0.0508          0.8720 363
B             0.0205   0.0172 0.0239  -0.0067          0.0272 363
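The thread does not record an answer to this last question. Purely as an illustration of what "using the corrected intercept and slope" could mean, the base-R sketch below applies them as a logistic recalibration of the original linear predictor. This is an assumption about one possible approach, not a recommendation from anyone in the thread, and the `lp` values are made up. Note also Frank's earlier warning that these numbers are not trustworthy unless every modeling step that used Y was repeated inside the resampling.

```r
# Corrected calibration intercept and slope, copied from the table above.
cal_int   <- 0.0233
cal_slope <- 0.7836

# lp: linear predictor (log odds) from the original fit for the intercept
# that was validated. These values are invented for illustration only.
lp <- c(-3, -1, 0, 1, 3)

p_original     <- plogis(lp)                        # predictions as fitted
p_recalibrated <- plogis(cal_int + cal_slope * lp)  # pulled toward the middle

print(round(cbind(lp, p_original, p_recalibrated), 4))
```

Because the slope is below 1, extreme predictions are pulled toward the middle of the probability scale, which is the direction one would expect when correcting for overfitting.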