I am developing a short presentation for people with applied statistical backgrounds who have used backward stepwise variable selection, removing variables based on small coefficient values, coefficient P values > 0.05, and large variances. I want to provide some demonstration code in R that highlights some of the weaknesses described by Frank Harrell (citation below). Of particular interest are (1) failure to include informative predictor variables (categorical and continuous) and (2) lowered standard errors for the coefficients in the final model. I already have code that demonstrates the inclusion of too many false predictors. I expect such code is available, but I have not found it. Guidance would be appreciated.

Mark

P.S. I have started a public GitHub package at https://github.com/rmsharp/stepwiser. It has very little in it thus far.

Frank E. Harrell. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, 2nd ed. Springer Series in Statistics. Springer-Verlag, 2015.

R. Mark Sharp, Ph.D.
Data Scientist and Biomedical Statistical Consultant
7526 Meadow Green St.
San Antonio, TX 78251
mobile: 210-218-2868
rmsharp at me.com
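The two weaknesses of interest can be sketched with a small base-R simulation. This is a minimal illustration under stated assumptions, not Harrell's code and not the stepwiser package: `backward_p()` is a hypothetical helper that repeatedly drops the term with the largest partial-F p-value above 0.05 (using `drop1()`, so a factor is tested as a whole), and the sample size, effect sizes, and the five pure-noise predictors `z1` to `z5` are arbitrary choices. For point (2), it compares the spread of the x1 estimate produced by the whole select-then-fit procedure (counting a dropped x1 as 0) with the standard error the final model reports.

```r
# Sketch only: backward elimination on partial-F p-values (alpha = 0.05).
# backward_p() and all simulation settings are hypothetical/illustrative.

backward_p <- function(data, response, alpha = 0.05) {
  keep <- setdiff(names(data), response)
  repeat {
    rhs <- if (length(keep) > 0) keep else "1"   # "1" = intercept-only model
    fit <- lm(reformulate(rhs, response = response), data = data)
    if (length(keep) == 0) return(fit)
    # drop1() tests each term as a whole, so a factor is tested on all its df
    tab <- drop1(fit, test = "F")
    p <- setNames(tab[["Pr(>F)"]][-1], rownames(tab)[-1])  # row 1 is <none>
    if (max(p) <= alpha) return(fit)
    keep <- setdiff(keep, names(p)[which.max(p)])  # drop the worst term
  }
}

sim_once <- function(n = 60) {
  d <- data.frame(
    x1  = rnorm(n),
    x2  = rnorm(n),
    grp = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
    matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("z", 1:5)))
  )
  # True model: x1, x2 (continuous) and grp (categorical) matter; z1..z5 are noise
  d$y <- 0.3 * d$x1 + 0.3 * d$x2 + 0.5 * (d$grp == "b") + rnorm(n)
  fit  <- backward_p(d, "y")
  cf   <- summary(fit)$coefficients
  kept <- attr(terms(fit), "term.labels")
  c(x1_kept  = "x1" %in% kept,
    x2_kept  = "x2" %in% kept,
    grp_kept = "grp" %in% kept,
    n_noise  = sum(grepl("^z", kept)),
    x1_est   = if ("x1" %in% kept) cf["x1", "Estimate"] else 0,
    x1_se    = if ("x1" %in% kept) cf["x1", "Std. Error"] else NA)
}

set.seed(2018)
res <- as.data.frame(t(replicate(300, sim_once())))

# (1) How often each informative predictor survives selection (ideally 1)
colMeans(res[, c("x1_kept", "x2_kept", "grp_kept")])
mean(res$n_noise)              # average number of noise variables retained

# (2) Actual spread of the procedure's x1 estimate (0 when dropped) versus
#     the average standard error the final model reports when x1 is kept
sd_emp  <- sd(res$x1_est)
mean_se <- mean(res$x1_se, na.rm = TRUE)
c(empirical_sd = sd_emp, mean_reported_se = mean_se)
```

With modest true effects and n = 60, one would expect x1, x2 and grp each to be dropped in a sizeable share of replications, and the empirical SD of the x1 estimate to exceed the average reported standard error, because the final model's standard errors ignore the selection step. Exact numbers depend on the seed and settings.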
demonstration of weaknesses in stepwise variable selection
4 messages · Mark Sharp, Jeff Laux, K Imran M +1 more
Although no code is given, it can be inferred from this:
https://stats.stackexchange.com/a/179945/
Best, Jeff
On 10/2/2018 11:54 AM, R. Mark Sharp wrote:
_______________________________________________ R-sig-teaching at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-teaching --- This email has been checked for viruses by AVG. https://www.avg.com
Looking forward to it ... KIM Malaysia
On Wed, Oct 3, 2018 at 10:32 AM Jeff Laux <jefflaux at gmail.com> wrote:
Although no code is given, it can be inferred from this:
https://stats.stackexchange.com/a/179945/
Best, Jeff
Perhaps it is much easier to show, by means of a simulation, that p-value-based selection causes upward bias in the (absolute) regression coefficient estimates: estimates that happen to be relatively small are less likely to pass the threshold and be included, so those that remain are too large on average. Ewout Steyerberg has performed many of these simulations; perhaps some of his code is available on his website: http://www.clinicalpredictionmodels.org/doku.php?id=rcode_and_data:start

Kind regards,
Sander
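The selection bias described above can be seen with an even simpler filter than full stepwise selection: fit a single predictor with a small true slope, keep the estimate only when p < 0.05, and compare the average kept estimate with the truth. This is a simplified sketch (a single p-value screen, not Steyerberg's code), and the values beta = 0.2, n = 50 and 2000 replications are arbitrary assumptions.

```r
# Sketch only: screening on p < 0.05 inflates the estimates that survive.
# beta, n and reps are arbitrary illustrative settings.
set.seed(1)
beta <- 0.2
n    <- 50
reps <- 2000

res <- t(replicate(reps, {
  x  <- rnorm(n)
  y  <- beta * x + rnorm(n)
  co <- summary(lm(y ~ x))$coefficients
  c(est = co["x", "Estimate"], p = co["x", "Pr(>|t|)"])
}))

selected <- res[, "p"] < 0.05
mean(selected)               # share of fits in which x passes the 0.05 filter
mean(res[, "est"])           # average over all fits: close to beta
mean(res[selected, "est"])   # average over "significant" fits only: above beta
```

Because the true slope is small relative to its standard error, only the fits that happen to overestimate it tend to reach p < 0.05, so the mean of the retained estimates sits well above beta while the unscreened mean does not.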
On Tue, Oct 2, 2018 at 5:54 PM R. Mark Sharp <rmsharp at me.com> wrote: