Stepwise logistic regression with significance testing - stepAIC
5 messages · Peter-Heinz Fox, Greg Snow, Dimitris Rizopoulos +2 more
There is not a meaningful alternative way, since the way you propose is not meaningful. The Wald tests have some known problems even in the well-defined cases. Both types of tests are designed to test a predefined hypothesis, not a hypothesis conditional on the stepwise procedure. It is best to use approaches other than stepwise selection (which has been shown to give biased results), such as the lasso. If you need to use stepwise, then you should bootstrap the entire selection process to get better estimates/standard errors. Frank Harrell's book and package go into more detail on this and provide some tools to help (as well as the other packages that can be used). Hope this helps,
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Peter-Heinz Fox
> Sent: Tuesday, May 05, 2009 8:02 AM
> To: r-help at r-project.org
> Subject: [R] Stepwise logistic regression with significance testing - stepAIC
>
> Hello R-Users,
>
> I have one binary dependent variable and a set of independent variables (glm(formula, ..., family="binomial")) and I am using the function stepAIC ("MASS") for choosing an optimal model. However I am not sure if stepAIC considers significance properties like Likelihood ratio test and Wald test (see example below).
>
> > y <- rbinom(30,1,0.4)
> > x1 <- rnorm(30)
> > x2 <- rnorm(30)
> > x3 <- rnorm(30)
> > xdata <- data.frame(x1,x2,x3)
> >
> > fit1 <- glm(y~ . ,family="binomial",data=xdata)
> > stepAIC(fit1,trace=FALSE)
>
> Call:  glm(formula = y ~ x3, family = "binomial", data = xdata)
>
> Coefficients:
> (Intercept)           x3
>     -0.3556       0.8404
>
> Degrees of Freedom: 29 Total (i.e. Null);  28 Residual
> Null Deviance:      40.38
> Residual Deviance: 37.86        AIC: 41.86
>
> > fit <- glm( stepAIC(fit1,trace=FALSE)$formula ,family="binomial")
> > my.summ <- summary(fit)
> > # Wald Test
> > print(my.summ$coeff[,4])
> (Intercept)          x3
>   0.3609638   0.1395215
>
> > my.anova <- anova(fit,test="Chisq")
> > #LR Test
> > print(my.anova$P[2])
> [1] 0.1121783
>
> Is there an alternative function or a possible way of checking if the added variable and the new model are significant within the regression steps?
>
> Thanks in advance for your help
>
> Regards
>
> Peter-Heinz Fox
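The suggestion above to "bootstrap the entire selection process" can be sketched roughly as follows. This is only an illustration, not code from the thread: the simulated data mirror the original post (no predictor is truly related to y), and the names `sim`, `candidates`, `selected`, `inclusion_freq` and the choice of B = 200 resamples are assumptions made for the example. The idea is to rerun stepAIC on each resample and record which variables survive, rather than trusting the p-values of the single selected model.

```r
library(MASS)  # provides stepAIC

set.seed(2009)  # arbitrary seed so the sketch is repeatable

# Simulated data in the spirit of the original post:
# y is generated independently of x1, x2 and x3.
n <- 30
sim <- data.frame(y = rbinom(n, 1, 0.4),
                  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
candidates <- c("x1", "x2", "x3")

B <- 200  # number of bootstrap resamples (illustrative choice)
selected <- matrix(FALSE, nrow = B, ncol = length(candidates),
                   dimnames = list(NULL, candidates))

# The loop runs at top level on purpose: stepAIC re-evaluates the model
# call, so boot_data has to be visible from where it does so.
for (b in seq_len(B)) {
  # Resample rows with replacement, then repeat the WHOLE procedure:
  # fit the full model and let stepAIC choose again.
  boot_data <- sim[sample.int(n, replace = TRUE), ]
  full_fit <- suppressWarnings(
    glm(y ~ x1 + x2 + x3, family = binomial, data = boot_data))
  step_fit <- suppressWarnings(stepAIC(full_fit, trace = FALSE))
  selected[b, ] <- candidates %in% attr(terms(step_fit), "term.labels")
}

# Proportion of resamples in which each candidate was retained
inclusion_freq <- colMeans(selected)
print(inclusion_freq)
```

Because the data are pure noise here, any variable that is retained in a sizeable share of resamples shows how readily the stepwise procedure picks up spurious predictors at this sample size.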
Greg Snow wrote:
There is not a meaningful alternative way, since the way you propose is not meaningful. The Wald tests have some known problems even in the well-defined cases. Both types of tests are designed to test a predefined hypothesis, not a hypothesis conditional on the stepwise procedure. It is best to use approaches other than stepwise selection (which has been shown to give biased results), such as the lasso. If you need to use stepwise, then you should bootstrap the entire selection process to get better estimates/standard errors.
For bootstrapping the stepAIC procedure you may have a look at package bootStepAIC. Best, Dimitris
Frank Harrell's book and package go into more detail on this and provide some tools to help (as well as the other packages that can be used). Hope this helps,
Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014
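A minimal sketch of what using the bootStepAIC package mentioned above might look like. It assumes the package is installed and that its main function is `boot.stepAIC(object, data, B = ...)`; the simulated data frame `sim` and the choice B = 100 are made up for illustration, so check the package documentation before relying on the details.

```r
library(MASS)         # stepAIC, used internally by the bootstrap procedure
library(bootStepAIC)  # assumed installed; provides boot.stepAIC

set.seed(2009)  # arbitrary seed for the simulated data

# Illustrative data only: y is unrelated to the predictors
n <- 30
sim <- data.frame(y = rbinom(n, 1, 0.4),
                  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# Full model, written with explicit terms rather than "y ~ ."
full_fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = sim)

# Repeat the stepAIC selection on B bootstrap resamples of the data
boot_res <- boot.stepAIC(full_fit, data = sim, B = 100)
print(boot_res)
```

The printed result should summarise, among other things, how often each covariate was selected across the bootstrap samples, which is the quantity of interest when judging how stable the selection is.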
Didn't a 2008 paper by Austin in J Clin Epidemiol show that bootstrapping was just as bad as backward stepwise regression for finding the true predictors? http://xrl.in/26em
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Freedman wrote:
Didn't a 2008 paper by Austin in J Clin Epidemiol show that bootstrapping was just as bad as backward stepwise regression for finding the true predictors?
Yes. Any variable selection without shrinkage is problematic.

Frank
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
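For readers wondering what selection with shrinkage, such as the lasso recommended earlier in the thread, might look like in practice, here is a rough sketch. The thread does not name a package; glmnet is assumed here, and the simulated data, the "true" coefficients and the sample size are invented purely for illustration.

```r
library(glmnet)  # assumed installed; not named in the thread

set.seed(2009)  # arbitrary seed for the simulation and the CV folds

# Illustrative data: a larger sample than in the original post so that
# cross-validation folds are not tiny.
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))

# Assumed truth for illustration only: x3 matters, x1 and x2 do not
y <- rbinom(n, 1, plogis(-0.4 + 0.8 * x[, "x3"]))

# Lasso-penalised logistic regression (alpha = 1), with the penalty
# strength chosen by cross-validation
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Coefficients at the more conservative "one standard error" penalty;
# predictors shrunk exactly to zero are effectively dropped
lasso_coef <- coef(cv_fit, s = "lambda.1se")
print(lasso_coef)
```

Unlike stepwise selection followed by an unpenalised refit, the retained coefficients are shrunk toward zero, which is the property the reply above is pointing at.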