Stepwise logistic regression with significance testing - stepAIC

There is no meaningful alternative, because the approach you propose is not itself meaningful.  The Wald tests have some known problems even in well-defined cases.  Both types of test are designed to test a prespecified hypothesis, not a hypothesis that is conditional on the outcome of a stepwise procedure.  It is best to use approaches other than stepwise selection (which has been shown to give biased results), such as the lasso.  If you need to use stepwise selection, then you should bootstrap the entire selection process to get better estimates and standard errors.

Frank Harrell's book and package go into more detail on this and provide some tools to help (as well as the other packages that can be used).

Hope this helps,
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
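
On the narrower question of whether stepAIC looks at significance tests at all: it does not; it compares AIC values. The sketch below (simulated data and variable names are illustrative assumptions, not the poster's data) shows that, for a single 1-df term and the default k = 2, the AIC comparison amounts to asking whether the likelihood ratio statistic exceeds 2, which corresponds to a nominal level of about 0.157. That is consistent with x3 being retained in the posted output with an LR p-value of about 0.11. It does not make the post-selection p-values valid.

```r
# AIC = -2 * logLik + k * (number of parameters); stepAIC uses k = 2 by default.
# Dropping one 1-df term lowers AIC only if the LR statistic is below k.
k <- 2
implied_alpha <- pchisq(k, df = 1, lower.tail = FALSE)
print(round(implied_alpha, 4))  # 0.1573

# Check the identity on simulated data laid out like the example in this thread
set.seed(1)
xdata <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
xdata$y <- rbinom(30, 1, 0.4)

full    <- glm(y ~ x1 + x2 + x3, family = binomial, data = xdata)
reduced <- update(full, . ~ . - x3)

lr_stat  <- deviance(reduced) - deviance(full)  # likelihood ratio statistic
aic_diff <- AIC(reduced) - AIC(full)            # > 0 means the full model is preferred

# The AIC difference is the LR statistic minus k (one parameter was removed)
print(all.equal(aic_diff, lr_stat - k))
```

Changing k changes the implied threshold; for example, k = qchisq(0.95, 1) would correspond to a nominal 0.05 level for 1-df terms, with all the caveats above still applying.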

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Peter-Heinz Fox
> Sent: Tuesday, May 05, 2009 8:02 AM
> To: r-help at r-project.org
> Subject: [R] Stepwise logistic regression with significance testing - stepAIC
> 
> Hello R-Users,
> 
> I have one binary dependent variable and a set of independent variables
> (glm(formula, ..., family = "binomial")) and I am using the function stepAIC
> (package "MASS") to choose an optimal model. However, I am not sure whether
> stepAIC considers significance properties such as the likelihood ratio test
> and the Wald test (see example below).
> 
> > y <- rbinom(30,1,0.4)
> > x1 <- rnorm(30)
> > x2 <- rnorm(30)
> > x3 <- rnorm(30)
> > xdata <- data.frame(x1,x2,x3)
> >
> > fit1 <- glm(y~ . ,family="binomial",data=xdata)
> > stepAIC(fit1,trace=FALSE)
> 
> Call:  glm(formula = y ~ x3, family = "binomial", data = xdata)
> 
> Coefficients:
> (Intercept)           x3
>     -0.3556       0.8404
> 
> Degrees of Freedom: 29 Total (i.e. Null);  28 Residual
> Null Deviance:      40.38
> Residual Deviance: 37.86        AIC: 41.86
> >
> > fit <- glm(stepAIC(fit1, trace=FALSE)$formula, family="binomial")
> > my.summ <- summary(fit)
> > # Wald Test
> > print(my.summ$coeff[,4])
> (Intercept)          x3
>   0.3609638   0.1395215
> >
> > my.anova <- anova(fit,test="Chisq")
> > #LR Test
> > print(my.anova$P[2])
> [1] 0.1121783
> >
> 
> Is there an alternative function or a possible way of checking if the
> added variable and the new model are significant within the regression
> steps?
> 
> Thanks in advance for your help
> 
> Regards
> 
> Peter-Heinz Fox
For bootstrapping the stepAIC procedure you may have a look at package 
bootStepAIC.
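
To make "bootstrapping the stepAIC procedure" concrete, here is a minimal hand-rolled sketch using only MASS. It is not the bootStepAIC implementation; the simulated data, the number of resamples B, and the variable names are assumptions chosen for illustration. The key point is that the whole selection is rerun on every resample, so the selection frequencies reflect how unstable the choice of model is.

```r
library(MASS)  # provides stepAIC

set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Illustrative truth: only x3 affects the outcome
dat$y <- rbinom(n, 1, plogis(-0.3 + 0.8 * dat$x3))

B    <- 50                      # number of bootstrap resamples (kept small here)
vars <- c("x1", "x2", "x3")
selected <- matrix(FALSE, nrow = B, ncol = length(vars),
                   dimnames = list(NULL, vars))

for (b in seq_len(B)) {
  # Resample rows with replacement, then repeat the ENTIRE selection
  boot_dat <- dat[sample(n, replace = TRUE), ]
  full     <- glm(y ~ x1 + x2 + x3, family = binomial, data = boot_dat)
  chosen   <- stepAIC(full, trace = FALSE)
  selected[b, ] <- vars %in% attr(terms(chosen), "term.labels")
}

# Proportion of resamples in which each variable survived selection
selection_freq <- colMeans(selected)
print(selection_freq)
```

Noise variables such as x1 and x2 will typically still be selected in a non-trivial share of resamples, which is one way to see why p-values from the single selected model are hard to interpret.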

Best,
Dimitris

Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Didn't a 2008 paper by Austin in J Clin Epidemiol show that bootstrapping was
just as bad as backward stepwise regression for finding the true predictors?

http://xrl.in/26em
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

> Didn't a 2008 paper by Austin in J Clin Epidemiol show that bootstrapping was
> just as bad as backward stepwise regression for finding the true predictors?
>
> http://xrl.in/26em

Yes.  Any variable selection without shrinkage is problematic.

Frank

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
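
As an illustration of selection combined with shrinkage, which is the direction suggested in this thread (the lasso), here is a minimal sketch. It assumes the glmnet package is installed, which nobody in the thread names explicitly; the simulated data and variable names are likewise assumptions. The lasso shrinks all coefficients towards zero and sets some exactly to zero, with the penalty chosen by cross-validation.

```r
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
# Illustrative truth: only x3 affects the outcome
y <- rbinom(n, 1, plogis(-0.3 + 0.8 * x[, "x3"]))

if (requireNamespace("glmnet", quietly = TRUE)) {
  # alpha = 1 requests the lasso penalty; lambda is chosen by cross-validation
  cv_fit <- glmnet::cv.glmnet(x, y, family = "binomial", alpha = 1)
  # Coefficients at the more heavily penalised "lambda.1se" choice;
  # entries that are exactly zero correspond to dropped variables
  lasso_coef <- as.matrix(coef(cv_fit, s = "lambda.1se"))
  print(lasso_coef)
} else {
  message("glmnet is not installed; skipping the lasso fit")
}
```

The retained coefficients are deliberately biased towards zero, so they should not be read as ordinary maximum likelihood estimates or fed back into Wald or LR tests.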