subset selection for logistic regression

Christian Hennig wrote:
Yes.  Read about model uncertainty and bias in models developed using 
stepwise methods.  One exception: if there is a large number of 
variables with truly zero regression coefficients, and the rest are not 
very weak, stepwise can sort things out fairly well.  But you never know 
this in advance.

That's a good way to find out how bad the method is, not to fix the 
problems inherent in it.

Stress on "if".  And note that if you ask what is the optimum alpha for 
variables to be kept in the model when doing backwards stepdown, it's 
alpha=1.0.  A good compromise is alpha=0.5.  See

@Article{ste01pro,
   author =  {Steyerberg, Ewout W. and Eijkemans, Marinus J. C. and
              Harrell, Frank E. and Habbema, J. Dik F.},
   title =   {Prognostic modeling with logistic regression analysis:
              {In} search of a sensible strategy in small data sets},
   journal = {Medical Decision Making},
   year =    2001,
   volume =  21,
   pages =   {45--56},
   annote =  {shrinkage; variable selection; dichotomization of
              continuous variables; sign of regression coefficient;
              calibration; validation}
}

And on Bert's excellent question about why shrinkage is not used more 
often, here is our attempt at a remedy:

@Article{moo04pen,
   author =  {Moons, K. G. M. and Donders, A. Rogier T. and
              Steyerberg, E. W. and Harrell, F. E.},
   title =   {Penalized maximum likelihood estimation to directly
              adjust diagnostic and prognostic prediction models for
              overoptimism: a clinical example},
   journal = {J Clinical Epidemiology},
   year =    2004,
   volume =  57,
   pages =   {1262--1270},
   annote =  {prediction research; overoptimism; overfitting;
              penalization; bootstrapping; shrinkage}
}
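
For anyone who wants to see what shrinkage does to a logistic regression coefficient, here is a minimal sketch in plain Python. It is not the procedure from either paper above: it assumes a single predictor, a simple quadratic (ridge-type) penalty on the slope only, simulated data, and an arbitrary penalty value of 5. The names `sigmoid` and `fit_logistic` are made up for this illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, penalty=0.0, iters=50):
    """Newton-Raphson fit of logit P(y=1) = a + b*x.

    Maximizes loglik(a, b) - 0.5 * penalty * b**2, so penalty=0 gives
    ordinary maximum likelihood. The intercept a is not penalized.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g_a += y - p            # score for the intercept
            g_b += (y - p) * x      # score for the slope
            h_aa += w               # information matrix entries
            h_ab += w * x
            h_bb += w * x * x
        g_b -= penalty * b          # penalty contribution to the score
        h_bb += penalty             # penalty contribution to the information
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Simulated small data set (assumed true intercept 0 and slope 1).
random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(60)]
ys = [1 if random.random() < sigmoid(x) else 0 for x in xs]

a_mle, b_mle = fit_logistic(xs, ys)                # unpenalized fit
a_pen, b_pen = fit_logistic(xs, ys, penalty=5.0)   # penalized fit
print("slope, maximum likelihood:", b_mle)
print("slope, penalized:         ", b_pen)
```

The penalized slope is pulled toward zero relative to the maximum likelihood slope. In practice the penalty would be chosen by something like cross-validation or an effective AIC rather than fixed in advance as it is here.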

Frank