subset selection for logistic regression

Perhaps I should not write this, because I will discredit myself with it,
but...

Suppose I have a setup with 100 variables and some 1000 cases, and I want to
boil down the number of variables to a maximum of 10 for practical reasons,
even if I lose 10% of prediction quality by doing so (for example, because it
is expensive to measure all variables on new cases).

Is it really so wrong to use a stepwise method?
Let's say I divide the sample into three parts and do variable selection on
the first part, estimation on the second and testing on the third part (this
solves almost all of the problems Frank is talking about on pp. 56-57 of his
excellent book). Is there always a tractable alternative?
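
To make the three-part scheme concrete, here is a minimal sketch in Python rather than R, on a deliberately small synthetic problem so that it runs quickly: 450 cases, 15 candidate variables, at most 3 kept, and only columns 0 to 2 truly related to the outcome. The selection rule (greedy forward steps by AIC), the Newton-Raphson fitter and all function names (`fit_logistic`, `forward_stepwise`, `three_way_split`, `main`) are my own hypothetical choices for illustration, not taken from the book. What it is meant to show is that the coefficients and the test score never see the cases used to pick the variables.

```python
import math
import random

def sigmoid(z):
    # Logistic function, split by sign to avoid math.exp overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def design(X, cols):
    # One row per case: an intercept followed by the chosen columns.
    return [[1.0] + [x[j] for j in cols] for x in X]

def prob(beta, row):
    return sigmoid(sum(b * v for b, v in zip(beta, row)))

def loglik(beta, rows, y):
    ps = [min(max(prob(beta, r), 1e-12), 1 - 1e-12) for r in rows]  # clamp so log stays finite
    return sum(yi * math.log(p) + (1 - yi) * math.log(1 - p) for p, yi in zip(ps, y))

def solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(X, y, cols, max_iter=25):
    # Maximum-likelihood fit by Newton-Raphson; returns (beta, log-likelihood).
    rows, k = design(X, cols), len(cols) + 1
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[1e-8 * (i == j) for j in range(k)] for i in range(k)]  # tiny ridge on the diagonal
        for r, yi in zip(rows, y):
            p = prob(beta, r)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1 - p) * r[i] * r[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta, loglik(beta, rows, y)

def forward_stepwise(X, y, max_vars):
    # Greedy forward selection by AIC, capped at max_vars columns.
    selected = []
    best_aic = -2 * fit_logistic(X, y, selected)[1] + 2
    while len(selected) < min(max_vars, len(X[0])):
        scores = [(-2 * fit_logistic(X, y, selected + [j])[1] + 2 * (len(selected) + 2), j)
                  for j in range(len(X[0])) if j not in selected]
        aic, j = min(scores)
        if aic >= best_aic:  # no candidate improves AIC: stop before the cap
            break
        selected.append(j)
        best_aic = aic
    return selected

def three_way_split(n, rng):
    # Shuffle case indices and cut them into selection / estimation / test thirds.
    idx = list(range(n))
    rng.shuffle(idx)
    t = n // 3
    return idx[:t], idx[t:2 * t], idx[2 * t:]

def main(seed=1, n=450, p=15, max_vars=3):
    rng = random.Random(seed)
    true_beta = {0: 1.5, 1: -1.0, 2: 0.8}  # hypothetical: only columns 0-2 matter
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [int(rng.random() < sigmoid(sum(b * x[j] for j, b in true_beta.items()))) for x in X]

    def subset(idx):
        return [X[i] for i in idx], [y[i] for i in idx]

    sel_idx, est_idx, test_idx = three_way_split(n, rng)
    selected = forward_stepwise(*subset(sel_idx), max_vars)  # part 1: choose variables only
    beta, _ = fit_logistic(*subset(est_idx), selected)       # part 2: re-estimate on fresh cases
    X3, y3 = subset(test_idx)                                 # part 3: cases never used so far
    rows3 = design(X3, selected)
    acc = sum((prob(beta, r) >= 0.5) == (yi == 1) for r, yi in zip(rows3, y3)) / len(y3)
    return selected, beta, loglik(beta, rows3, y3) / len(y3), acc

if __name__ == "__main__":
    print(main())
```

Calling `main()` returns the selected columns, the coefficients re-estimated on the second part, the mean log-likelihood on the third part and the test accuracy at a 0.5 cut-off. Setting `p=100` and `max_vars=10` mirrors the setup above, at the cost of a much slower pure-Python run.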

Of course it is wrong to interpret the selected variables as "the true
influences" and all others as "unrelated", but what if I don't do that?

If stepwise variable selection should really be taboo, why are pp.
58-59 of "Regression Modeling Strategies" devoted to "how to do it if you
must"?

Please forget my name;-)

Christian
On Wed, 2 Mar 2005, Berton Gunter wrote:

            
***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de