subset selection for logistic regression
To clarify Frank's remark ...

A prominent theme in statistical research over at least the last 25 years (with roots that go back 50 or more, probably) has been the superiority of "shrinkage" methods over variable selection. Specific incarnations can be found in anything Bayesian, mixed effects models for repeated measures, ridge regression, and the R packages lars and lasso2, among others.

I also find it distressing that these ideas have apparently not penetrated much (at all?) into the wider scientific community (but I suppose I shouldn't be surprised -- most scientists still do one-factor-at-a-time experiments 80 years after Fisher). I would speculate that, aside from the usual statistics/science cultural issues, part of the reason is that shrinkage estimators don't generally come with neat, classical inference procedures: like it or not, many scientists have been conditioned by their Stat 101 courses to expect P values, so in some sense we are hoist by our own petard.

Just my $.02 -- contrary (and more knowledgeable) opinions welcome.

-- Bert Gunter
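For readers who have not met "shrinkage" in the logistic setting, here is a minimal sketch of ridge-penalized logistic regression. It is not taken from any of the packages named in this thread: the function name `fit_ridge_logistic`, the toy data, the penalty values, and the use of plain gradient descent are all assumptions made for illustration, and it is written in Python only so that it runs with no extra libraries. The point is that every predictor stays in the model while the coefficients are pulled toward zero as the penalty grows, rather than some predictors being dropped outright.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam, steps=1000, lr=0.5):
    """Hypothetical helper: minimise
    mean negative log-likelihood + (lam / 2) * sum(beta_j ** 2)
    by gradient descent. The intercept b0 is not penalised."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(steps):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: fitted probability minus observed 0/1 response.
            resid = sigmoid(b0 + sum(b * v for b, v in zip(beta, xi))) - yi
            g0 += resid
            for j in range(p):
                g[j] += resid * xi[j]
        b0 -= lr * g0 / n
        # The ridge penalty contributes lam * beta_j to each slope gradient.
        beta = [b - lr * (gj / n + lam * b) for b, gj in zip(beta, g)]
    return b0, beta


# Made-up toy data (an assumption for illustration): three predictors,
# the third of which has no true effect on the response.
rng = random.Random(0)
true_beta = [1.5, -1.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in true_beta] for _ in range(100)]
y = [
    1 if rng.random() < sigmoid(sum(b * v for b, v in zip(true_beta, xi))) else 0
    for xi in X
]

# Fit the same full model under increasing amounts of shrinkage.
lambdas = [0.0, 0.01, 0.1, 1.0]
norms = []
for lam in lambdas:
    b0, beta = fit_ridge_logistic(X, y, lam)
    norms.append(math.sqrt(sum(b * b for b in beta)))
    print(lam, [round(b, 3) for b in beta])
```

With `lam = 0.0` this is ordinary maximum likelihood. Larger values of `lam` give coefficient vectors with smaller Euclidean norm, but no variable is ever selected in or out. In practice the penalty would be chosen by cross-validation or a similar criterion, which this sketch omits.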
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Frank E Harrell Jr
Sent: Wednesday, March 02, 2005 5:13 AM
To: Wittner, Ben
Cc: r-help at lists.R-project.org
Subject: Re: [R] subset selection for logistic regression

Wittner, Ben wrote:
R packages leaps and subselect implement various methods of selecting best or good subsets of predictor variables for linear regression models, but they do not seem to be applicable to logistic regression models. Does anyone know of software for finding good subsets of predictor variables for logistic regression models?

Thanks.
-Ben

Why are these procedures still being used? The performance is known to be bad in almost every sense (see r-help archives).

Frank Harrell

p.s., The leaps package references "Subset Selection in Regression" by Alan Miller. On page 2 of the 2nd edition of that text it states the following:

"All of the models which will be considered in this monograph will be linear; that is they will be linear in the regression coefficients. Though most of the ideas and problems carry over to the fitting of nonlinear models and generalized linear models (particularly the fitting of logistic relationships), the complexity is greatly increased."
--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University