Stepwise
4 messages · Williams, Robin, Peter Flom, Ben Bolker +1 more
Peter Flom <peterf <at> brainscope.com> writes:
Robin Williams wrote <<<< Is there any facility in R to perform a stepwise process on a model which will remove any highly correlated explanatory variables? I am told there is one in SPSS. I have a large number of variables (some correlated) which I would like to just chuck into a model, run stepwise on, and see what comes out the other end, to give me an idea of which variables I should focus on. Thanks for any help / suggestions.
Stepwise is a bad method of selecting variables. Far better methods are the LASSO and LAR (least angle regression), available in the lars and lasso2 packages. However, while both of these methods are good, neither is a substitute for substantive knowledge.
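To make the contrast with stepwise selection concrete, here is a minimal base-R sketch of the lasso itself, fitted by cyclic coordinate descent rather than by the LARS algorithm that the lars package uses (with lars installed, the packaged route would be along the lines of `lars(x, y, type = "lasso")`, or `type = "lar"` for LAR). The function names `soft_threshold` and `lasso_cd`, the penalty `lambda = 1`, and the simulated data are illustrative assumptions. What it shows: coefficients of irrelevant predictors are set to exactly zero, while relevant ones are kept but shrunk toward zero.

```r
# Soft-thresholding operator: shrinks z toward 0 by g, setting |z| <= g to 0
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Lasso by cyclic coordinate descent (a sketch; NOT the LARS algorithm).
# Minimises (1 / (2n)) * sum((y - X %*% b)^2) + lambda * sum(abs(b))
# after standardising the columns of X and centring y (so no intercept).
lasso_cd <- function(X, y, lambda, max_iter = 1000, tol = 1e-8) {
  X <- scale(X)
  y <- y - mean(y)
  n <- nrow(X)
  z <- colSums(X^2) / n          # per-coordinate curvature
  b <- numeric(ncol(X))
  r <- y                         # residuals at b = 0
  for (iter in seq_len(max_iter)) {
    b_prev <- b
    for (j in seq_along(b)) {
      # scaled inner product of predictor j with the partial residual
      rho <- sum(X[, j] * r) / n + z[j] * b[j]
      b_new <- soft_threshold(rho, lambda) / z[j]
      r <- r - X[, j] * (b_new - b[j])   # keep residuals in sync with b
      b[j] <- b_new
    }
    if (max(abs(b - b_prev)) < tol) break
  }
  setNames(b, colnames(X))
}

# Hypothetical data: only x1 and x2 affect y; x3..x8 are pure noise
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 8), n, 8, dimnames = list(NULL, paste0("x", 1:8)))
y <- 3 * X[, "x1"] - 2 * X[, "x2"] + rnorm(n)
b <- lasso_cd(X, y, lambda = 1)
print(round(b, 3))
```

Refitting over a decreasing grid of `lambda` values traces out the lasso path; in practice the penalty would be chosen by cross-validation rather than fixed by hand as it is here.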
Also, the key thing is not so much whether variables are correlated but whether they are collinear, which is different. If you have a great many variables, you can have a high degree of collinearity (one variable being nearly a linear combination of several others) even with no high pairwise correlations. I've not done this in R, but RSiteSearch("collinearity", restrict = 'functions') yields 34 hits.
HTH
Peter
Another suggestion would be to do PCA on the predictor variables.
And to read Frank Harrell's book _Regression Modeling Strategies_.
cheers
Ben Bolker
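A minimal base-R sketch of the PCA suggestion, assuming simulated data in which three hypothetical predictors share one latent factor and a fourth is independent. `prcomp()` on the standardized predictors concentrates the shared variation in the first component, and the leading component scores could then replace the correlated block in a regression (principal components regression). Note that the components are chosen without reference to the response.

```r
set.seed(7)
n <- 300
f <- rnorm(n)                        # hypothetical shared latent factor
X <- cbind(a = f + rnorm(n, sd = 0.3),
           b = f + rnorm(n, sd = 0.3),
           c = f + rnorm(n, sd = 0.3),
           d = rnorm(n))             # unrelated to the others

# PCA on centred and scaled predictors (i.e. on the correlation matrix)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance carried by each component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
print(round(prop_var, 3))

# Loadings: how each original predictor contributes to each component
print(round(pca$rotation, 2))

# Scores on the first two components, usable as uncorrelated predictors
scores <- pca$x[, 1:2]
```

How many components to keep is a judgment call; the proportions of variance printed above are one common guide.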
Also consider the redun function in the Hmisc package. It does not use the response variable; instead it uses flexible nonlinear additive models to predict each predictor variable from all the others, applying a stepwise procedure in a formal redundancy analysis.
Frank
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
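The idea behind redundancy analysis can be sketched in base R. The function below, `linear_redun` (a hypothetical name), is a deliberately simplified, linear-only stand-in for `Hmisc::redun`, which also allows nonlinear (spline) terms and offers further options; with Hmisc installed, the real call takes a formula of predictors only, roughly `redun(~ x1 + x2 + x3, data = d, r2 = 0.9)`. In the same spirit, the sketch repeatedly finds the predictor best explained by the remaining ones and drops it while that R^2 is at least a cutoff. The response is never used.

```r
# Greedy linear redundancy analysis (simplified stand-in for Hmisc::redun)
linear_redun <- function(X, r2_cut = 0.9) {
  keep <- colnames(X)
  dropped <- character(0)
  while (length(keep) > 1) {
    # R^2 of each remaining predictor regressed on the other remaining ones
    r2 <- sapply(keep, function(v) {
      others <- X[, setdiff(keep, v), drop = FALSE]
      summary(lm(X[, v] ~ others))$r.squared
    })
    worst <- names(which.max(r2))        # most predictable predictor
    if (r2[[worst]] < r2_cut) break      # nothing redundant enough: stop
    dropped <- c(dropped, worst)
    keep <- setdiff(keep, worst)
  }
  list(keep = keep, dropped = dropped)
}

set.seed(3)
n <- 400
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
# Hypothetical x5: nearly x1 + x2, hence redundant given the others
X <- cbind(X, x5 = X[, "x1"] + X[, "x2"] + rnorm(n, sd = 0.1))
res <- linear_redun(X, r2_cut = 0.9)
print(res)
```

Dropping one variable at a time matters: once x5 is removed, x1 and x2 are no longer predictable from what remains, so they are retained rather than discarded along with it.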