Stepwise Regression and PLS
On Sun, 1 Feb 2004 19:13:49 -0800 (PST)
Jinsong Zhao <jinsong_zh at yahoo.com> wrote:
--- Frank E Harrell Jr <feh3k at spamcop.net> wrote:
On Sun, 1 Feb 2004 11:09:28 -0800 (PST) Jinsong Zhao <jinsong_zh at yahoo.com> wrote:
Dear all,
I am a newcomer to R. I intend to use R to do stepwise regression and PLS with a data set (a 55x20 matrix, with one dependent and 19 independent variables). I have done the same work on the same data set using SPSS and SAS. However, the results obtained by R differ considerably from those of SPSS or SAS.
In the case of stepwise regression, SPSS gave a model with 4 independent variables, but with step(), R gave a model with 10 and a much higher R2. Furthermore, regsubsets() also indicates that the 10-variable model is one of the best regression subsets. How can this difference be explained? And for my data set, how many variables would it be reasonable to enter into the model?
In the case of PLS, the results of the mvr function of the pls.pcr package also differ from those of SAS. Although the optimum number of latent variables is the same, the difference in R2 is large. Why?
Any comments and suggestions are much appreciated. Thanks in advance!
Best wishes,
Jinsong Zhao
In your case SPSS, SAS, R, S-Plus, Stata, Systat, Statistica, and every other package will agree in one sense, because results from all of them will be virtually meaningless. Simulate some data from a known model and you'll quickly find out why stepwise variable selection is often a train wreck.
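The suggestion to simulate from a known model can be tried along the following lines. This is a minimal sketch in Python rather than R, using only the standard library. The function names (`ols_rss`, `forward_aic`, `simulate`), the true model (two real predictors with coefficient 0.5 among 19 candidates, with n = 55 to mirror the data set described above), and the forward-only AIC search are all assumptions made for illustration. It is not a reimplementation of step(), regsubsets(), or the SPSS/SAS procedures.

```python
import math
import random


def ols_rss(x_cols, y):
    """Residual sum of squares of y regressed on an intercept plus x_cols."""
    n = len(y)
    cols = [[1.0] * n] + x_cols
    p = len(cols)
    # Augmented normal equations [X'X | X'y].
    a = [
        [sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(p)]
        + [sum(u * v for u, v in zip(cols[i], y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[piv] = a[piv], a[k]
        for r in range(k + 1, p):
            f = a[r][k] / a[k][k]
            for c in range(k, p + 1):
                a[r][c] -= f * a[k][c]
    # Back-substitution for the coefficients.
    beta = [0.0] * p
    for k in reversed(range(p)):
        s = sum(a[k][c] * beta[c] for c in range(k + 1, p))
        beta[k] = (a[k][p] - s) / a[k][k]
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(n)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_aic(x, y):
    """Greedy forward selection: add the predictor that lowers AIC the most."""
    n = len(y)
    selected = []
    # AIC up to an additive constant: n*log(RSS/n) + 2*(number of coefficients).
    best_aic = n * math.log(ols_rss([], y) / n) + 2
    while len(selected) < len(x):
        scored = []
        for j in range(len(x)):
            if j in selected:
                continue
            rss = ols_rss([x[k] for k in selected + [j]], y)
            scored.append((n * math.log(rss / n) + 2 * (len(selected) + 2), j))
        aic, j = min(scored)
        if aic >= best_aic:
            break  # no candidate improves AIC
        best_aic = aic
        selected.append(j)
    return selected


def simulate(reps=20, n=55, p=19, seed=1):
    """Known model: only predictors 0 and 1 matter; the other p - 2 are noise."""
    rng = random.Random(seed)
    true_set = {0, 1}
    exact = 0
    noise_picked = 0
    for _ in range(reps):
        x = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
        y = [0.5 * x[0][i] + 0.5 * x[1][i] + rng.gauss(0, 1) for i in range(n)]
        chosen = set(forward_aic(x, y))
        exact += int(chosen == true_set)
        noise_picked += len(chosen - true_set)
    return exact, noise_picked / reps


if __name__ == "__main__":
    exact, mean_noise = simulate()
    print("runs recovering exactly the true model:", exact, "of 20")
    print("mean number of noise variables selected:", mean_noise)
```

Running the script prints how often the selected set matches the true two-variable model and how many pure-noise variables were picked on average. Changing `seed`, `n`, or the coefficients lets you see how much the selected set varies from one simulated sample to the next.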
---
Frank E Harrell Jr Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
In the case of stepwise regression, I have found that the variables in the subsets I got using regsubsets() are collinear. However, the variables in SPSS's result are not collinear. I wonder what I should do to get the same or a better linear model.
I think you missed the point. None of the variable selection procedures will provide results that have a fair probability of replicating in another sample.
FH