Ridge Regression variable selection
Frank Harrell <f.harrell <at> vanderbilt.edu> writes:
Unlike L1 (lasso) regression or elastic net (mixture of L1 and L2), L2 norm regression (ridge regression) does not select variables. Selection of variables would not work properly, and it's unclear why you would want to omit "apparently" weak variables anyway. Frank
... and this was cross-posted from StackOverflow, where I said more or less the same thing about ridge regression (I didn't get into the "don't do variable selection" issue yet; I was waiting ...)
http://stackoverflow.com/questions/14046569/ridge-regression-in-r
For the other questions (what are the lambda values? what does the output mean?) I would suggest getting a copy of _Modern Applied Statistics with S_ [the book that the MASS package was written to accompany] and reading the relevant chapter.
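As a rough illustration of the point above that ridge shrinks coefficients while the lasso can set them to zero, here is a minimal base-R sketch. It assumes an orthonormal design (X'X = I), a ridge objective of 1/2*RSS + lambda/2*sum(b^2) and a lasso objective of 1/2*RSS + lambda*sum(|b|); the values in `b_ols` and `lambda` are made up for the example and do not come from the diabetes data.

```r
# Hypothetical least-squares coefficients under an orthonormal design
b_ols  <- c(3, 1.5, 0.2, -0.1)
lambda <- 0.5   # illustrative penalty, not a recommended value

# Ridge: every coefficient is scaled by the same factor, so none becomes zero
ridge <- b_ols / (1 + lambda)

# Lasso: soft-thresholding, coefficients smaller than lambda in size drop to zero
lasso <- sign(b_ols) * pmax(abs(b_ols) - lambda, 0)

print(rbind(ols = b_ols, ridge = ridge, lasso = lasso))
```

In this simplified setting only the lasso row contains exact zeros, which is why ridge output gives no list of "selected" variables.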
maths123 wrote
I have a .txt file containing a dataset with 500 samples. There are 10
variables.
I am trying to perform variable selection using the ridge regression
method but I am very confused.
I have input the following:
diabetes10<-read.table("diabetes10.txt", header=TRUE)
diabetes10
library(MASS)
select(lm.ridge(y=diabetes10 ~ age+sex+bmi+map+tc , diabetes10,
lambda = seq(0,0.1,0.0001)))
First of all, I am confused about the lambda values.
Second of all, my output is:
modified HKB estimator is -1.334073e-29
modified L-W estimator is -5.610557e-28
smallest value of GCV at 1e-04
I have no idea what that is telling me and where I am supposed to work out
which variables have been selected.
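For reference, a hedged sketch of how the call might look with a conventional formula. The `y=` in front of `diabetes10 ~ ...` looks unintended, since `lm.ridge` takes the formula first and `data` second. The sketch assumes the response column is named `y`, replaces `diabetes10.txt` with simulated data (the data frame `dat` and its coefficients are invented for illustration), and uses a wider lambda grid than the original, which is also an assumption.

```r
library(MASS)

set.seed(1)
n <- 500
# Simulated stand-in for diabetes10.txt (hypothetical data, not the poster's file)
dat <- data.frame(age = rnorm(n), sex = rbinom(n, 1, 0.5),
                  bmi = rnorm(n), map = rnorm(n), tc = rnorm(n))
dat$y <- 2 * dat$bmi + dat$map + rnorm(n)

# Formula first, data second; lambda is the grid of ridge penalties to try
fit <- lm.ridge(y ~ age + sex + bmi + map + tc, data = dat,
                lambda = seq(0, 10, 0.1))

# select() prints three suggested penalties (HKB, L-W, GCV minimiser);
# it does not choose variables
select(fit)

# Coefficients at the GCV-minimising lambda: shrunken, but all still present
best <- which.min(fit$GCV)
fit$lambda[best]
coef(fit)[best, ]
```

The three numbers printed by `select()` are candidate values of lambda rather than a statement about which predictors to keep; every predictor in the formula still has a coefficient in `coef(fit)`.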