glmnet - R-help | R Mailing Lists

Tue, Aug 9, 2011 8:59 PM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110809/65135bcf/attachment.ksh>

Nick Sabbe

Wed, Aug 10, 2011 12:00 AM

Hi Andra.

I wonder how you came to be using LASSO without knowing what lambda
is. I'd advise you to read up on it. The help page (?glmnet) lists
several paper references, but for a gentler introduction, you can read
The Elements of Statistical Learning:
http://www-stat.stanford.edu/~tibs/ElemStatLearn/

In a nutshell, though: lambda is the tuning parameter that sets the weight
given to the penalty. The larger it is, the more 'pressure' there is
on the coefficients to be small (or better yet: to be exactly zero).
The way you use LASSO is: you look at a reasonable set of lambda values
(glmnet generates such a sequence for you), calculate some measure of
success for each lambda value (e.g. misclassification error, AUC, ...),
generally by using cross-validation (as provided by cv.glmnet: read its
help).
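
To see the effect of lambda concretely, here is a minimal sketch on
simulated data. The data, the seed and the variable names x1..x20 are
made up for illustration and have nothing to do with the original
question. It fits the lasso path with glmnet and tabulates how many
coefficients are nonzero at each lambda in the sequence glmnet chose.

```r
library(glmnet)

# Simulated example data (an assumption for illustration only):
# 20 predictors, of which only x1, x2 and x3 carry signal.
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - 1.5 * x[, 2] + x[, 3]))

# alpha = 1 is the lasso penalty; glmnet picks a decreasing lambda sequence.
fit <- glmnet(x, y, family = "binomial", alpha = 1)

# One row per lambda: the penalty weight and the number of nonzero coefficients.
path <- data.frame(lambda = fit$lambda, nonzero = fit$df)
head(path, 3)  # largest lambdas: few or no variables in the model
tail(path, 3)  # smallest lambdas: many variables in the model

# plot(fit, xvar = "lambda") draws the coefficient paths against log(lambda).
```

Reading the table from top to bottom, lambda shrinks and variables enter
the model; reading it the other way shows the 'pressure' at work.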

Having this measure of success (say the AUC) for each lambda in your
reasonable set allows you to pick the best one (lambda.min) or, to avoid
happenstance peaks, a more conservative and parsimonious one (lambda.1se).
You can then rerun your lasso with the selected lambda on the full
dataset to find the variables in your model.
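
The selection step just described might look roughly like this. Again a
sketch on made-up simulated data (the data, seed and names x1..x20 are
assumptions), using AUC as the measure of success and 10 folds:

```r
library(glmnet)

# Simulated example data (an assumption for illustration only).
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - 1.5 * x[, 2] + x[, 3]))

# Cross-validated AUC for every lambda in the sequence.
cvfit <- cv.glmnet(x, y, family = "binomial", type.measure = "auc",
                   nfolds = 10)

# Mean cross-validated AUC and its standard error per lambda.
cv_table <- data.frame(lambda = cvfit$lambda, auc = cvfit$cvm, se = cvfit$cvsd)

cvfit$lambda.min  # lambda with the best cross-validated AUC
cvfit$lambda.1se  # largest lambda whose AUC is within one SE of the best

# cv.glmnet also stores the fit on the full dataset (cvfit$glmnet.fit),
# so coef() at the chosen lambda already gives the full-data lasso fit.
b <- coef(cvfit, s = "lambda.1se")[, 1]
selected <- setdiff(names(b)[b != 0], "(Intercept)")
selected
```

Because the folds are drawn at random, the chosen lambda (and possibly
the selected set) can change from run to run unless you fix the seed.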

Finally, to avoid downward bias, you could run a normal glm with only the
variables selected in the previous step.
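
A minimal sketch of that refit, under the same assumptions as before
(simulated data, made-up names x1..x20, AUC-based choice of lambda.1se):

```r
library(glmnet)

# Simulated example data (an assumption for illustration only).
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - 1.5 * x[, 2] + x[, 3]))

# Select variables with the lasso at lambda.1se.
cvfit <- cv.glmnet(x, y, family = "binomial", type.measure = "auc")
b <- coef(cvfit, s = "lambda.1se")[, 1]
selected <- setdiff(names(b)[b != 0], "(Intercept)")
stopifnot(length(selected) > 0)  # nothing to refit if the lasso kept no variable

# Ordinary (unpenalized) logistic regression on the selected variables only.
dat <- data.frame(y = y, x)
refit <- glm(reformulate(selected, response = "y"),
             family = binomial, data = dat)

# Shrunken lasso estimates next to the unpenalized glm estimates.
# Note: the glm standard errors and p-values ignore the selection step.
cbind(lasso = b[selected], glm = coef(refit)[selected])
```

The unpenalized estimates will typically be larger in absolute value
than the lasso ones, since the penalty no longer pulls them toward zero.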

Good luck!


Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove

Patrick Breheny

Wed, Aug 10, 2011 5:08 AM

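On 08/10/2011 03:00 AM, Nick Sabbe wrote:

> Finally, to avoid downward bias, you could run a normal glm with only the
> variables selected in the previous step.

At the cost, of course, of introducing upward bias...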

Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky