glmnet
3 messages · Andra Isan, Nick Sabbe, Patrick Breheny
Hi Andra.

I wonder how you came to be using LASSO without knowing what lambda is. I'd advise you to read up on it. In the help (?glmnet) you can find several paper references, but for a gentler introduction, you can read http://www-stat.stanford.edu/~tibs/ElemStatLearn/

In a nutshell, though: lambda is the parameter that sets the weight given to the penalty. The bigger it is, the more 'pressure' there is on the coefficients to be small (or better yet: to disappear).

The way you use LASSO is: you look at a reasonable set of lambda values (glmnet does this for you), then calculate some measure of success for each lambda value (e.g. misclassification, AUC, ...), generally by cross-validation (as provided by cv.glmnet: read its help). Having this measure of success (say the AUC) for each lambda in your reasonable set allows you to pick the optimal one (lambda.min) or, to avoid happenstance peaks, a more conservative and parsimonious one (lambda.1se). You can then rerun your lasso with the selected lambda on the full dataset to find the variables in your model.

Finally, to avoid downward bias, you could run a normal glm with only the variables selected in the previous step.

Good luck!

Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove
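The workflow described in this reply can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the data are simulated (the names n, p, beta and the seed are made up for the example), and family = "gaussian" is used because the question speaks of linear regression, whereas the original call used family = "multinomial".

```r
library(glmnet)

# Simulated data (an assumption for illustration): more columns than rows
set.seed(1)
n <- 50
p <- 200
x <- matrix(rnorm(n * p), n, p)
beta <- c(3, -2, 1.5, rep(0, p - 3))   # only the first three predictors matter
y <- drop(x %*% beta + rnorm(n))

# Fit the whole lasso path; fit$lambda is the grid of penalty values
fit <- glmnet(x, y, family = "gaussian", alpha = 1)
length(fit$lambda)                     # at most 100 with the default nlambda = 100

# Cross-validation scores every lambda on the grid
cvfit <- cv.glmnet(x, y, family = "gaussian", alpha = 1, nfolds = 10)
cvfit$lambda.min                       # lambda with the best CV error
cvfit$lambda.1se                       # larger, more parsimonious choice

# Coefficients at the chosen lambda (intercept in the first row)
b <- as.matrix(coef(cvfit, s = "lambda.1se"))[, 1]
selected <- which(b[-1] != 0)          # indices of the retained columns of x

# Predicted versus observed values
pred <- drop(predict(cvfit, newx = x, s = "lambda.1se"))
head(cbind(observed = y, predicted = pred))
```

Note that cv.glmnet already stores the path fitted on the full data (in cvfit$glmnet.fit), so coef() and predict() with s = "lambda.1se" read the full-data fit at that lambda; a separate rerun with a single lambda value is not needed here.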
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Andra Isan
Sent: Wednesday, 10 August 2011 5:59
To: r-help at r-project.org
Subject: [R] glmnet

Hi All,

I have been trying to use the glmnet package to do LASSO linear regression. My x data is a matrix of n_row by n_col, and y is a vector of size n_row corresponding to the rows of x. n_col is much larger than n_row. I do the following:

    fits = glmnet(x, y, family="multinomial")

I have been following this article: http://cran.r-project.org/web/packages/glmnet/glmnet.pdf (page 8), but there are some parts that I don't understand. The lambda variable only returns 100 and I don't know exactly what lambda represents. So, basically, I would like to know how to get the coefficient weights and what exactly lambda is. How can I see the difference between predicted values and observed values? If there is sample code that would help me understand how to use these, that would be great.

Thanks a lot,
Andra
On 08/10/2011 03:00 AM, Nick Sabbe wrote:
Finally, to avoid downward bias, you could run a normal glm with only the variables selected in the previous step.
At the cost, of course, of introducing upward bias....
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky
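The trade-off raised in this exchange can be looked at directly by putting the lasso coefficients next to those of an unpenalised refit on the selected variables. The sketch below is an illustration only: the data are simulated, and the names n, p, beta and the column names x1, x2, ... are assumptions made for the example.

```r
library(glmnet)

# Simulated data (an assumption for illustration)
set.seed(2)
n <- 50
p <- 200
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", seq_len(p))
beta <- c(3, -2, 1.5, rep(0, p - 3))
y <- drop(x %*% beta + rnorm(n))

# Lasso with cross-validated lambda
cvfit <- cv.glmnet(x, y, alpha = 1)
b_lasso <- as.matrix(coef(cvfit, s = "lambda.1se"))[, 1]
selected <- names(b_lasso)[-1][b_lasso[-1] != 0]
stopifnot(length(selected) > 0)        # the refit needs at least one variable

# Unpenalised refit using only the selected columns
refit <- glm(y ~ ., data = data.frame(y = y, x[, selected, drop = FALSE]))
b_refit <- coef(refit)

# Side-by-side coefficients: shrunken lasso estimates versus refit estimates
keep <- c("(Intercept)", selected)
cbind(lasso = b_lasso[keep], refit = b_refit[keep])

# Training residual sums of squares for both fits
rss_lasso <- sum((y - predict(cvfit, newx = x, s = "lambda.1se"))^2)
rss_refit <- sum(residuals(refit)^2)
c(lasso = rss_lasso, refit = rss_refit)
```

The refit always matches the training data at least as closely as the lasso fit does, but the variables were chosen using the same data, so the refit coefficients and its apparent fit can look better than they really are. Checking both models on held-out data is one way to see how much of the difference survives.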