glmnet() vs. lars()

2 messages · vito muggeo, Patrick Breheny

#
Dear all,

It appears that glmnet(), when "selecting" the covariates entering the
model, can jump from, say, K covariates to K+2 or K+3. When 2 or 3
variables are "added" at the same time, it is not possible to rank
the covariates by their importance in the model.
lars(), on the other hand, "adds" the covariates one at a time.
My question is: is it possible to obtain output similar to that of lars()
(in terms of the order in which the variables enter the model) using glmnet()?

Many thanks,
vito


#Example (from ?glmnet)

library(glmnet)
set.seed(123)
x=matrix(rnorm(100*20),100,20)
y=rnorm(100)
fit1=glmnet(x,y)
fit1$df #no. of covariates entering the model at different lambdas

#Thus in the "first" model no covariate is included and in the second 
#one 2 covariates (V8 and V20) are included at the same time. Because 
#two variables are included at the same time I do not know which 
#variable (among the selected ones) is more important.
#Everything is fine with lars

library(lars)
o<-lars(x,y)
o$df #the covariates enter one at a time.. V8 is "better" than V20
#
On 03/21/2012 06:30 AM, Vito Muggeo (UniPa) wrote:
glmnet() is based on an iterative coordinate descent algorithm applied
to a grid of lambda values, so it only reports the solution at those
grid points; LARS is a more elegant algorithm and computes the exact
solution path, including the lambda values at which each variable
enters.  You can get your glmnet solutions to have higher resolution
(more "exact") by using a finer grid.  In your example:
 > fit1=glmnet(x,y)
 > fit1$df
   [1]  0  2  4  4 ...

The default is a grid of 100 lambda values.  If we use 300 values, the 
resolution is finer and we can see the variables enter one at a time:

 > fit1=glmnet(x,y,nlambda=300)
 > fit1$df
   [1]  0  1  1  2  3  3  4  ...

However, it is impossible to know in advance how fine the grid must be 
in order to ensure that only one variable enters the model between any 
two consecutive lambda values.
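
For what it is worth, a rough way to read an entry order off a glmnet fit is to record, for each variable, the first grid point at which its coefficient is nonzero. The following is only a sketch under stated assumptions: entry_step() and max_jump() are hypothetical helpers written for this illustration (they are not part of glmnet or lars), and the code assumes the coefficients are available as fit$beta with one column per lambda value. Variables that share the same step are ties the grid did not resolve, and a variable that leaves the model and re-enters is only counted at its first entry.

```r
# Hypothetical helpers (not part of glmnet or lars).

# First grid index at which each variable's coefficient is nonzero.
# `beta` is a variables-by-lambdas coefficient matrix, e.g. fit$beta.
entry_step <- function(beta) {
  beta <- as.matrix(beta)  # glmnet stores beta as a sparse matrix
  step <- apply(beta != 0, 1, function(r) match(TRUE, r))
  sort(step, na.last = NA)  # earliest first; variables that never enter are dropped
}

# Largest number of variables added between two consecutive lambda values.
max_jump <- function(df) max(diff(df))

# Toy coefficient matrix: b enters at step 2, a and d together at step 3,
# c never enters.
toy <- rbind(a = c(0, 0, 1, 1), b = c(0, 2, 2, 2),
             c = c(0, 0, 0, 0), d = c(0, 0, 3, 0))
print(entry_step(toy))

# The same helpers applied to the example from this thread
# (runs only if glmnet is installed).
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(123)
  x <- matrix(rnorm(100 * 20), 100, 20)
  y <- rnorm(100)
  fit <- glmnet::glmnet(x, y, nlambda = 300)
  print(entry_step(fit$beta))
  print(max_jump(fit$df))  # above 1 means the grid is still too coarse somewhere
}
```

If max_jump() returns more than 1, or entry_step() shows ties, one option is to refit with a larger nlambda and check again; as noted above, there is no way to pick a sufficient value in advance.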