glmnet() vs. lars()

2 messages · Vito Muggeo, Patrick Breheny

Wed, Mar 21, 2012 3:30 AM

Dear all,

It appears that glmnet(), when "selecting" the covariates entering the
model, skips from K covariates, say, to K+2 or K+3. Thus 2 or 3
variables are "added" at the same time, and it is not possible to obtain
a ranking of the covariates according to their importance in the model.
lars(), on the other hand, "adds" the covariates one at a time.
My question is: is it possible to obtain output similar to that of
lars() (in terms of the order in which the variables enter the model)
using glmnet()?

Many thanks,
Vito


#Example (from ?glmnet)

library(glmnet)
library(lars)
set.seed(123)
x=matrix(rnorm(100*20),100,20)
y=rnorm(100)
fit1=glmnet(x,y)
fit1$df #no. of nonzero coefficients at each lambda value

#Thus in the "first" model no covariate is included, and in the second
#one 2 covariates (V8 and V20) are included at the same time. Because
#two variables are included at the same time, I do not know which
#variable (among the selected ones) is more important.
#With lars() there is no such problem:

o<-lars(x,y)
o$df #the covariates enter one at a time
unlist(o$actions) #order of entry: V8 is "better" than V20

====================================
Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 23895240
fax: 091 485726
http://dssm.unipa.it/vmuggeo

Patrick Breheny

Wed, Mar 21, 2012 7:26 AM

On 03/21/2012 06:30 AM, Vito Muggeo (UniPa) wrote:

glmnet() is based on an iterative coordinate descent algorithm applied
to a grid of lambda values; LARS is a more elegant algorithm that
computes the exact solution path.  You can get your glmnet solutions to
have higher resolution (more "exact") by using a finer grid.  In your
example:

 > fit1$df
   [1]  0  2  4  4 ...
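To make "coordinate descent on a grid" concrete, here is a minimal base-R sketch. It is not glmnet's code (glmnet uses compiled Fortran and handles standardization, the intercept and convergence differently), and standardize() and lasso_cd_path() are hypothetical names. It assumes x is centred and scaled so that mean(x[, j]^2) == 1, y is centred, and the objective is (1/(2n))*||y - x b||^2 + lambda*||b||_1. The grid is log-spaced from lambda_max (the smallest lambda at which all coefficients are zero) down to min_ratio*lambda_max, and each fit is warm-started from the previous one. Coefficients are only recorded at grid points, so variables that enter between two consecutive lambdas show up together.

```r
# Hypothetical helper: centre each column and scale it so mean(x[, j]^2) == 1
standardize <- function(x) {
  xc <- sweep(x, 2, colMeans(x))
  sweep(xc, 2, sqrt(colMeans(xc^2)), "/")
}

# Soft-thresholding operator: sign(z) * max(|z| - g, 0)
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Hypothetical helper: lasso path by cyclic coordinate descent over a
# log-spaced lambda grid, warm-starting each fit from the previous one.
# Assumes standardized x (see above) and centred y, and minimises
#   (1 / (2n)) * ||y - x %*% b||^2 + lambda * sum(abs(b))
lasso_cd_path <- function(x, y, nlambda = 100, min_ratio = 1e-4,
                          tol = 1e-8, max_iter = 10000) {
  n <- nrow(x)
  p <- ncol(x)
  # x_j'y / n for each column; lambda_max is the smallest lambda with b = 0
  score <- vapply(seq_len(p), function(j) sum(x[, j] * y), numeric(1)) / n
  lambda_max <- max(abs(score))
  lambda <- lambda_max * min_ratio^seq(0, 1, length.out = nlambda)
  beta <- matrix(0, p, nlambda)
  b <- numeric(p)   # current coefficients, carried along the grid
  r <- y            # current residual y - x %*% b
  for (k in seq_len(nlambda)) {
    for (iter in seq_len(max_iter)) {
      max_change <- 0
      for (j in seq_len(p)) {
        # univariate lasso solution for coordinate j given the others
        z <- sum(x[, j] * r) / n + b[j]
        b_new <- soft_threshold(z, lambda[k])
        if (b_new != b[j]) {
          r <- r - x[, j] * (b_new - b[j])
          max_change <- max(max_change, abs(b_new - b[j]))
          b[j] <- b_new
        }
      }
      if (max_change < tol) break
    }
    beta[, k] <- b
  }
  list(beta = beta, lambda = lambda)
}

# Usage on simulated data like the example above (x standardized, y centred)
set.seed(123)
x <- standardize(matrix(rnorm(100 * 20), 100, 20))
y <- rnorm(100)
y <- y - mean(y)
coarse <- lasso_cd_path(x, y, nlambda = 100)
fine <- lasso_cd_path(x, y, nlambda = 300)
colSums(coarse$beta != 0)  # nonzero coefficients per lambda, like fit1$df
colSums(fine$beta != 0)
```

Comparing the two colSums() outputs shows the same effect as the nlambda argument of glmnet(): the finer grid samples the same path at more points, so simultaneous entries tend to be split apart.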

The default is a grid of 100 lambda values.  If we use 300 values, the 
resolution is finer and we can see the variables enter one at a time:

 > fit1=glmnet(x,y,nlambda=300)
 > fit1$df
   [1]  0  1  1  2  3  3  4  ...
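Once the grid is fine enough, the order of entry can be read off the coefficient path. The sketch below assumes a p x K matrix with one row per variable and columns ordered by decreasing lambda, which is the layout of glmnet's fit$beta (convert it with as.matrix() first, since glmnet stores it as a sparse matrix). The helper names first_entry() and entry_order() are hypothetical. It reports only the first time a variable becomes nonzero; on a lasso path a variable can later drop out and re-enter, which this ignores.

```r
# Hypothetical helper: for a p x K coefficient-path matrix (rows = variables,
# columns = decreasing lambda), return the first grid index at which each
# variable is nonzero, or NA if it never enters.
first_entry <- function(beta) {
  apply(beta != 0, 1, function(nz) if (any(nz)) which(nz)[1] else NA_integer_)
}

# Hypothetical helper: variables (row indices) in order of entry, the grid
# index at which each entered, and whether any two share a grid index,
# i.e. whether the grid was too coarse to separate them.
entry_order <- function(beta) {
  idx <- first_entry(beta)
  entered <- which(!is.na(idx))
  entered <- entered[order(idx[entered])]
  list(order = entered,
       step = idx[entered],
       tied = any(duplicated(idx[entered])))
}
```

For the 300-value fit above, entry_order(as.matrix(fit1$beta)) would list the column indices of x in order of entry; tied = TRUE means at least two variables still share a grid point.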

However, it is impossible to know in advance how fine the grid must be 
in order to ensure that only one variable enters the model between any 
two consecutive lambda values.
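Since the required resolution cannot be known in advance, one pragmatic option is to refine adaptively: keep doubling the number of lambda values until no two variables first become nonzero at the same grid point, with a cap so the loop always terminates. This is a hypothetical sketch, not a glmnet feature; fit_path is an assumed user-supplied function that returns a p x nlambda coefficient matrix with columns ordered by decreasing lambda, and max_nlambda is an arbitrary cap.

```r
# Hypothetical helper: double the grid size until no two variables first
# become nonzero at the same grid point, or until max_nlambda is reached.
# fit_path(nlambda) is an assumed user-supplied function returning a
# p x nlambda coefficient matrix (columns ordered by decreasing lambda).
refine_until_untied <- function(fit_path, nlambda = 100, max_nlambda = 3200) {
  repeat {
    beta <- fit_path(nlambda)
    # first grid index at which each variable is nonzero (NA if never)
    first <- apply(beta != 0, 1,
                   function(nz) if (any(nz)) which(nz)[1] else NA_integer_)
    tied <- any(duplicated(first[!is.na(first)]))
    if (!tied || 2 * nlambda > max_nlambda) {
      return(list(beta = beta, nlambda = nlambda, tied = tied))
    }
    nlambda <- 2 * nlambda
  }
}

# Usage with a toy path: variable j is active once lambda < thresholds[j]
thresholds <- c(0.9, 0.85, 0.5)
toy_path <- function(nlambda) {
  lambda <- seq(1, 0, length.out = nlambda)
  1 * outer(thresholds, lambda, ">")
}
res <- refine_until_untied(toy_path, nlambda = 3)
res$nlambda
```

With glmnet one might pass fit_path = function(nl) as.matrix(glmnet(x, y, nlambda = nl)$beta). The cap matters: if two variables enter at (numerically) the same lambda, no grid will separate them, and the result then comes back with tied = TRUE.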

Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky