
Lasso with Categorical Variables

On May 2, 2011, at 10:51 AM, Steve Lianoglou wrote:

            
Steve's citation is somewhat helpful, but not sufficient to take the
next steps. You can find details about the mechanics of standard
linear regression in R on the ?lm help page, where you will find that
factor variables are typically handled by model.matrix. See below:

 > model.matrix(~X1 + X2 + X3 + X4, X)
    (Intercept) X1B X1C X1D X2F X2G X2H X2I X3        X4
1            1   0   0   1   0   1   0   0 51 2.8640884
2            1   0   0   0   0   0   1   0 46 1.5462243
3            1   0   1   0   0   1   0   0 50 1.9430901
4            1   0   0   0   1   0   0   0 44 2.4504180
5            1   1   0   0   0   0   0   1 43 2.7535052
6            1   1   0   0   0   0   0   1 50 1.6200326
7            1   0   0   0   0   0   0   1 30 0.5750533
8            1   1   0   0   0   0   0   0 42 5.9224777
9            1   0   0   1   0   0   0   1 49 2.0401528
10           1   1   0   0   0   1   0   0 48 6.2995288
attr(,"assign")
  [1] 0 1 1 1 2 2 2 2 3 4
attr(,"contrasts")
attr(,"contrasts")$X1
[1] "contr.treatment"

attr(,"contrasts")$X2
[1] "contr.treatment"

The numeric variables are passed through unchanged, while the dummy
variables for factor columns are constructed (as treatment contrasts)
and the whole thing is returned in a neat package.
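A minimal, self-contained sketch of the same step follows. The original X is not shown, so the data frame below is hypothetical toy data that only mimics its column types (two factors, two numerics). It builds the design matrix with model.matrix and then drops the intercept column, on the assumption that the matrix is headed for a penalized fitter such as glmnet(), which expects a numeric matrix as x and fits its own intercept by default.

```r
# Hypothetical toy data shaped like X above: two factors, two numerics.
set.seed(1)
X <- data.frame(
  X1 = factor(sample(c("A", "B", "C", "D"), 10, replace = TRUE),
              levels = c("A", "B", "C", "D")),
  X2 = factor(sample(c("E", "F", "G", "H", "I"), 10, replace = TRUE),
              levels = c("E", "F", "G", "H", "I")),
  X3 = round(runif(10, 30, 51)),
  X4 = rexp(10, rate = 1/3)
)

# Treatment contrasts: the first level of each factor is the baseline,
# and every other level gets its own 0/1 indicator column.
mm <- model.matrix(~ X1 + X2 + X3 + X4, data = X)

# The "assign" attribute maps each column back to its term
# (0 = intercept, 1 = X1, 2 = X2, ...).
print(attr(mm, "assign"))

# Drop the intercept column before handing the matrix to a lasso fitter.
x <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
print(colnames(x))
```

Because the levels are fixed explicitly, every non-baseline level gets a column even if it happens not to occur in the sample; the lasso then penalizes each of those dummy columns separately.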

-- 
David.