Glmnet Logistic Variable Questions
On 11-10-25 01:35 PM, julien giami wrote:
The reason I use glmnet is that it makes handling 400,000 observations easier in terms of memory. I am looking into sparse matrices, but I don't understand how to build interactions using sparse matrices.
If you're not familiar with glmnet but you are familiar with GLMs in general, may I suggest bigglm() in the biglm package?
On Tue, Oct 25, 2011 at 12:34 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
On Oct 25, 2011, at 11:16 AM, Ben Bolker wrote:
Bert Gunter <gunter.berton <at> gene.com> writes:
If I understand you correctly, it sounds like you need to do some reading. ?lm and ?formula tell you how to specify linear models for glm or glmnet. However, if you do not have sufficient statistical background, it will probably be incomprehensible, in which case you should consult your local statistician. For glmnet, go to the linked references given in the Help file. There is no such thing as AIC for these models, as they are penalized fits (with users choosing the penalization tradeoff). Again, consult your local statistician.
Let me second Bert's concern, but in the meantime, if what you want are *all* two-way interactions among variables, you can follow this example:
d <- data.frame(y=runif(100),x1=runif(100),x2=runif(100),x3=runif(100))
gg <- lm(y~(.)^2,data=d)
names(coef(gg))

[1] "(Intercept)" "x1"          "x2"          "x3"          "x1:x2"
[6] "x1:x3"       "x2:x3"

I have done the example with continuous variables and with lm() here, but it should generalize easily to (1) a mixture of categorical and continuous variables and (2) other R modeling functions.
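To illustrate point (1), here is a minimal sketch of the same formula applied to a mix of continuous and categorical predictors. The data frame d2 and the factor g are made up for illustration and are not from the original post.

```r
set.seed(1)

# Hypothetical data: two continuous predictors and one three-level factor.
d2 <- data.frame(
  y  = runif(100),
  x1 = runif(100),
  g  = factor(rep(c("a", "b", "c"), length.out = 100)),
  x3 = runif(100)
)

# Same formula as above: all main effects plus all two-way interactions.
gg2 <- lm(y ~ (.)^2, data = d2)

# With the default treatment contrasts, the factor contributes one
# coefficient per non-reference level, both as a main effect and in
# each interaction it takes part in.
names(coef(gg2))
```

Because g has three levels, each interaction involving g expands to two coefficients rather than one, so the number of columns grows quickly as factors with many levels are added.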
There is a difference with glmnet, however, vis-à-vis its handling of factors. There is a recent discussion here:

https://stat.ethz.ch/pipermail/r-help/2011-August/285905.html

which covers the topic. Be sure to read the replies, including Martin's.

HTH,

Marc Schwartz
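For the original question about interactions in a sparse matrix, one possible approach (not necessarily the one recommended in the linked thread) is to expand the formula yourself with sparse.model.matrix() from the Matrix package and pass the result to glmnet, which takes a numeric or sparse matrix rather than a formula. The sketch below uses simulated data. The variable names, the sample size, and the choice of cross-validation to pick the penalty are all assumptions made for illustration.

```r
library(Matrix)  # recommended package, normally installed with R

set.seed(1)
n <- 1000

# Hypothetical binary-outcome data with one factor predictor.
d <- data.frame(
  y  = rbinom(n, 1, 0.5),
  x1 = runif(n),
  g  = factor(rep(c("a", "b", "c"), length.out = n)),
  x3 = runif(n)
)

# Build a sparse design matrix with all main effects and two-way
# interactions. Factors are expanded to dummy columns here, because
# glmnet does not do this itself. The first column is the intercept,
# which is dropped since glmnet fits its own.
X <- sparse.model.matrix(y ~ (.)^2, data = d)[, -1]
dim(X)

# Fit only if glmnet is installed. cv.glmnet() chooses the penalty by
# cross-validation, which stands in for AIC-style model selection.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::cv.glmnet(X, d$y, family = "binomial")
  print(coef(fit, s = "lambda.min"))
}
```

The sparse representation saves memory mainly when many columns are dummy variables from factors. Dense continuous predictors and their products stay dense whatever the storage class.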