We are working on building a logistic regression model:
1. We are fitting a logistic regression with a binary outcome variable, using a set of predictors that includes 8 continuous and 8 categorical predictors.
2. We are trying to implement an interaction between two variables (continuous and categorical, or just continuous).
The dataset has 200,000 rows and we are using glmnet. How can we model those two points? Also, how can we obtain the AIC of the model? Thanks
Glmnet Logistic Variable Questions
6 messages · Bert Gunter, Marc Schwartz, julien giami +1 more
Bert Gunter <gunter.berton <at> gene.com> writes:
If I understand you correctly, it sounds like you need to do some reading. ?lm and ?formula tell you how to specify linear models for glm or glmnet. However, if you do not have sufficient statistical background, it will probably be incomprehensible, in which case you should consult your local statistician. For glmnet, go to the linked references given in the Help file. There is no such thing as AIC for these models, as they are penalized fits (with users choosing the penalization tradeoff). Again, consult your local statistician.
Let me second Bert's concern, but in the meantime, if what you want are *all* two-way interactions among variables, you can follow this example:
d <- data.frame(y=runif(100),x1=runif(100),x2=runif(100),x3=runif(100))
gg <- lm(y~(.)^2,data=d)
names(coef(gg))

[1] "(Intercept)" "x1"          "x2"          "x3"          "x1:x2"
[6] "x1:x3"       "x2:x3"

I have done the example with continuous variables and with lm() here, but it should generalize easily to (1) a mixture of categorical and continuous variables and (2) other R modeling functions.
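As a hedged illustration of how that generalization might look for the original question, the sketch below fits an ordinary (unpenalized) logistic regression with glm() on simulated data that mixes continuous predictors and a factor. The data frame `d`, the column names `x1`, `x2`, `f`, and the simulation coefficients are all made up for illustration. AIC() is meaningful here only because this is a plain maximum-likelihood fit, not a glmnet fit.

```r
set.seed(1)
n <- 500

# Hypothetical data: two continuous predictors and one 3-level factor
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  f  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)

# Simulate a binary outcome (the coefficients are arbitrary)
eta <- -0.5 + 0.8 * d$x1 - 0.4 * d$x2 + 0.6 * (d$f == "b") * d$x1
d$y <- rbinom(n, size = 1, prob = plogis(eta))

# All main effects plus all two-way interactions
fit_all <- glm(y ~ (.)^2, data = d, family = binomial)

# Only selected interactions: continuous-by-factor and continuous-by-continuous
fit_sel <- glm(y ~ x1 + x2 + f + x1:f + x1:x2, data = d, family = binomial)

coef_names <- names(coef(fit_sel))
print(coef_names)

# AIC of the unpenalized fit
aic_sel <- AIC(fit_sel)
print(aic_sel)
```

The `x1:f` term produces one slope adjustment per non-reference level of `f`, which is why a single continuous-by-factor interaction can add several coefficients.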
On Oct 25, 2011, at 11:16 AM, Ben Bolker wrote:
[snip]
There is a difference with glmnet, however, vis-à-vis its handling of factors. There is a recent discussion here: https://stat.ethz.ch/pipermail/r-help/2011-August/285905.html which covers the topic. Be sure to read the replies, including Martin's.

HTH,

Marc Schwartz
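One common way to deal with this (a sketch, not necessarily what the linked discussion recommends) is to expand the factors and interaction terms yourself with model.matrix() and pass the resulting numeric matrix to glmnet, since glmnet takes a predictor matrix rather than a formula. The data and variable names below are hypothetical, and the glmnet call is guarded so the example still runs if the package is not installed.

```r
set.seed(2)
n <- 300

# Hypothetical data: two continuous predictors and one 3-level factor
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  f  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
y <- rbinom(n, size = 1, prob = plogis(0.7 * d$x1 - 0.5 * (d$f == "c")))

# Expand factors into dummy columns and build the interaction columns;
# drop the intercept column because glmnet fits its own intercept
X <- model.matrix(~ x1 + x2 + f + x1:f + x1:x2, data = d)[, -1]
print(colnames(X))

if (requireNamespace("glmnet", quietly = TRUE)) {
  # alpha = 1 is the lasso penalty; lambda is chosen by cross-validation
  cvfit <- glmnet::cv.glmnet(X, y, family = "binomial", alpha = 1)
  print(coef(cvfit, s = "lambda.min"))
}
```

Note that the default treatment contrasts drop the reference level of `f`, so the penalty is applied relative to that level; whether that is appropriate depends on the analysis.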
The reason I use glmnet is that it makes handling 400,000 observations easier in terms of memory. I am looking into sparse matrices, but I don't understand how to build interactions using sparse matrices.
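For the sparse-matrix part of the question, a minimal sketch: sparse.model.matrix() in the Matrix package accepts the same formula syntax as model.matrix(), including interaction terms, and returns a sparse design matrix, which glmnet accepts as its predictor matrix. The data and variable names below are hypothetical. The memory savings come mainly from the dummy columns of factors; columns built from continuous variables alone stay essentially dense.

```r
set.seed(3)
n <- 1000

# Hypothetical data: one continuous predictor and two factors
d <- data.frame(
  x1 = rnorm(n),
  f  = factor(sample(letters[1:5], n, replace = TRUE)),
  g  = factor(sample(c("u", "v"), n, replace = TRUE))
)

if (requireNamespace("Matrix", quietly = TRUE)) {
  # Same formula syntax as model.matrix(), but the result is stored sparsely;
  # the first column (the intercept) is dropped
  Xs <- Matrix::sparse.model.matrix(~ x1 + f + g + x1:f + f:g, data = d)[, -1]
  print(dim(Xs))
  print(class(Xs))
}
```

The resulting `Xs` could then be passed to glmnet in place of a dense matrix.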
On Tue, Oct 25, 2011 at 12:34 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
[snip]
On 11-10-25 01:35 PM, julien giami wrote:
The reason I use glmnet is that it makes handling 400,000 observations easier in terms of memory. I am looking into sparse matrices, but I don't understand how to build interactions using sparse matrices.
If you're not familiar with glmnet but you are familiar with GLMs in general, may I suggest bigglm() in the biglm package?
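A rough sketch of what that might look like, assuming the biglm package is installed: bigglm() takes an ordinary model formula, so interactions are written exactly as in glm(), and the data are processed in chunks. The data, variable names, and chunk size below are invented for illustration, and the call is guarded so the example still runs without the package.

```r
set.seed(4)
n <- 20000

# Hypothetical data: two continuous predictors and one 3-level factor
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  f  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
d$y <- rbinom(n, size = 1, prob = plogis(-0.3 + 0.5 * d$x1 + 0.4 * (d$f == "b")))

# Same formula syntax as glm(): main effects plus two chosen interactions
form <- y ~ x1 + x2 + f + x1:f + x1:x2

if (requireNamespace("biglm", quietly = TRUE)) {
  # The data frame is processed 5000 rows at a time
  fit_big <- biglm::bigglm(form, data = d, family = binomial(), chunksize = 5000)
  print(coef(fit_big))
}
```

Because this is an unpenalized fit, the objection to AIC raised earlier in the thread for penalized models does not apply to it.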