An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130419/2fc705bc/attachment.pl>
Multiple Multivariate regression in R with 50 independent variables
13 messages · Peter Dalgaard, David Winsemius, Nilesh Gupta +2 more
On Apr 18, 2013, at 21:24 , Nilesh Gupta wrote:
Hello all Is there a method/package in R in which I can do regressions for more than 50 independent variables ?
What's wrong with lm() et al.?
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
On Apr 19, 2013, at 12:40 AM, Nilesh Gupta wrote:
lm() does not accommodate more than 50 independent variables
What is your source for this misinformation?
dat <- as.data.frame(matrix(rnorm(51000), ncol=51) )
names(dat)
[1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11" "V12" "V13" "V14" "V15" "V16" "V17" "V18"
[19] "V19" "V20" "V21" "V22" "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31" "V32" "V33" "V34" "V35" "V36"
[37] "V37" "V38" "V39" "V40" "V41" "V42" "V43" "V44" "V45" "V46" "V47" "V48" "V49" "V50" "V51"
lm(V1 ~ ., dat=dat)
Call:
lm(formula = V1 ~ ., data = dat)
Coefficients:
(Intercept) V2 V3 V4 V5 V6 V7 V8
-0.0089517 -0.0427225 -0.0754946 -0.0002903 -0.0083482 0.0324383 -0.0194980 -0.0151008
V9 V10 V11 V12 V13 V14 V15 V16
0.0255324 -0.0167399 0.0476841 -0.0222229 0.0720990 -0.0174327 -0.0104261 0.0024625
V17 V18 V19 V20 V21 V22 V23 V24
-0.0086276 -0.0274867 -0.0345897 0.0209116 0.0368201 -0.0027364 0.0090916 0.0198854
V25 V26 V27 V28 V29 V30 V31 V32
-0.0083732 -0.0216937 0.0586361 -0.0530041 0.0402765 0.0073514 0.0295976 -0.0641553
V33 V34 V35 V36 V37 V38 V39 V40
0.0491071 -0.0261259 0.0364740 0.0070261 -0.0159851 -0.0373357 0.0506756 -0.0383495
V41 V42 V43 V44 V45 V46 V47 V48
0.0054945 0.0089468 -0.0050151 -0.0184369 0.0019926 -0.0177631 0.0282828 0.0353523
V49 V50 V51
-0.0382634 0.0545654 0.0101398
dat <- as.data.frame(matrix(rnorm(101000), ncol=101) )
lm(V1 ~ ., dat=dat)
Call:
lm(formula = V1 ~ ., data = dat)
Coefficients:
(Intercept) V2 V3 V4 V5 V6 V7 V8
0.021065 -0.015988 -0.008273 0.049849 0.014874 0.012352 -0.054584 0.004542
V9 V10 V11 V12 V13 V14 V15 V16
-0.017186 0.018006 -0.009707 -0.007382 0.044886 -0.051122 -0.026910 -0.048929
V17 V18 V19 V20 V21 V22 V23 V24
-0.008129 0.022129 -0.063525 0.026683 0.013424 -0.010145 -0.046046 0.024025
V25 V26 V27 V28 V29 V30 V31 V32
-0.003529 -0.038270 0.043657 0.049855 0.010691 0.041217 -0.012596 0.018302
V33 V34 V35 V36 V37 V38 V39 V40
0.040225 -0.012751 -0.062677 -0.002810 -0.002574 -0.024137 0.021324 -0.041520
V41 V42 V43 V44 V45 V46 V47 V48
-0.076482 0.009063 0.067097 -0.042554 -0.013789 0.002865 0.017325 -0.076860
V49 V50 V51 V52 V53 V54 V55 V56
-0.007003 -0.007315 0.030270 0.022066 -0.002224 -0.056534 0.013705 -0.003609
V57 V58 V59 V60 V61 V62 V63 V64
-0.044580 -0.037543 0.015745 0.035250 -0.017117 0.072470 0.004398 -0.015923
V65 V66 V67 V68 V69 V70 V71 V72
0.012864 -0.062752 -0.038437 -0.019586 0.019871 -0.068398 -0.111778 0.021416
V73 V74 V75 V76 V77 V78 V79 V80
0.036849 -0.009103 0.037790 0.021883 -0.034990 -0.014917 -0.003854 0.001760
V81 V82 V83 V84 V85 V86 V87 V88
-0.001812 0.003942 0.021810 -0.013984 -0.030446 0.049187 0.008392 0.026965
V89 V90 V91 V92 V93 V94 V95 V96
0.057301 0.004190 0.055505 -0.046006 -0.019080 -0.098889 -0.010891 -0.002729
V97 V98 V99 V100 V101
0.024939 -0.029847 0.063578 -0.061667 -0.022163
system.time( lm(V1 ~ ., dat=dat) ) # with the 101 column dataframe
user system elapsed
0.060 0.008 0.076
Sorry to give you such a Frost-y reception, but you are being somewhat ... what's the right word... sleepy?
-- David Winsemius, Alameda, CA, USA
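A compact way to confirm the same point, without reading through the printed coefficient table, is to count the fitted coefficients. This is only a minimal sketch on simulated data; the seed and the names n_obs, n_cols, fit, n_coef and any_na are illustrative and do not come from the thread.

```r
set.seed(1)        # arbitrary seed, only so the sketch is reproducible
n_obs  <- 1000     # number of rows, as in the examples above
n_cols <- 101      # one response (V1) plus 100 predictors (V2 ... V101)

dat <- as.data.frame(matrix(rnorm(n_obs * n_cols), ncol = n_cols))

# Regress V1 on every other column of the data frame
fit <- lm(V1 ~ ., data = dat)

# One coefficient per predictor, plus the intercept
n_coef <- length(coef(fit))

# With many more rows than predictors, no coefficient should be dropped (NA)
any_na <- anyNA(coef(fit))

n_coef   # 101
any_na   # FALSE
```

If lm() had a ceiling of 50 predictors, the count would stop short of 101 or the fit would fail; neither happens here.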
But that's not how to specify a multivariate regression. It's a univariate regression with a huge sum on the left hand side.
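The difference between the two left-hand sides can be seen on a toy example. The sketch below uses simulated data; the variable names x, y1, y2, fit_sum and fit_mv are made up for illustration and are not taken from the linked post.

```r
set.seed(2)        # arbitrary seed, only so the sketch is reproducible
n  <- 100
x  <- rnorm(n)
y1 <- 1 + 2 * x + rnorm(n)
y2 <- 3 - 1 * x + rnorm(n)

# '+' on the left-hand side is plain arithmetic: ONE response, the sum y1 + y2
fit_sum <- lm(y1 + y2 ~ x)

# cbind() on the left-hand side gives a response MATRIX: one regression per column
fit_mv <- lm(cbind(y1, y2) ~ x)

class(fit_sum)       # "lm"
class(fit_mv)        # "mlm" "lm"
dim(coef(fit_mv))    # 2 rows (intercept, x) by 2 columns (y1, y2)

# Least squares is linear in the response, so the coefficients of the summed
# response equal the row sums of the per-response coefficients
same <- all.equal(unname(coef(fit_sum)), unname(rowSums(coef(fit_mv))))
```

So the summed form runs without complaint, but it answers a different question: it fits a single regression for the total rather than one regression per response.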
On Apr 19, 2013, at 13:51 , Nilesh Gupta wrote:
I used this link http://r.789695.n4.nabble.com/model-frame-and-formula-mismatch-in-model-matrix-td4664093.html
Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
First, do you know what a multivariate multiple (linear) regression means, as opposed to a (univariate) multiple (linear) regression? As others have pointed out, the example referred to is a univariate multiple linear regression.
Second, and more importantly, have you yourself tried doing the needed regression with the data you have? If so, what are the results? Is your response even multivariate? (Sorry, entering the thread late.)
Of course, you need to first know what you are trying to do. Any tool is only as good as the workman handling it.
Ranjan
On Apr 19, 2013, at 19:15 , Nilesh Gupta wrote:
cbind( X1,X2,X3,X4,X5,X6,X7, ... ,X2393,X2394,X2395 ) ~ R_M_F+SMB+HML+WML
I used this formula because I wanted to regress 2395 stocks, with 500 months of data each, on the three independent variables, and I got the error below. The idea was to run multivariate regressions on each of these stocks.
Error in model.matrix.default(mt, mf, contrasts) : model frame and formula mismatch in model.matrix()
Googling this error led me to that page, and I now know that I mistakenly assumed that lm was limited to 50 variables. Is cbind(variable names) the way to formulate multivariate regressions? Where am I going wrong?
lm() is unhappy about long expressions (this is arguably a bug), so avoid them:
M <- cbind( X1,X2,X3,X4,X5,X6,X7, ... ,X2393,X2394,X2395 )
lm(M ~ R_M_F+SMB+HML+WML)
Notice, though, that multivariate tests will be unhappy if you have more variables than degrees of freedom (M wider than tall, essentially). That's a theory issue, not an lm one.
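A scaled-down sketch of this approach on simulated data follows. The sizes (60 months, 8 stocks), the seed, and the object names factors, fit and enough_df are assumptions chosen for illustration; only the regressor names and the X1, X2, ... column naming come from the thread.

```r
set.seed(3)        # arbitrary seed, only so the sketch is reproducible
n_months <- 60     # hypothetical; the thread has 500 months
n_stocks <- 8      # hypothetical; the thread has 2395 stocks

# Simulated factor returns, named after the regressors in the formula above
factors <- data.frame(R_M_F = rnorm(n_months), SMB = rnorm(n_months),
                      HML   = rnorm(n_months), WML = rnorm(n_months))

# Simulated stock returns: one column per stock, named X1, X2, ...
M <- matrix(rnorm(n_months * n_stocks), nrow = n_months,
            dimnames = list(NULL, paste0("X", seq_len(n_stocks))))

# The response matrix is built outside the formula, so the formula stays short
fit <- lm(M ~ R_M_F + SMB + HML + WML, data = factors)

dim(coef(fit))     # 5 coefficients (intercept + 4 factors) for each of 8 stocks
fit$df.residual    # n_months - 5 = 55

# The caveat about multivariate tests: responses should not outnumber
# the residual degrees of freedom
enough_df <- fit$df.residual >= ncol(M)
```

With the sizes from the thread the same check would compare 500 - 5 = 495 residual degrees of freedom against 2395 responses, which is the "wider than tall" situation described above. The per-stock coefficient estimates are still computed in that case; it is the joint multivariate tests that run into trouble.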
On Fri, Apr 19, 2013 at 10:18 PM, David Winsemius <dwinsemius at comcast.net> wrote:
On Apr 19, 2013, at 4:51 AM, Nilesh Gupta wrote:
I used this link http://r.789695.n4.nabble.com/model-frame-and-formula-mismatch-in-model-matrix-td4664093.html
But you said 50 independent variables, and that was probably someone's (failed) effort to submit 50 _dependent_ variables. What is the real problem? -- David.
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
To avoid the formula handling bug in lm/model.matrix/etc., you can try making the
formula shorter. E.g., if you know the names of your response columns,
responseCols <- c("X1", "X2", "X3", ..., "X2395")
try the formula
as.matrix(d[, responseCols]) ~ d[,"R_M_F"] + d[,"SMB"] + d[,"HML"] + d[,"WML"]
and do not use data=d in the call to lm().
You may also prefer to use lm.fit(), which takes the response matrix and design matrix
directly, so you avoid formulae altogether.
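A minimal sketch of the lm.fit() route on simulated data is given here. The data frame d, the three-column responseCols, the helper name factorCols, and the seed are stand-ins invented for illustration; note that lm.fit() does not add an intercept, so the column of ones has to be supplied by hand.

```r
set.seed(4)        # arbitrary seed, only so the sketch is reproducible
n <- 50
d <- data.frame(
  R_M_F = rnorm(n), SMB = rnorm(n), HML = rnorm(n), WML = rnorm(n),
  X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n)
)
responseCols <- c("X1", "X2", "X3")            # stand-in for the full list of stock columns
factorCols   <- c("R_M_F", "SMB", "HML", "WML")  # hypothetical helper name

Y <- as.matrix(d[, responseCols])                       # response matrix
X <- cbind(Intercept = 1, as.matrix(d[, factorCols]))   # design matrix with explicit intercept

# No formula and no model frame: the matrices go straight to the fitting routine
fit_raw <- lm.fit(x = X, y = Y)

# Same model through the formula interface, for comparison on this small case
fit_lm <- lm(Y ~ R_M_F + SMB + HML + WML, data = d)

dim(fit_raw$coefficients)   # 5 rows (intercept + 4 factors) by 3 response columns
max_diff <- max(abs(unname(fit_raw$coefficients) - unname(coef(fit_lm))))
```

The trade-off is that lm.fit() returns a plain list rather than an "lm" object, so methods such as summary() are not available on its result directly.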
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com