Multiple Multivariate regression in R with 50 independent variables

13 messages · Peter Dalgaard, David Winsemius, Nilesh Gupta +2 more

#
On Apr 18, 2013, at 21:24 , Nilesh Gupta wrote:

            
What's wrong with lm() et al.?
#
On Apr 19, 2013, at 12:40 AM, Nilesh Gupta wrote:

            
What is your source for this misinformation?
[1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10" "V11" "V12" "V13" "V14" "V15" "V16" "V17" "V18"
[19] "V19" "V20" "V21" "V22" "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31" "V32" "V33" "V34" "V35" "V36"
[37] "V37" "V38" "V39" "V40" "V41" "V42" "V43" "V44" "V45" "V46" "V47" "V48" "V49" "V50" "V51"
Call:
lm(formula = V1 ~ ., data = dat)

Coefficients:
(Intercept)           V2           V3           V4           V5           V6           V7           V8  
 -0.0089517   -0.0427225   -0.0754946   -0.0002903   -0.0083482    0.0324383   -0.0194980   -0.0151008  
         V9          V10          V11          V12          V13          V14          V15          V16  
  0.0255324   -0.0167399    0.0476841   -0.0222229    0.0720990   -0.0174327   -0.0104261    0.0024625  
        V17          V18          V19          V20          V21          V22          V23          V24  
 -0.0086276   -0.0274867   -0.0345897    0.0209116    0.0368201   -0.0027364    0.0090916    0.0198854  
        V25          V26          V27          V28          V29          V30          V31          V32  
 -0.0083732   -0.0216937    0.0586361   -0.0530041    0.0402765    0.0073514    0.0295976   -0.0641553  
        V33          V34          V35          V36          V37          V38          V39          V40  
  0.0491071   -0.0261259    0.0364740    0.0070261   -0.0159851   -0.0373357    0.0506756   -0.0383495  
        V41          V42          V43          V44          V45          V46          V47          V48  
  0.0054945    0.0089468   -0.0050151   -0.0184369    0.0019926   -0.0177631    0.0282828    0.0353523  
        V49          V50          V51  
 -0.0382634    0.0545654    0.0101398
Call:
lm(formula = V1 ~ ., data = dat)

Coefficients:
(Intercept)           V2           V3           V4           V5           V6           V7           V8  
   0.021065    -0.015988    -0.008273     0.049849     0.014874     0.012352    -0.054584     0.004542  
         V9          V10          V11          V12          V13          V14          V15          V16  
  -0.017186     0.018006    -0.009707    -0.007382     0.044886    -0.051122    -0.026910    -0.048929  
        V17          V18          V19          V20          V21          V22          V23          V24  
  -0.008129     0.022129    -0.063525     0.026683     0.013424    -0.010145    -0.046046     0.024025  
        V25          V26          V27          V28          V29          V30          V31          V32  
  -0.003529    -0.038270     0.043657     0.049855     0.010691     0.041217    -0.012596     0.018302  
        V33          V34          V35          V36          V37          V38          V39          V40  
   0.040225    -0.012751    -0.062677    -0.002810    -0.002574    -0.024137     0.021324    -0.041520  
        V41          V42          V43          V44          V45          V46          V47          V48  
  -0.076482     0.009063     0.067097    -0.042554    -0.013789     0.002865     0.017325    -0.076860  
        V49          V50          V51          V52          V53          V54          V55          V56  
  -0.007003    -0.007315     0.030270     0.022066    -0.002224    -0.056534     0.013705    -0.003609  
        V57          V58          V59          V60          V61          V62          V63          V64  
  -0.044580    -0.037543     0.015745     0.035250    -0.017117     0.072470     0.004398    -0.015923  
        V65          V66          V67          V68          V69          V70          V71          V72  
   0.012864    -0.062752    -0.038437    -0.019586     0.019871    -0.068398    -0.111778     0.021416  
        V73          V74          V75          V76          V77          V78          V79          V80  
   0.036849    -0.009103     0.037790     0.021883    -0.034990    -0.014917    -0.003854     0.001760  
        V81          V82          V83          V84          V85          V86          V87          V88  
  -0.001812     0.003942     0.021810    -0.013984    -0.030446     0.049187     0.008392     0.026965  
        V89          V90          V91          V92          V93          V94          V95          V96  
   0.057301     0.004190     0.055505    -0.046006    -0.019080    -0.098889    -0.010891    -0.002729  
        V97          V98          V99         V100         V101  
   0.024939    -0.029847     0.063578    -0.061667    -0.022163
user  system elapsed 
  0.060   0.008   0.076 

Sorry to give you such a Frost-y reception, but you are being somewhat ... what's the right word... sleepy?
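
The commands that produced the output above did not survive in this archive. A minimal sketch that gives output of the same shape, assuming simulated standard-normal data with 1000 rows (the row count, the seed, and the object names dat100, fit50, fit100 and timing are assumptions, not taken from the original post):

```r
set.seed(1)                        # assumed seed, for reproducibility only
n <- 1000                          # assumed number of observations

# 51 columns: V1 is the response, V2..V51 are the 50 predictors
dat <- as.data.frame(matrix(rnorm(n * 51), nrow = n))
names(dat)                         # "V1" ... "V51"

fit50 <- lm(V1 ~ ., data = dat)    # "." means: every other column of dat
fit50

# Same idea with 100 predictors, timed
dat100 <- as.data.frame(matrix(rnorm(n * 101), nrow = n))
timing <- system.time(fit100 <- lm(V1 ~ ., data = dat100))
fit100
timing
```

The coefficient values and timings will differ from those shown, since the data are random and machine speed varies.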
#
But that's not how to specify a multivariate regression. It's a univariate regression with a huge sum on the left-hand side.
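
A small sketch of the difference, using made-up data (the names d, y1, y2, x, fit_sum and fit_mv are hypothetical): a "+" on the left of the formula is plain arithmetic, whereas cbind() gives lm() a matrix response.

```r
set.seed(42)
d <- data.frame(y1 = rnorm(30), y2 = rnorm(30), x = rnorm(30))

# "+" on the left-hand side is ordinary addition: ONE response, y1 + y2
fit_sum <- lm(y1 + y2 ~ x, data = d)

# cbind() on the left-hand side: one regression per response column
fit_mv <- lm(cbind(y1, y2) ~ x, data = d)

class(fit_sum)      # "lm"
class(fit_mv)       # "mlm" "lm"
dim(coef(fit_mv))   # 2 rows (intercept, x) by 2 columns (y1, y2)

# By linearity, the per-response coefficients add up to the sum-model ones
all.equal(unname(rowSums(coef(fit_mv))), unname(coef(fit_sum)))
```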
On Apr 19, 2013, at 13:51, Nilesh Gupta wrote:
#
First, do you know what a multivariate multiple (linear) regression
means? As opposed to (univariate) multiple (linear) regression.  As
others have pointed out, the example referred to is of univariate
multiple linear regression.

Second, and more importantly, have you yourself tried doing the needed
regression with the data you have? If so, what are the results?

Is your response even multivariate? (Sorry, entering the thread late.)

Of course, you need to first know what you are trying to do. Any tool is
only as good as the workman handling it.

Ranjan

On Fri, 19 Apr 2013 17:21:39 +0530 Nilesh Gupta
<gupta.nilesh84 at gmail.com> wrote:
#
On Apr 19, 2013, at 19:15 , Nilesh Gupta wrote:

            
lm() is unhappy about long expressions (this is arguably a bug), so avoid them:

M <- cbind( X1,X2,X3,X4,X5,X6,X7, ...  ,X2393,X2394,X2395 )
lm(M ~ R_M_F+SMB+HML+WML)

Notice, though, that multivariate tests will be unhappy if you have more variables than degrees of freedom (M wider than tall, essentially). 

That's a theory issue, not an lm one.
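
To illustrate the "wider than tall" point, here is a sketch with simulated data; the sizes n and q, the predictor names x1 and x2, and the random values are all assumptions. The fit itself goes through, but the residual cross-product matrix cannot have rank above the residual degrees of freedom, so with more responses than that it is singular, and test statistics that need its inverse or determinant break down.

```r
set.seed(7)
n <- 10                                  # observations (assumed)
q <- 12                                  # response columns, more than n (assumed)

d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
M <- matrix(rnorm(n * q), nrow = n,
            dimnames = list(NULL, paste0("X", seq_len(q))))

fit <- lm(M ~ x1 + x2, data = d)         # matrix response: an "mlm" fit
fit$df.residual                          # n - 3 = 7

# q x q residual sums-of-squares-and-cross-products matrix
E <- crossprod(residuals(fit))
qr(E)$rank                               # at most df.residual, hence below q
```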

  
    
#
To avoid the formula handling bug in lm/model.matrix/etc., you can try making the
formula shorter.  E.g., if you know the names of your response columns,
   responseCols <- c("X1", "X2", "X3", ..., "X2395")
try the formula
   as.matrix(d[, responseCols]) ~ d[,"R_M_F"] + d[,"SMB"] + d[,"HML"] + d[,"WML"]
and do not use data=d in the call to lm().

You may also prefer to use lm.fit(), which takes the response matrix and design matrix
directly, so you avoid formulae altogether.
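
A minimal sketch of the lm.fit() route, assuming a data frame d with the four factor columns named above and only three simulated response columns (the data, the object names Y, X, fit and fit_lm, and predictorCols are made up for illustration). Note that lm.fit() does not add an intercept, so the column of ones has to be supplied by hand.

```r
set.seed(3)
n <- 40
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n),
                R_M_F = rnorm(n), SMB = rnorm(n),
                HML = rnorm(n), WML = rnorm(n))

responseCols  <- c("X1", "X2", "X3")     # stand-in for the full response list
predictorCols <- c("R_M_F", "SMB", "HML", "WML")

Y <- as.matrix(d[, responseCols])                          # response matrix
X <- cbind(Intercept = 1, as.matrix(d[, predictorCols]))   # design matrix

fit <- lm.fit(x = X, y = Y)
dim(fit$coefficients)                    # 5 coefficients by 3 responses

# Same coefficients as the formula interface with a matrix response
fit_lm <- lm(Y ~ as.matrix(d[, predictorCols]))
all.equal(unname(fit$coefficients), unname(coef(fit_lm)))
```

The result of lm.fit() is a plain list rather than an "lm" object, so methods such as summary() or anova() are not available on it directly.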

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com