Step command failing for lm function

5 messages · Noah Silverman, Joshua Wiley, Uwe Ligges

#
Hi,

I have a fairly simple linear regression using the lm function.  There
are about 100 variables and 30,000 rows of data.  It runs fine and
produces a decent looking R2 value.  I'm interested in performing a
stepwise variable selection to see if things can be cleaned up a bit.

Calling the step function returns ONE iteration (all the variables) and
then stops.  No errors are reported.   

Can someone suggest why this might not be working as expected?
(Normally this function steps through all the variables to find the
"best" combination.)

Thanks!

-N
#
Hi Noah,

Are you able to reproduce the example on a smaller dataset?  Do you have any strange variable names or   I created a 30000 x 100 matrix, fit a linear model, and step has been running fine (other than bringing my poor netbook to its knees).  It also might be helpful if you could post your session info per the posting guide.

You could also try: debug(step). Then run step on your model so you can see what the function does before it exits.

Cheers,

Josh
On Jan 9, 2011, at 23:57, Noah Silverman <noah at smartmediacorp.com> wrote:

            
#
On 10.01.2011 10:13, Joshua Wiley wrote:
Can you show us both your code and the output as well as the summary of 
the whole model, please?

Uwe Ligges
#
Hi,

It's a lot of data, but here are some summary stats:

l <- lm(trainy ~ x)

str(x)
num [1:31205, 1:48] 0.0975 -0.1987 0.3254 -0.7912 0.0975 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:31205] "5" "6" "7" "8" ...
  ..$ : NULL
 - attr(*, "names")= chr [1:1497840] "a" NA NA NA ...

summary(x)
       V1                  V2                  V3                 V4                   V5                 V6
 Min.   :-1.679848   Min.   :-1.606698   Min.   :-1.617491   Min.   :-1.6534404   Min.   :-0.93052   Min.   :-1.66594
 1st Qu.:-0.865216   1st Qu.:-0.867430   1st Qu.:-0.875567   1st Qu.:-0.9042894   1st Qu.:-0.67904   1st Qu.:-0.90768
 Median : 0.074739   Median :-0.004886   Median :-0.009924   Median : 0.0946436   Median :-0.40504   Median :-0.14942
 Mean   : 0.000492   Mean   :-0.001140   Mean   :-0.001563   Mean   :-0.0006543   Mean   :-0.01372   Mean   : 0.01700
 3rd Qu.: 0.826709   3rd Qu.: 0.857625   3rd Qu.: 0.855687   3rd Qu.: 0.8438270   3rd Qu.: 0.23305   3rd Qu.: 0.79841
 Max.   : 1.578680   Max.   : 1.596925   Max.   : 1.597644   Max.   : 1.5930105   Max.   : 2.74787   Max.   : 2.88363
       V7                 V8                   V9               V10                 V11                 V12
 Min.   :-2.84607   Min.   :-17.340329   Min.   :-5.72374   Min.   :-9.088574   Min.   :-0.753625   Min.   :-9.694224
 1st Qu.:-0.69230   1st Qu.: -0.680686   1st Qu.:-0.77093   1st Qu.:-0.484832   1st Qu.:-0.753625   1st Qu.:-0.535022
 Median : 0.07690   Median : -0.050236   Median : 0.08103   Median : 0.127993   Median :-0.187126   Median : 0.094031
 Mean   :-0.01912   Mean   :  0.007672   Mean   :-0.01086   Mean   : 0.004137   Mean   : 0.001845   Mean   : 0.005425
 3rd Qu.: 0.69226   3rd Qu.:  0.643260   3rd Qu.: 0.70906   3rd Qu.: 0.646475   3rd Qu.: 0.232864   3rd Qu.: 0.640222
 Max.   : 1.76915   Max.   :  4.299870   Max.   : 3.87579   Max.   : 4.307299   Max.   : 8.125662   Max.   :13.955377
      V13                 V14                 V15                V16                V17                 V18
 Min.   :-2.325326   Min.   :-1.122704   Min.   :-15.78010   Min.   :-1.41451   Min.   :-2.890895   Min.   :-6.48201
 1st Qu.:-0.707599   1st Qu.:-0.677653   1st Qu.:  0.10818   1st Qu.:-0.67008   1st Qu.:-0.562810   1st Qu.:-0.65572
 Median : 0.022490   Median :-0.249277   Median :  0.29841   Median :-0.24738   Median :-0.068975   Median :-0.01222
 Mean   : 0.000984   Mean   : 0.005968   Mean   : -0.01914   Mean   :-0.01929   Mean   :-0.004446   Mean   :-0.04004
 3rd Qu.: 0.735969   3rd Qu.: 0.387072   3rd Qu.:  0.38232   3rd Qu.: 0.32839   3rd Qu.: 0.502638   3rd Qu.: 0.59069
 Max.   : 2.328877   Max.   :10.034416   Max.   :  1.17948   Max.   : 3.66491   Max.   : 3.405497   Max.   : 3.95314
      V19                  V20                 V21                V22                  V23                 V24
 Min.   :-3.4866219   Min.   :-53.84720   Min.   :-3.872473   Min.   :-82.470612   Min.   :-0.877362   Min.   :-0.9064
 1st Qu.:-0.6866883   1st Qu.: -0.57941   1st Qu.:-0.459875   1st Qu.: -0.546812   1st Qu.:-0.556758   1st Qu.:-0.6743
 Median : 0.0181297   Median : -0.01640   Median :-0.026090   Median : -0.023271   Median :-0.283361   Median :-0.2101
 Mean   : 0.0005746   Mean   :  0.02152   Mean   : 0.001832   Mean   : -0.002836   Mean   : 0.006677   Mean   : 0.0330
 3rd Qu.: 0.7036093   3rd Qu.:  0.58834   3rd Qu.: 0.400639   3rd Qu.:  0.501094   3rd Qu.: 0.196238   3rd Qu.: 0.4863
 Max.   : 3.5553623   Max.   : 53.96102   Max.   : 5.111946   Max.   :  7.022679   Max.   :21.385854   Max.   :12.3242
      V25                V26                V27               V28                V29                V30
 Min.   :-0.88375   Min.   :-1.11709   Min.   :-1.00780   Min.   :-10.7395   Min.   :-1.66934   Min.   :-1.0292617
 1st Qu.:-0.65752   1st Qu.:-0.71563   1st Qu.:-0.70467   1st Qu.: -0.1804   1st Qu.:-0.46190   1st Qu.:-0.6029130
 Median :-0.20505   Median :-0.07946   Median :-0.14171   Median :  0.2798   Median :-0.12636   Median :-0.3733405
 Mean   : 0.03226   Mean   : 0.02066   Mean   : 0.01787   Mean   : -0.0344   Mean   : 0.01104   Mean   : 0.0004641
 3rd Qu.: 0.47365   3rd Qu.: 0.48877   3rd Qu.: 0.42125   3rd Qu.:  0.5117   3rd Qu.: 0.32533   3rd Qu.: 0.0530082
 Max.   :10.88045   Max.   :11.39008   Max.   :11.55056   Max.   :  1.2400   Max.   :76.74103   Max.   : 5.4643580
      V31                V32                V33               V34                V35                V36
 Min.   :-1.72330   Min.   :-2.81647   Min.   :-1.22587   Min.   :-1.33872   Min.   :-0.85680   Min.   :-1.84229
 1st Qu.:-0.95858   1st Qu.:-0.68389   1st Qu.:-0.79860   1st Qu.:-0.85541   1st Qu.:-0.66622   1st Qu.:-0.81453
 Median :-0.19386   Median : 0.07774   Median :-0.18821   Median :-0.18663   Median :-0.37654   Median :-0.25103
 Mean   : 0.01799   Mean   :-0.01678   Mean   : 0.01022   Mean   :-0.07883   Mean   :-0.05283   Mean   :-0.01440
 3rd Qu.: 0.76204   3rd Qu.: 0.68705   3rd Qu.: 0.54426   3rd Qu.: 0.53015   3rd Qu.: 0.25618   3rd Qu.: 0.62855
 Max.   : 2.86501   Max.   : 1.75334   Max.   : 4.57282   Max.   : 2.78523   Max.   : 3.86957   Max.   : 5.99709
      V37                 V38               V39              V40                 V41                V42
 Min.   :-0.457517   Min.   :-2.2722   Min.   :-1.6455   Min.   :-3.477135   Min.   :-1.17361   Min.   :-5.151515
 1st Qu.:-0.457517   1st Qu.:-0.8465   1st Qu.:-0.8011   1st Qu.:-0.687784   1st Qu.:-1.17361   1st Qu.:-0.057516
 Median :-0.457517   Median :-0.2618   Median :-0.3438   Median :-0.229916   Median : 0.03988   Median :-0.057516
 Mean   :-0.001647   Mean   :-0.2080   Mean   :-0.1453   Mean   : 0.007545   Mean   : 0.02236   Mean   : 0.001137
 3rd Qu.:-0.457517   3rd Qu.: 0.3710   3rd Qu.: 0.3013   3rd Qu.: 0.515931   3rd Qu.: 0.49494   3rd Qu.: 0.706584
 Max.   :15.512632   Max.   : 2.1959   Max.   : 2.7406   Max.   : 3.717934   Max.   : 6.15788   Max.   : 5.036483
      V43              V44                 V45                 V46                 V47                  V48
 Min.   :0.0000   Min.   :-0.708214   Min.   :-0.5407803   Min.   :-0.980665   Min.   :-17.332960   Min.   :-0.291151
 1st Qu.:0.0000   1st Qu.:-0.708214   1st Qu.:-0.5407803   1st Qu.:-0.980665   1st Qu.: -0.684639   1st Qu.:-0.291151
 Median :0.0000   Median :-0.286641   Median :-0.2274321   Median :-0.416754   Median : -0.054618   Median :-0.291151
 Mean   :0.1500   Mean   :-0.001202   Mean   :-0.0006913   Mean   : 0.004792   Mean   :  0.007181   Mean   : 0.008824
 3rd Qu.:0.0000   3rd Qu.: 0.313288   3rd Qu.: 0.1232400   3rd Qu.: 0.711067   3rd Qu.:  0.654157   3rd Qu.:-0.291151
 Max.   :1.0000   Max.   :30.801619   Max.   :45.7742768   Max.   : 5.786264   Max.   :  4.292532   Max.   :10.602908
#
I think I just figured it out.

x is a matrix.
l <- lm(y ~ x) works for generating a model, but step fails on it.  (It
treats x as a single term to add or remove.)

Step does work if I use a data.frame:

foo <- data.frame(y, x)

l <- lm(y ~ ., data=foo)

Now step(l) works.

I guess R doesn't look inside the "x" in the first version to iterate
through the individual variables.  It does, however, iterate over them
when the "." is used in a formula with a data.frame.
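
To see the difference directly, here is a minimal sketch on simulated data (the sizes, the seed, and the names fit_mat, fit_df, and dat are made up for illustration and are not from my real data). It compares the term labels of the two fits; as far as I can tell, step() only adds or drops whole terms, which would explain why the matrix version stops after one iteration.

```r
set.seed(1)                          # simulated data; sizes are arbitrary
n <- 200
x <- matrix(rnorm(n * 5), n, 5)      # 5 predictors held in a plain matrix
y <- drop(x[, 1:2] %*% c(2, -1)) + rnorm(n)

# Matrix on the right-hand side: the formula contains a single term, "x"
fit_mat <- lm(y ~ x)
labels_mat <- attr(terms(fit_mat), "term.labels")

# Data frame plus ".": one term per column (data.frame() names them X1..X5)
dat <- data.frame(y = y, x)
fit_df <- lm(y ~ ., data = dat)
labels_df <- attr(terms(fit_df), "term.labels")

print(labels_mat)
print(labels_df)

# With separate terms, step() can consider dropping individual columns
sel <- step(fit_df, trace = 0)
print(formula(sel))
```

Calling drop1() on each fit should show the same thing from another angle: one candidate row for the matrix model, one row per column for the data frame model.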