
logistic regression

19 messages · Darin Brooks, Kevin E. Thorpe, Dieter Menne +8 more

#
Sorry.

Let me try again then.

I am trying to find "significant" predictors from a list of about 44
independent variables.  So I started with all 44 variables, ran
drop1(sep22lr, test="Chisq"), and then dropped the variable with the highest
p-value from the run.  Then I reran drop1.

Model:
MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_MST_1 + 
    SOIL_NUTR + cE + cN + cELEV + cDIAM_125 + cCRCLS + cCULM_125 + 
    cSPH + cAGE + cVRI_NONPINE + cVRI_nonpineCFR + cVRI_BLEAF + 
    cvol_125 + cstrDST_SW + cwaterDST_SW + cSEEDSRCE_SW + cMAT + 
    cMWMT + cMCMT + cTD + cMAP + cMSP + cAHM + cSHM + cMATMAP + 
    cddless0 + cddless18 + cddgrtr0 + cddgrtr18 + cNFFD + cbFFP + 
    ceFFP + cPAS + cDD5_100 + cEXT_Cold + cS_INDX
                Df Deviance    AIC    LRT   Pr(Chi)    
<none>               814.21 938.21                     
ORG_CODE         4   824.97 940.97  10.76 0.0294100 *  
BECLBL08         9   845.61 951.61  31.41 0.0002519 ***
PEM_SScat       10   829.11 933.11  14.90 0.1357580    
SOIL_MST_1       1   814.63 936.63   0.43 0.5135094    
SOIL_NUTR        2   818.49 938.49   4.28 0.1175411    
cE               1   814.37 936.37   0.16 0.6886085    
cN               1   814.40 936.40   0.20 0.6566765    
cELEV            1   814.35 936.35   0.14 0.7044864    
cDIAM_125        1   817.98 939.98   3.78 0.0519554 .  
cCRCLS           1   819.32 941.32   5.11 0.0237598 *  
cCULM_125        1   816.17 938.17   1.97 0.1606846    
cSPH             1   816.62 938.62   2.41 0.1204141    
cAGE             1   815.92 937.92   1.72 0.1902314    
cVRI_NONPINE     1   818.04 940.04   3.84 0.0501149 .  
cVRI_nonpineCFR  1   821.17 943.17   6.96 0.0083197 ** 
cVRI_BLEAF       1   818.78 940.78   4.58 0.0324286 *  
cvol_125         1   814.67 936.67   0.47 0.4949495    
cstrDST_SW       1   814.63 936.63   0.42 0.5169757    
cwaterDST_SW     1   814.75 936.75   0.55 0.4592643    
cSEEDSRCE_SW     1   817.73 939.73   3.53 0.0604234 .  
cMAT             1   814.27 936.27   0.06 0.8002333    
cMWMT            1   814.49 936.49   0.28 0.5942246    
cMCMT            1   819.39 941.39   5.18 0.0228425 *  
cTD              1   816.20 938.20   1.99 0.1580332    
cMAP             1   814.25 936.25   0.04 0.8386626    
cMSP             1   818.41 940.41   4.20 0.0404411 *  
cAHM             1   815.66 937.66   1.46 0.2276311    
cSHM             1   819.95 941.95   5.75 0.0165227 *  
cMATMAP          1   814.91 936.91   0.71 0.4001878    
cddless0         1   818.04 940.04   3.83 0.0502153 .  
cddless18        1   817.81 939.81   3.60 0.0576931 .  
cddgrtr0         1   816.64 938.64   2.44 0.1184235    
cddgrtr18        1   815.77 937.77   1.57 0.2104958    
cNFFD            1   815.38 937.38   1.18 0.2782582    
cbFFP            1   814.39 936.39   0.18 0.6677481    
ceFFP            1   820.22 942.22   6.01 0.0141863 *  
cPAS             1   814.21 936.21   0.01 0.9347654    
cDD5_100         1   814.79 936.79   0.58 0.4447531    
cEXT_Cold        1   816.99 938.99   2.78 0.0954512 .  
cS_INDX          1   815.21 937.21   1.01 0.3157208    


And then I systematically reran drop1, removing the variable with the HIGHEST
p-value (least significant) from each resulting model, until only variables
significant at the 0.10 level remained.
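For reference, the drop-the-worst-term loop described here can be sketched in a few lines of R. The data frame and variable names below are simulated stand-ins, not the poster's dataset:

```r
## Hypothetical sketch of one backward-elimination step; repeat until all
## remaining terms are "significant".  Simulated data, not the poster's.
set.seed(1)
dat <- data.frame(y  = rbinom(60, 1, 0.5),
                  x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
d1  <- drop1(fit, test = "Chisq")
pcol  <- grep("^Pr", names(d1))                       # "Pr(Chi)" or "Pr(>Chi)", by R version
worst <- rownames(d1)[-1][which.max(d1[[pcol]][-1])]  # skip the <none> row
fit2  <- update(fit, as.formula(paste(". ~ . -", worst)))  # drop it and refit
```

(As the rest of this thread makes clear, automating this loop does not make the resulting p-values valid.)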

Model:
MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_NUTR + 
    cSEEDSRCE_SW + cMSP + ceFFP + cEXT_Cold
             Df Deviance    AIC    LRT   Pr(Chi)    
<none>            884.20 946.20                     
ORG_CODE      4   916.38 970.38  32.18 1.757e-06 ***
BECLBL08      9   940.66 984.66  56.46 6.418e-09 ***
PEM_SScat    11   906.20 946.20  22.00 0.0243795 *  
SOIL_NUTR     2   894.19 952.19   9.99 0.0067557 ** 
cSEEDSRCE_SW  1   894.41 954.41  10.21 0.0013983 ** 
cMSP          1   896.97 956.97  12.77 0.0003516 ***
ceFFP         1   928.50 988.50  44.30 2.812e-11 ***
cEXT_Cold     1   923.35 983.35  39.15 3.921e-10 ***


I didn't create any kind of dummy or factor variables for my categorical
data (at least, not on purpose).

With the remaining 8 variables, I ran a logistic regression (glm) against my
dependent variable (MIN_Mstocked).  When I do a summary() of the glm, I am
provided with the usual table of estimate, std. error, z value, and
Pr(>|z|)... BUT there are some coefficients missing from the list.  None of
the categorical variables is complete.  Some are missing only one category,
while others are missing 4 or 5 categories.

e.g.

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -1.324e+02  1.363e+03  -0.097 0.922611    
ORG_CODE[T.DLA]      -1.504e+01  1.363e+03  -0.011 0.991192    
ORG_CODE[T.DMO]      -1.494e+01  1.363e+03  -0.011 0.991253    
ORG_CODE[T.DPG]      -1.766e+01  1.363e+03  -0.013 0.989658    
ORG_CODE[T.DVA]      -1.841e+01  1.363e+03  -0.014 0.989220    
BECLBL08[T.SBS dw 2] -6.733e-01  5.903e-01  -1.141 0.254033    
BECLBL08[T.SBS dw 3] -1.094e+00  5.714e-01  -1.914 0.055586 .  
BECLBL08[T.SBS mc 2]  1.573e-01  5.004e-01   0.314 0.753211    
BECLBL08[T.SBS mc 3]  1.402e+00  5.824e-01   2.408 0.016043 *  
BECLBL08[T.SBS mk 1] -2.388e+00  7.529e-01  -3.172 0.001514 ** 
BECLBL08[T.SBS mw]   -1.672e+01  1.393e+03  -0.012 0.990425    
BECLBL08[T.SBS vk]   -1.614e+01  1.243e+03  -0.013 0.989640    
BECLBL08[T.SBS wk 1] -3.640e+00  8.174e-01  -4.453 8.48e-06 ***
BECLBL08[T.SBS wk 3] -1.838e+01  1.363e+03  -0.013 0.989240    
PEM_SScat[T.B]       -1.815e+01  3.956e+03  -0.005 0.996339    
PEM_SScat[T.C]        1.998e-01  3.925e-01   0.509 0.610792    
PEM_SScat[T.D]       -2.314e-01  3.215e-01  -0.720 0.471621    
PEM_SScat[T.E]        5.581e-01  3.433e-01   1.626 0.104020    
PEM_SScat[T.F]       -1.113e+00  5.782e-01  -1.926 0.054153 .  
PEM_SScat[T.G]        1.780e-01  4.420e-01   0.403 0.687150    
PEM_SScat[T.H]        1.670e+01  3.956e+03   0.004 0.996633    
PEM_SScat[T.I]        2.751e-01  9.313e-01   0.295 0.767705    
PEM_SScat[T.J]       -2.623e-01  9.693e-01  -0.271 0.786649    
PEM_SScat[T.K]       -1.862e+01  3.956e+03  -0.005 0.996244    
PEM_SScat[T.L]       -1.661e+01  1.211e+03  -0.014 0.989056    
SOIL_NUTR[T.C]       -1.119e+00  3.781e-01  -2.960 0.003073 ** 
SOIL_NUTR[T.D]       -7.912e-02  9.049e-01  -0.087 0.930320    
cSEEDSRCE_SW         -1.512e-03  4.930e-04  -3.066 0.002170 ** 
cMSP                  1.808e-02  5.304e-03   3.409 0.000652 ***
ceFFP                 2.889e-01  4.662e-02   6.196 5.80e-10 ***
cEXT_Cold            -1.880e+00  3.330e-01  -5.647 1.63e-08 ***

There should be a PEM_SScat[T.A].  It is the most prevalent level of that
factor.

ORG_CODE is missing more than 6 categories in the list.

SOIL_NUTR should have a [T.B].

Does that help? 

-----Original Message-----
From: Kevin E. Thorpe [mailto:kevin.thorpe at utoronto.ca] 
Sent: Saturday, September 27, 2008 6:21 AM
To: Darin Brooks
Cc: r-help at r-project.org
Subject: Re: [R] logistic regression
Darin Brooks wrote:
I'm not sure I fully understand your question.  It sounds like you created
your own dummy variables for your categorical variables. Did you?  Or did
you use factor variables for your categorical variables?
If the latter, then I REALLY don't understand your question.

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program Assistant Professor,
Dalla Lana School of Public Health University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.6057

#
Darin Brooks wrote:
Yes.  I don't see a problem, however.  First, your variables are
"factors", which means there will be one fewer coefficient than
categories.  One level is a reference group, which probably explains
PEM_SScat and SOIL_NUTR each "missing" one coefficient.  For ORG_CODE,
there were 4 DF in the starting model and 4 DF in the final model, with 4
coefficients.  So the 6 missing categories appear to have been missing
from the start.

What do you expect for ORG_CODE?  What does, say, summary(ORG_CODE) give you?
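A toy example makes the "missing" coefficients concrete. With R's default treatment contrasts, the first factor level becomes the reference and is absorbed into the intercept, so a k-level factor yields k - 1 coefficients:

```r
## Toy data: a 3-level factor produces only 2 coefficients in a glm,
## because level "A" serves as the reference group.
f <- factor(c("A", "A", "B", "C", "B", "C"))
y <- c(0, 1, 0, 1, 1, 0)
fit <- glm(y ~ f, family = binomial)
names(coef(fit))   # "(Intercept)" "fB" "fC" -- no separate coefficient for "A"
```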

Are you aware of the dangers of stepwise model fitting?  It is a
commonly recurring theme on this list.

Kevin

#
Darin Brooks wrote:
Why?  What is wrong with insignificant predictors?
Estimates from this model (and especially standard errors and P-values) 
will be invalid because they do not take into account the stepwise 
procedure above that was used to torture the data until they confessed.

Frank

#
Frank E Harrell Jr <f.harrell <at> vanderbilt.edu> writes:
Please book this as a fortune.

Dieter
#
On 27-Sep-08 21:45:23, Dieter Menne wrote:
Seconded!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Sep-08                                       Time: 23:30:19
------------------------------ XFMail ------------------------------
#
Glad you were amused.

I assume that "booking this as a fortune" means that this was an idiotic way
to model the data?

MARS?  Boosted Regression Trees?  Any of these a better choice to extract
significant predictors (from a list of about 44) for a measured dependent
variable?

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Ted Harding
Sent: Saturday, September 27, 2008 4:30 PM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] FW: logistic regression
<snip />

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
It's more a statement that it expresses a statistical perspective very
succinctly, somewhat like a Zen koan.  Frank's book, "Regression
Modeling Strategies", has entire chapters on reasoned approaches to
your question.  His website also has quite a bit of material free for
the taking.
#
Darin Brooks wrote:
Dieter was nominating this for the "fortunes" package in R.  (Thanks Dieter)
Or use a data reduction method (principal components, variable 
clustering, etc.) or redundancy analysis (to remove individual 
predictors before examining associations with Y), or fit the full model 
using penalized maximum likelihood estimation.  lasso and lasso-like 
methods are also worth pursuing.

Cheers
Frank
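One way to pursue the penalized/lasso route Frank mentions is the glmnet package (one common implementation among several; the data below are simulated for illustration):

```r
## L1-penalized (lasso) logistic regression: the penalty is chosen by
## cross-validation, and many coefficients are shrunk exactly to zero.
## Simulated data; glmnet is an add-on package, not part of base R.
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- rbinom(100, 1, plogis(x[, 1] - x[, 2]))
cvfit <- cv.glmnet(x, y, family = "binomial")
coef(cvfit, s = "lambda.1se")   # sparse coefficient vector
```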

#
--- On Sat, 9/27/08, Dieter Menne <dieter.menne at menne-biomed.de> wrote:

Hear, hear! I vote yes.


#
The Inferno awaits me -- but I cannot resist a comment (but DO look at
Frank's website).

There is a deep and disconcerting dissonance here. Scientists are
(naturally) interested in getting at mechanisms, and so want to know which
of the variables "count" and which do not. But statistical analysis --
**any** statistical analysis -- cannot tell you that. All statistical
analysis can do is build models that give good predictions (and only over
the range of the data). The models you get depend **both** on the way Nature
works **and** the peculiarities of your data (which is what Frank referred
to in his comment on data reduction). In fact, it is highly likely that with
your data there are many alternative prediction equations built from
different collections of covariates that perform essentially equally well.
Sometimes it is otherwise, typically when prospective, carefully designed
studies are performed -- there is a reason that the FDA insists on clinical
trials, after all (and reasons why such studies are difficult and expensive
to do!).

The belief that "data mining" (as it is known in the polite circles that
Frank obviously eschews) is an effective (and even automated!) tool for
discovering how Nature works is a misconception, but one that for many
reasons is enthusiastically promoted.  If you are looking only to predict,
it may do; but you are deceived if you hope for Truth. Can you get hints? --
well maybe, maybe not. Chaos beckons.

I think many -- maybe even most -- statisticians rue the day that stepwise
regression was invented and certainly that it has been marketed as a tool
for winnowing out the "important" few variables from the blizzard of
"irrelevant" background noise. Pogo was right: " We have seen the enemy --
and it is us."

(As I said, the Inferno awaits...)

Cheers to all,
Bert Gunter

DEFINITELY MY OWN OPINIONS HERE!



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of David Winsemius
Sent: Saturday, September 27, 2008 5:34 PM
To: Darin Brooks
Cc: r-help at stat.math.ethz.ch; ted.harding at manchester.ac.uk
Subject: Re: [R] FW: logistic regression

<snip />
#
I certainly appreciate your comments, Bert.  It is abundantly clear that I
won't be invited to any of the cocktail parties hosted by the "polite
circles".  I am not a statistician.  I am merely a geographer (in the field
of ecology) trying to develop a predictor to assist in a forestry-based
decision making process.  My work in the natural world has taught me that
NOTHING is predictable ... and the very idea of a bullet-proof ecological
predictive model is doomed to fail.  
That said, there ARE some basic predictors that assist foresters in their
salvage decisions.  They use these on a daily basis.  The problem is that
most of the evidence and modeling is anecdotal.  There really are no models
in the field that I am working in.  And for good reason ... The natural
world isn't interested in being modeled.  I think we can all agree on this -
guru or not.
But even the most basic predictive model (using only the GIS/mappable data
that is readily available to most users) is a starting point.  The resulting
dataset(s) of this potential model will be followed up and field-verified.
Providing this simple starting point (or catalyst, if you will) could
potentially save A LOT of time and money.
What I need to do is to isolate the best available variables into a model
and assign a confidence to it.  It doesn't have to change everyone's world
... it just has to change the way of thinking in my small little world.
These past few days have been an education for me in the subject of stepwise
regression.  I approach it with much more apprehension now.  So if nothing
else good comes of this discussion/exercise/experience ... I've learned
something.

Darin Brooks           

-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com] 
Sent: Sunday, September 28, 2008 6:26 PM
To: 'David Winsemius'; 'Darin Brooks'
Cc: r-help at stat.math.ethz.ch; ted.harding at manchester.ac.uk
Subject: RE: [R] FW: logistic regression


<snip />



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of David Winsemius
Sent: Saturday, September 27, 2008 5:34 PM
To: Darin Brooks
Cc: r-help at stat.math.ethz.ch; ted.harding at manchester.ac.uk
Subject: Re: [R] FW: logistic regression

<snip />

--
David Winsemius
Heritage Laboratories
On Sep 27, 2008, at 7:24 PM, Darin Brooks wrote:

#
Darin Brooks wrote:
Darin,

I think the point is that the confidence you can assign to the "best 
available variables" is zero.  That is the probability that stepwise 
variable selection will select the correct variables.

It is probably better to build a model based on the knowledge in the 
field you alluded to, rather than to use P-values to decide.

Frank Harrell

#
Wow.  I had no idea.  I was told to be wary ... But nothing this bold.

I appreciate your straightforward advice.

I will be exploring the R packages rpart, earth, and gbm.  Dr. Elith has
generously provided me with literature and R support in the boosted
regression tree arena.  I will leave stepwise logistic regression alone.

Any parting advice regarding narrowing down the variables from the unruly 44
to about 8 or 10?  (In addition to your advice regarding redundancy analysis
and penalized maximum likelihood estimation). 

And I visited your website, Dr. Harrell.  A LOT of help there.  I will also
be purchasing your book this week.  Wish I had stumbled on this forum a
year ago.

Thanks again.      

-----Original Message-----
From: Frank E Harrell Jr [mailto:f.harrell at vanderbilt.edu] 
Sent: Sunday, September 28, 2008 8:23 PM
To: Darin Brooks
Cc: 'Bert Gunter'; r-help at r-project.org
Subject: Re: [R] FW: logistic regression
Darin Brooks wrote:
<snip />

#
At the risk of my also spending time in the "Inferno", I would 
suggest your problem resembles principal components analysis or 
factor analysis. In this, you would look for a set of linear 
transforms of your variables that have a smaller dimensionality, but 
nearly the same spanned subspace.

Before you embark on any of this, you should ask what you are 
interested in: 1) A physical model that can be interpreted, and may 
hold true in future experiments; or 2) A numerical representation of 
your data that interpolates it. For the former, there is no 
substitute for expert knowledge in formulating models, and then you 
can see if they are in discord with your data. For the latter, the 
PCA approach can condense your predictor set and avoid collinearity.
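The condensation described here can be sketched in base R: replace many correlated predictors with their first few principal components (a simulated matrix stands in for the 44 real variables):

```r
## Principal components as a data-reduction step: the leading components
## span most of the predictor variability and are mutually uncorrelated.
set.seed(1)
X  <- matrix(rnorm(100 * 8), 100, 8)   # stand-in for the real predictors
pc <- prcomp(X, scale. = TRUE)
summary(pc)        # proportion of variance explained per component
Z  <- pc$x[, 1:3]  # use the first few components as model inputs
```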
At 10:49 PM 9/28/2008, Darin Brooks wrote:
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
#
On Sun, 2008-09-28 at 19:26 -0600, Darin Brooks wrote:
Hi Darin,

As an ecologist myself, I think you overstate things a bit here. Clearly
there are features of the "ecological" world out there that follow
"rules" --- otherwise we might as well consign the whole branch of
theoretical ecology to the bin. These things can be modelled, but we are
often looking for a relatively small signal in a whole load of noise.

You really do need to "model" your system in order to make predictions
about it. How you go about the "modelling" is another matter.

I think you may be better off with some of the more algorithm-centric
data mining methods that are currently the rage in some quarters of
ecology (predicting climate change effects on species +/-, change in
range etc); things like regression/classification trees and
randomForest, boosting etc. Names to look out for in this literature are
JR Leathwick, Antoine Guisan, Miguel B Araujo and J Elith. You'll find a
lot of work looking at these modern methods in these authors' work, and
that of others. These methods have less statistical theoretical
underpinnings, but can be evaluated on how well they make predictions.
Which is often the whole point of doing the analysis.
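For a first taste of the tree-based route Gavin describes, rpart ships with R (randomForest and gbm are add-on packages); again on simulated data rather than anything from this thread:

```r
## A classification tree fit with rpart; predictions are evaluated on how
## well they generalize, rather than on per-term p-values.
library(rpart)
set.seed(1)
dat  <- data.frame(y  = factor(rbinom(200, 1, 0.5)),
                   x1 = rnorm(200), x2 = rnorm(200))
tree <- rpart(y ~ x1 + x2, data = dat, method = "class")
pred <- predict(tree, dat, type = "class")
```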
I too would like to thank the contributors to this thread --- very
informative!

All the best,

G
#
On Sun, 2008-09-28 at 21:23 -0500, Frank E Harrell Jr wrote:
<snip />
Hi Frank, et al

I don't have Darin's original email to hand just now, but IIRC he turned
on the testing by p-values, something that add1 and drop1 do not do by
default.

Venables and Ripley's MASS contains stepAIC and there they make use of
drop1 in the regression chapters (Apologies if I have made sweeping
statements that are just plain wrong here - I'm at home this morning and
don't seem to have either of my two MASS copies here with me).
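For comparison, the AIC-based route via MASS::stepAIC (MASS ships with R) looks like this on simulated data; as Frank notes below, it inherits the same selection problems:

```r
## Backward elimination by AIC rather than by per-term p-values.
## Simulated data; the selected subset reflects this particular sample.
library(MASS)
set.seed(1)
dat  <- data.frame(y  = rbinom(80, 1, 0.5),
                   x1 = rnorm(80), x2 = rnorm(80), x3 = rnorm(80))
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
sel  <- stepAIC(full, direction = "backward", trace = FALSE)
formula(sel)
```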

Would the same criticisms made by yourself and Bert, amongst others, in
this thread be levelled at simplifying models using AIC rather than via
p-values? Part of the issue with stepwise procedures is that they don't
correct the overall Type I error rate (even if you use 0.05 as your
cut-off for each test, overall your error rate can be much larger). Does
AIC allow one to get out of this bit of the problem with stepwise
methods?

I'd appreciate any thoughts you or others on the list may have on this.

All the best, and thanks for an interesting discussion thus far.

G
#
Gavin Simpson wrote:
AIC is just a restatement of P-values, so using AIC one variable at a 
time is just like using a different alpha.  Both methods have problems.

Frank

#
Frank (and any others who want to share an opinion):

What are your thoughts on model averaging as part of the above list?


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
#
Greg Snow wrote:
Model averaging has good performance but no advantage over fitting a 
single complex model using penalized maximum likelihood estimation.

Frank