Extreme AIC or BIC values in glm(), logistic regression
Dear Thomas, Thank you very much for the answer! But why does this happen only for some models and not for all of them? That is, why can the stepwise selection drop some variables for other models but not for this one? Thanks!! Best regards, Maggie
On Wed, Mar 18, 2009 at 3:38 PM, Thomas Lumley <tlumley at u.washington.edu> wrote:
With 30 variables and only 55 residual degrees of freedom you probably have perfect separation due to not having enough data. Look at the coefficients -- they are infinite, implying perfect overfitting.

-thomas

On Wed, 18 Mar 2009, Maggie Wang wrote:
Dear R-users, I use glm() to do logistic regression and stepAIC() to do stepwise model selection. The AIC is typically about 100, and a good fit can be as low as about 70. But for some models the AIC goes to extreme values like 1000. When I check the p-values, all the independent variables (about 30 of them) included in the equation are highly significant, which is implausible, because we expect some to be dropped. This situation is not uncommon. A summary output looks like this:

Coefficients:
              Estimate Std. Error   z value Pr(>|z|)
(Intercept)  4.883e+14  1.671e+07  29217415   <2e-16 ***
g761        -5.383e+14  9.897e+07  -5438529   <2e-16 ***
g2809       -1.945e+15  1.082e+08 -17977871   <2e-16 ***
g3106       -2.803e+15  9.351e+07 -29976674   <2e-16 ***
g4373       -9.272e+14  6.534e+07 -14190077   <2e-16 ***
g4583       -2.279e+15  1.223e+08 -18640563   <2e-16 ***
g761:g2809  -5.101e+14  4.693e+08  -1086931   <2e-16 ***
g761:g3106  -3.399e+16  6.923e+08 -49093218   <2e-16 ***
g2809:g3106  3.016e+15  6.860e+08   4397188   <2e-16 ***
g761:g4373   3.180e+15  4.595e+08   6920270   <2e-16 ***
g2809:g4373 -5.184e+15  4.436e+08 -11685382   <2e-16 ***
g3106:g4373  1.589e+16  2.572e+08  61788148   <2e-16 ***
g761:g4583  -1.419e+16  8.199e+08 -17303033   <2e-16 ***
g2809:g4583 -2.540e+16  8.151e+08 -31156781   <2e-16 ***
........ (omit) ........
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  120.32  on 86  degrees of freedom
Residual deviance: 1009.22  on 55  degrees of freedom
AIC: 1073.2

Number of Fisher Scoring iterations: 25

Could anyone suggest what this means? How can I perform a reliable logistic regression?
Thank you so much for the help! Best Regards, Maggie
Thomas Lumley
Assoc. Professor, Biostatistics, University of Washington, Seattle
tlumley at u.washington.edu
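
The diagnosis in Thomas's reply (perfect separation) can be illustrated with a minimal sketch in R. Everything here is invented for illustration and is not Maggie's data: the toy variables `x` and `y`, the tolerance `eps`, and the names `warn_msgs` and `separated` are all assumptions. The sketch only shows the kind of symptoms to look for (warnings from glm(), very large estimates and standard errors, fitted probabilities numerically equal to 0 or 1); it does not try to reproduce the exact output quoted in the thread.

```r
# Toy data (hypothetical): x perfectly separates the two classes of y.
x <- 1:20
y <- as.integer(x > 10)

# Collect glm()'s warnings instead of printing them, so they can be inspected.
warn_msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    warn_msgs <<- c(warn_msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warn_msgs)           # warnings raised during fitting, if any
print(coef(summary(fit)))  # look for very large estimates and standard errors

# Crude heuristic (eps is an arbitrary, assumed tolerance):
# are all fitted probabilities numerically 0 or 1?
eps <- 1e-6
p <- fitted(fit)
separated <- all(p < eps | p > 1 - eps)
print(separated)
```

Running the same checks on a fitted model from stepAIC() would be one way to see whether a particular model shows these symptoms. The heuristic only flags them; it does not prove separation, and with about 30 terms and 87 observations the underlying problem named in the reply, too little data for the number of variables, remains.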