A problem in a glm model
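You need to look up the Hauck-Donner phenomenon in MASS (4th, 3rd or 2nd edition). In short, Wald tests in binomial or Poisson glms can be highly unreliable: a moderate p-value may indicate either no effect or a very large effect, because for a very large effect the standard error grows faster than the estimate and the z value shrinks towards zero. That is the likely reason your Wald p-values disagree with the anova (likelihood ratio) results. I suspect your model is in fact partially separable (that is, it can fit parts of the data exactly), since those are large coefficients for indicator variables. Try reducing the tolerance in glm.control (add epsilon=1e-10 to the glm call) and see if the coefficients change a lot.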
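A minimal sketch of that check in R, on simulated data rather than the poster's (the variables a, b and y are invented for illustration). One cell of the a-by-b table contains only successes, so the interaction is partially separable. Refitting with a tighter epsilon should let the affected coefficient drift further, while the Wald p-value for the interaction stays large and the likelihood ratio test is decisive.

```r
# Simulated illustration only: a, b and y are hypothetical, not the poster's data.
set.seed(1)
n <- 400
a <- factor(rbinom(n, 1, 0.3))
b <- factor(rbinom(n, 1, 0.3))
y <- rbinom(n, 1, 0.2)
y[a == "1" & b == "1"] <- 1   # this cell now has no failures: partial separation

# Default convergence tolerance (epsilon = 1e-8, maxit = 25)
fit_default <- glm(y ~ a * b, family = binomial)

# Tighter tolerance, with more iterations allowed
fit_tight <- glm(y ~ a * b, family = binomial,
                 control = glm.control(epsilon = 1e-10, maxit = 100))

# A coefficient that keeps growing under a tighter tolerance points to separation
print(cbind(default = coef(fit_default), tight = coef(fit_tight)))

# Wald test for the interaction versus the likelihood ratio test
wald_p <- coef(summary(fit_tight))["a1:b1", "Pr(>|z|)"]
lrt <- anova(update(fit_tight, . ~ . - a:b), fit_tight, test = "Chisq")
lrt_p <- lrt[2, ncol(lrt)]    # the p-value is the last column of the table
print(c(wald = wald_p, lrt = lrt_p))
```

If the coefficients of g1 move in the same way when epsilon is reduced, the large standard errors in the summary reflect separation rather than weak effects.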
On Thu, 8 May 2003, Simona Avanzo wrote:
Hello all,
I have the following glm model:
f1 <- as.formula(paste("factor(y.fondi)~",
"flgsess + segmeta2 + udm + zona.geo + ultimo.prod.",
"+flg.a2 + flg.d.na2 + flg.v2 + flg.cc2",
" +(flg.a1 + flg.d.na1 + flg.v1 + flg.cc1)^2",
" + flg.a2:flg.d.na2 + flg.a2:flg.v2 + flg.a2:flg.cc2",
" + flg.d.na2:flg.v2 + flg.v2:flg.cc2",
sep=""))
g1 <- glm(f1,family=binomial,data=camp.lavoro.meno.na)
The variables are all factors:
- y.fondi takes value 0 or 1;
- flgsess has 2 levels;
- segmeta2 has 4 levels;
- udm has 6 levels;
- zona.geo has 5 levels;
- ultimo.prod. has 4 levels;
- flg.a1, flg.d.na1, flg.v1, flg.cc1, flg.a2, flg.d.na2, flg.v2, flg.cc2 are 8 factors that take values 0 or 1.
The number of observations is 1390.
There are 259 observations with y.fondi = 1.
There are 1131 observations with y.fondi = 0.
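With 259 events spread over many combinations of 0/1 flags, some flag combinations may contain only one outcome. A rough way to look for this is to cross-tabulate the response against a pair of flags and look for zero counts. The sketch below uses made-up stand-in counts: the data frame d and all its values are hypothetical, and only the column names mimic the model above.

```r
# Hypothetical cell counts; only the column names are taken from the model above.
counts <- data.frame(
  flg.a1  = c(0, 0, 1, 1, 0, 0, 1, 1),
  flg.cc1 = c(0, 1, 0, 1, 0, 1, 0, 1),
  y.fondi = c(0, 0, 0, 0, 1, 1, 1, 1),
  n       = c(150, 20, 40, 0, 40, 6, 12, 5)
)
# Expand the counts to one row per observation
d <- counts[rep(seq_len(nrow(counts)), counts$n),
            c("flg.a1", "flg.cc1", "y.fondi")]

# Three-way table: flag combination by outcome
tab <- xtabs(~ flg.a1 + flg.cc1 + y.fondi, data = d)
print(ftable(tab))

# TRUE where a flag combination is observed with only one outcome
one_outcome <- apply(tab, c(1, 2), function(cell) sum(cell > 0) == 1)
print(one_outcome)
```

A TRUE entry marks a combination that an interaction term can fit exactly, which is where very large estimates and standard errors would be expected.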
The summary of the model is:
summary(g1)
Call:
glm(formula = f1, family = binomial, data = camp.lavoro.meno.na)
Deviance Residuals:
     Min       1Q   Median       3Q      Max
 -2.8955  -0.3586  -0.2692  -0.1642   2.9133
Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)           -2.7647     0.7523  -3.675 0.000238 ***
...                       ...        ...     ... ...
flg.a21                0.7898     0.4948   1.596 0.110475
flg.d.na21             0.2097     0.7336   0.286 0.774963
flg.v21                0.3928     0.5257   0.747 0.454994
flg.cc21              -0.8547     1.4954  -0.572 0.567625
flg.a11                0.7051     0.4889   1.442 0.149221
flg.d.na11             1.3582     0.5429   2.502 0.012353 *
flg.v11                2.2596     0.5079   4.449 8.62e-06 ***
flg.cc11              -3.3658     8.5259  -0.395 0.693014
flg.a21:flg.d.na21    -6.9392    26.5432  -0.261 0.793760
flg.a21:flg.v21       -1.4355     4.0963  -0.350 0.726005
flg.a21:flg.cc21      -6.0460    72.4807  -0.083 0.933521
flg.d.na21:flg.v21    -2.4347     2.9045  -0.838 0.401888
flg.v21:flg.cc21      11.7232    72.4814   0.162 0.871510
flg.a11:flg.d.na11    -8.3843    30.4660  -0.275 0.783162 !!!!
flg.a11:flg.v11        6.5067    39.2569   0.166 0.868356
flg.a11:flg.cc11      13.5596    19.4693   0.696 0.486140 !!!!
flg.d.na11:flg.v11    -0.7143     1.2673  -0.564 0.573013
flg.d.na11:flg.cc11   12.0653    15.3880   0.784 0.432997
flg.v11:flg.cc11       6.2648     8.5808   0.730 0.465331 !!!!
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1336.79 on 1389 degrees of freedom
Residual deviance: 576.08 on 1354 degrees of freedom
AIC: 648.08
Number of Fisher Scoring iterations: 8
If I apply the anova test to the terms marked !!!!, I obtain:
g1.1 <- update(g1,~.-flg.a1:flg.d.na1,data=camp.lavoro.meno.na)
anova(g1.1,g1,test="Chisq")
Analysis of Deviance Table
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1      1355     578.49
2      1354     576.08  1     2.41      0.12
g1.1 <- update(g1,~.-flg.a1:flg.cc1,data=camp.lavoro.meno.na)
anova(g1.1,g1,test="Chisq")
Analysis of Deviance Table
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1      1355     580.77
2      1354     576.08  1     4.69      0.03
g1.1 <- update(g1,~.-flg.v1:flg.cc1,data=camp.lavoro.meno.na)
anova(g1.1,g1,test="Chisq")
Analysis of Deviance Table
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1      1355     578.01
2      1354     576.08  1     1.94      0.16
Why do I obtain these differences?
Many thanks for any help,
Simona
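The repeated update/anova comparisons above are likelihood ratio tests, one term at a time. A possible shortcut, sketched here on simulated data (the data frame d and the factors f1, f2, f3 are invented for illustration), is drop1 with test = "Chisq", which runs the same kind of test for every term that can be dropped without violating marginality, and can be set beside the Wald p-values from summary.

```r
# Simulated stand-in data; d, f1, f2, f3 and y are hypothetical names.
set.seed(3)
n <- 500
d <- data.frame(
  f1 = factor(rbinom(n, 1, 0.4)),
  f2 = factor(rbinom(n, 1, 0.4)),
  f3 = factor(rbinom(n, 1, 0.4))
)
eta <- -1 + 0.8 * (d$f1 == "1") + 0.5 * (d$f2 == "1") * (d$f3 == "1")
d$y <- rbinom(n, 1, plogis(eta))

# Main effects plus all two-way interactions
fit <- glm(y ~ (f1 + f2 + f3)^2, family = binomial, data = d)

# One likelihood ratio test per droppable term (here the three interactions)
lr_table <- drop1(fit, test = "Chisq")
print(lr_table)

# Wald and likelihood ratio p-values side by side
wald_p <- coef(summary(fit))[c("f11:f21", "f11:f31", "f21:f31"), "Pr(>|z|)"]
lr_p <- lr_table[c("f1:f2", "f1:f3", "f2:f3"), ncol(lr_table)]
print(cbind(wald = wald_p, lr = lr_p))
```

In a well-behaved fit like this simulated one the two columns should be broadly similar. Large gaps of the kind seen between the summary and the anova output are what the Hauck-Donner effect would produce.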
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595