A problem in a glm model - R-help

Simona Avanzo

Thu, May 8, 2003 2:48 PM

Hello all,

I have the following glm model:

f1 <- as.formula(paste("factor(y.fondi)~",
                  "flgsess + segmeta2 + udm + zona.geo + ultimo.prod.", 
                  "+flg.a2 + flg.d.na2 + flg.v2 + flg.cc2",
                  " +(flg.a1 + flg.d.na1 + flg.v1 + flg.cc1)^2",
                  " + flg.a2:flg.d.na2 + flg.a2:flg.v2 + flg.a2:flg.cc2",
                  " + flg.d.na2:flg.v2 + flg.v2:flg.cc2",
                 sep=""))

g1 <- glm(f1,family=binomial,data=camp.lavoro.meno.na)

The variables are all factors:
- y.fondi takes value 0 or 1;
- flgsess has 2 levels;
- segmeta2 has 4 levels;
- udm has 6 levels;
- zona.geo has 5 levels;
- ultimo.prod. has 4 levels;
- flg.a1, flg.d.na1, flg.v1, flg.cc1, flg.a2, flg.d.na2, flg.v2, flg.cc2 are 8 factors that take values 0 or 1.

The number of observations is 1390. 
The observations with "y.fondi = 1" are 259.
The observations with "y.fondi = 0" are 1131.
 
The summary of the model is:

Call:
glm(formula = f1, family = binomial, data = camp.lavoro.meno.na)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8955  -0.3586  -0.2692  -0.1642   2.9133  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -2.7647     0.7523  -3.675 0.000238 ***
...                      ...        ...     ...      ...

flg.a21               0.7898     0.4948   1.596 0.110475
flg.d.na21            0.2097     0.7336   0.286 0.774963
flg.v21               0.3928     0.5257   0.747 0.454994
flg.cc21             -0.8547     1.4954  -0.572 0.567625
flg.a11               0.7051     0.4889   1.442 0.149221
flg.d.na11            1.3582     0.5429   2.502 0.012353 *
flg.v11               2.2596     0.5079   4.449 8.62e-06 ***
flg.cc11             -3.3658     8.5259  -0.395 0.693014
flg.a21:flg.d.na21   -6.9392    26.5432  -0.261 0.793760
flg.a21:flg.v21      -1.4355     4.0963  -0.350 0.726005
flg.a21:flg.cc21     -6.0460    72.4807  -0.083 0.933521
flg.d.na21:flg.v21   -2.4347     2.9045  -0.838 0.401888
flg.v21:flg.cc21     11.7232    72.4814   0.162 0.871510
flg.a11:flg.d.na11   -8.3843    30.4660  -0.275 0.783162 !!!!
flg.a11:flg.v11       6.5067    39.2569   0.166 0.868356
flg.a11:flg.cc11     13.5596    19.4693   0.696 0.486140 !!!!
flg.d.na11:flg.v11   -0.7143     1.2673  -0.564 0.573013
flg.d.na11:flg.cc11  12.0653    15.3880   0.784 0.432997
flg.v11:flg.cc11      6.2648     8.5808   0.730 0.465331 !!!!

Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1336.79  on 1389  degrees of freedom
Residual deviance:  576.08  on 1354  degrees of freedom
AIC: 648.08

Number of Fisher Scoring iterations: 8

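If I apply anova tests, dropping one interaction at a time, I obtain:

g1.1 <- update(g1,~.-flg.a1:flg.d.na1,data=camp.lavoro.meno.na)
anova(g1.1,g1,test="Chisq")

Analysis of Deviance Table
  Resid. Df Resid. Dev   Df Deviance P(>|Chi|)
1      1355     578.49                        
2      1354     576.08    1     2.41      0.12

g1.1 <- update(g1,~.-flg.a1:flg.cc1,data=camp.lavoro.meno.na)
anova(g1.1,g1,test="Chisq")

Analysis of Deviance Table
  Resid. Df Resid. Dev   Df Deviance P(>|Chi|)
1      1355     580.77                        
2      1354     576.08    1     4.69      0.03

g1.1 <- update(g1,~.-flg.v1:flg.cc1,data=camp.lavoro.meno.na)
anova(g1.1,g1,test="Chisq")

Analysis of Deviance Table
  Resid. Df Resid. Dev   Df Deviance P(>|Chi|)
1      1355     578.01                        
2      1354     576.08    1     1.94      0.16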

Why do the p-values from anova differ so much from those in the summary (the rows marked !!!!)?
Many thanks for any help, 

Simona

Brian Ripley

Thu, May 8, 2003 3:05 PM

You need to look up the Hauck-Donner phenomenon in MASS (4th, 3rd or 2nd 
edition).

In short, Wald tests of binomial or Poisson glms are highly unreliable:
a moderate p-value indicates no effect or a very large effect.
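
A minimal sketch of this effect on made-up data, not the poster's: the helper wald_vs_lrt, the data frame d, its columns x and y, and the cell counts are all illustrative assumptions. With one 0/1 predictor, the share of successes at x = 1 is pushed towards 100%, and the Wald z value is compared with the likelihood-ratio statistic for dropping x.

```r
# Hypothetical 2x2 data: n observations at x = 0 and n at x = 1.
# At x = 0, 25 of n responses are 1; at x = 1, k of n are 1.
wald_vs_lrt <- function(k, n = 50) {
  d <- data.frame(
    x = factor(rep(c(0, 1), each = n)),
    y = c(rep(c(1, 0), c(25, n - 25)), rep(c(1, 0), c(k, n - k)))
  )
  fit  <- glm(y ~ x, family = binomial, data = d)
  fit0 <- update(fit, . ~ . - x)          # same model without x
  c(k        = k,
    estimate = unname(coef(fit)["x1"]),
    wald_z   = coef(summary(fit))["x1", "z value"],
    lrt      = deviance(fit0) - deviance(fit))
}

# One row per value of k: the effect of x gets stronger as k grows.
res <- t(sapply(c(35, 45, 49), wald_vs_lrt))
print(round(res, 3))
```

In this sketch the estimate and the likelihood-ratio statistic both keep rising with k, while wald_z is smaller at k = 49 than at k = 45, because the standard error grows faster than the estimate.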

I suspect your model is in fact partially separable (that is, it can fit parts
of the data exactly), since those are large coefficients for indicator 
variables.  Try reducing the tolerance in glm.control (add epsilon=1e-10) 
and see if the coefficients change a lot.
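
One way to run that check, sketched on made-up data rather than the poster's: the helper coef_shift and the data frame d with columns a, b and y are hypothetical names introduced here. The helper refits a model with a tighter tolerance through glm.control and reports how far each coefficient moves.

```r
# Refit a glm with a tighter convergence tolerance and report how far the
# coefficients move. A large shift suggests an estimate that is drifting
# towards +/- infinity, as happens under (partial) separation.
coef_shift <- function(fit, epsilon = 1e-10, maxit = 100) {
  refit <- suppressWarnings(
    update(fit, control = glm.control(epsilon = epsilon, maxit = maxit))
  )
  data.frame(default = coef(fit),
             tight   = coef(refit),
             shift   = coef(refit) - coef(fit))
}

# Made-up data: every observation with b == 1 has y == 1, so the
# coefficient of b has no finite maximum likelihood estimate.
d <- data.frame(
  a = factor(rep(c(0, 1), times = 40)),
  b = factor(rep(c(0, 0, 0, 1), each = 20)),
  y = c(rep(c(0, 1, 1, 0), times = 15), rep(1, 20))
)
fit <- suppressWarnings(glm(y ~ a + b, family = binomial, data = d))
res <- coef_shift(fit)
print(res)
```

Here the coefficient of b should move by several units between the two fits while the others barely change. For the model in this thread the corresponding call would be coef_shift(g1).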

On Thu, 8 May 2003, Simona Avanzo wrote:

Hello all,

I have the following glm model:

f1 <- as.formula(paste("factor(y.fondi)~",
                  "flgsess + segmeta2 + udm + zona.geo + ultimo.prod.", 
                  "+flg.a2 + flg.d.na2 + flg.v2 + flg.cc2",
                  " +(flg.a1 + flg.d.na1 + flg.v1 + flg.cc1)^2",
                  " + flg.a2:flg.d.na2 + flg.a2:flg.v2 + flg.a2:flg.cc2",
                  " + flg.d.na2:flg.v2 + flg.v2:flg.cc2",
                 sep=""))

g1 <- glm(f1,family=binomial,data=camp.lavoro.meno.na)

The variables are all factors:
- y.fondi takes value 0 or 1;
- flgsess has 2 levels;
- segmeta2 has 4 levels;
- udm has 6 levels;
- zona.geo has 5 levels;
- ultimo.prod. has 4 levels;
- flg.a1, flg.d.na1, flg.v1, flg.cc1, flg.a2, flg.d.na2, flg.v2, flg.cc2 are 8 factors that take values 0 or 1.

The number of observations is 1390. 
The observations with "y.fondi = 1" are 259.
The observations with "y.fondi = 0" are 1131.
 
The summary of the model is:

summary(g1)

Call:
glm(formula = f1, family = binomial, data = camp.lavoro.meno.na)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8955  -0.3586  -0.2692  -0.1642   2.9133  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -2.7647     0.7523  -3.675 0.000238 ***
...                      ...        ...     ...      ...

flg.a21               0.7898     0.4948   1.596 0.110475
flg.d.na21            0.2097     0.7336   0.286 0.774963
flg.v21               0.3928     0.5257   0.747 0.454994
flg.cc21             -0.8547     1.4954  -0.572 0.567625
flg.a11               0.7051     0.4889   1.442 0.149221
flg.d.na11            1.3582     0.5429   2.502 0.012353 *
flg.v11               2.2596     0.5079   4.449 8.62e-06 ***
flg.cc11             -3.3658     8.5259  -0.395 0.693014
flg.a21:flg.d.na21   -6.9392    26.5432  -0.261 0.793760
flg.a21:flg.v21      -1.4355     4.0963  -0.350 0.726005
flg.a21:flg.cc21     -6.0460    72.4807  -0.083 0.933521
flg.d.na21:flg.v21   -2.4347     2.9045  -0.838 0.401888
flg.v21:flg.cc21     11.7232    72.4814   0.162 0.871510
flg.a11:flg.d.na11   -8.3843    30.4660  -0.275 0.783162 !!!!
flg.a11:flg.v11       6.5067    39.2569   0.166 0.868356
flg.a11:flg.cc11     13.5596    19.4693   0.696 0.486140 !!!!
flg.d.na11:flg.v11   -0.7143     1.2673  -0.564 0.573013
flg.d.na11:flg.cc11  12.0653    15.3880   0.784 0.432997
flg.v11:flg.cc11      6.2648     8.5808   0.730 0.465331 !!!!

Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1336.79  on 1389  degrees of freedom
Residual deviance:  576.08  on 1354  degrees of freedom
AIC: 648.08

Number of Fisher Scoring iterations: 8

If I apply anova tests, dropping one interaction at a time, I obtain:

g1.1 <- update(g1,~.-flg.a1:flg.d.na1,data=camp.lavoro.meno.na)
anova(g1.1,g1,test="Chisq")

Analysis of Deviance Table
  Resid. Df Resid. Dev   Df Deviance P(>|Chi|)
1      1355     578.49                        
2      1354     576.08    1     2.41      0.12

g1.1 <- update(g1,~.-flg.a1:flg.cc1,data=camp.lavoro.meno.na)
anova(g1.1,g1,test="Chisq")

Analysis of Deviance Table
  Resid. Df Resid. Dev   Df Deviance P(>|Chi|)
1      1355     580.77                        
2      1354     576.08    1     4.69      0.03

g1.1 <- update(g1,~.-flg.v1:flg.cc1,data=camp.lavoro.meno.na)
anova(g1.1,g1,test="Chisq")

Analysis of Deviance Table
  Resid. Df Resid. Dev   Df Deviance P(>|Chi|)
1      1355     578.01                        
2      1354     576.08    1     1.94      0.16

Why do the p-values from anova differ so much from those in the summary (the rows marked !!!!)?
Many thanks for any help, 

Simona


Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595