
Difference between drop1() vs. anova() for Gaussian glm models

2 messages · Karl Ove Hufthammer, Peter Dalgaard

Karl Ove Hufthammer wrote:
Dear list members,

I'm having some problems understanding why drop1() and anova() give 
different results for *Gaussian* glm models. Here's a simple example:

  d = data.frame(x=1:6,
                 group=factor(c(rep("A",2), rep("B", 4))))
  l = glm(x~group, data=d)

Running the following code gives *three* different p-values. (I would expect 
it to give two different p-values.)

  anova(l, test="F")     # p = 0.04179
  anova(l, test="Chisq") # p = 0.00313
  drop1(l, test="Chisq") # p = 0.00841

I'm used to anova() and drop1() giving identical results for the same "test" 
argument. However, it looks like the first two tests above use the F 
statistic as the test statistic, while the last one uses a "scaled deviance" 
statistic:

  1-pf(8.7273, 1, 4)  # F statistic, F distribution
  1-pchisq(8.7273, 1) # same F statistic, chi-squared distribution
  1-pchisq(6.9447, 1) # scaled deviance, chi-squared distribution
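For reference, both statistics can be reconstructed from the model deviances. A minimal sketch, refitting the model from above (numeric comments are rounded):

```r
# Reconstruct the two test statistics from the deviances
d <- data.frame(x = 1:6, group = factor(c(rep("A", 2), rep("B", 4))))
l <- glm(x ~ group, data = d)
dev0 <- l$null.deviance        # 17.5, deviance of the intercept-only model
dev1 <- deviance(l)            # 5.5, residual deviance
disp <- dev1 / df.residual(l)  # 1.375, estimated dispersion
(dev0 - dev1) / disp           # 8.7273, the F statistic (on 1 df)
nobs(l) * log(dev0 / dev1)     # 6.9447, the "scaled deviance" (LRT) statistic
```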

I couldn't find any documentation on this difference. The help page for 
drop1() does say:

  The F tests for the "glm" methods are based on analysis of
  deviance tests, so if the dispersion is estimated it is based
  on the residual deviance, unlike the F tests of anova.glm.

But here it's talking about *F* tests. And drop1() with test="F" actually 
gives the *same* p-value as anova() with test="F":

  drop1(l, test="F") # p = 0.04179

Any ideas why anova() and drop1() use different test statistics for the 
same "test" argument? And why does the help page imply that the results 
should differ for F tests (without mentioning chi-squared tests), when in 
this example the F tests agree and the chi-squared tests differ?

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE 20150714 (Tumbleweed) (x86_64)

locale:
 [1] LC_CTYPE=nn_NO.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=nn_NO.UTF-8        LC_COLLATE=nn_NO.UTF-8    
 [5] LC_MONETARY=nn_NO.UTF-8    LC_MESSAGES=nn_NO.UTF-8   
 [7] LC_PAPER=nn_NO.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=nn_NO.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.1
Peter Dalgaard replied:
I am somewhat surprised that _anything_ sensible comes out of anova.glm(l, test="Chisq"). I think it is mostly expected that you use F tests for that case.

What does seem to come out is the same as for drop1(l, test="Rao"), i.e. the scaled score test, which would seem to be equivalent to the scaled deviance in this case.
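That correspondence can be checked directly; a sketch, refitting the model from the original message:

```r
# Check that anova(..., test = "Chisq") matches drop1(..., test = "Rao")
d <- data.frame(x = 1:6, group = factor(c(rep("A", 2), rep("B", 4))))
l <- glm(x ~ group, data = d)
p_anova <- anova(l, test = "Chisq")$`Pr(>Chi)`[2]
p_rao   <- drop1(l, test = "Rao")$`Pr(>Chi)`[2]
c(p_anova, p_rao)  # both 0.00313
```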

drop1.glm(l, test="Chisq") appears to be calculating the "real" likelihood ratio test, evaluated in its asymptotic chi-square distribution:
'log Lik.' 6.944717 (df=3)

(Apologies for the daft output there... Why does "-" not either subtract the df or unclass the whole thing?)
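Computed by hand, that likelihood-ratio statistic and drop1()'s p-value come out as follows (a sketch, again refitting the model from the original message):

```r
# The "real" likelihood-ratio test behind drop1(l, test = "Chisq")
d <- data.frame(x = 1:6, group = factor(c(rep("A", 2), rep("B", 4))))
l  <- glm(x ~ group, data = d)
l0 <- update(l, . ~ 1)                           # intercept-only model
lrt <- as.numeric(2 * (logLik(l) - logLik(l0)))  # 6.9447
1 - pchisq(lrt, df = 1)                          # 0.00841
```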

Notice that the scaled tests basically assume that the scale is known, even if it is estimated, so in that sense the real LRT should be superior. However, in that case it is well known that the asymptotic approximation can be improved by transforming the LRT to the F statistic, whose exact distribution is known.
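And indeed, in this Gaussian case the F transformation reproduces the exact classical test; a sketch, comparing drop1() against the one-way ANOVA from lm() on the model from the original message:

```r
# The F transformation recovers the classical (exact) F test
d <- data.frame(x = 1:6, group = factor(c(rep("A", 2), rep("B", 4))))
l <- glm(x ~ group, data = d)
p_drop1 <- drop1(l, test = "F")$`Pr(>F)`[2]
p_lm    <- anova(lm(x ~ group, data = d))$`Pr(>F)`[1]
c(p_drop1, p_lm)  # both 0.04179
```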

The remaining part of the riddle is why anova.glm doesn't do likelihood differences in the same fashion as drop1.glm. My best guess is that it tries to be consistent with anova.lm, which in turn tries to avoid refitting the sequence of submodels.