Skip to content

Apparant bug in binomial model in GLM (PR#13434)

3 messages · Peter Dalgaard, Charles Geyer

#
Good reference, but a much better reference tells what WORKS in this
situation not what doesn't work.

    http://www.stat.umn.edu/geyer/gdor/

has a paper and technical report that give methods detecting when the MLE
for a GLM does not exist in the conventional sense and for constructing
valid hypothesis tests and confidence intervals.  The methods are implemented
using the R contributed package rcdd and detailed examples are shown in
the technical report, which is made with Sweave, hence fully reproducible
by anyone who has R.

BTW the particular example given doesn't make clear WHAT question cannot
be answered correctly.  Some questions can be without fuss, for example
Warning messages:
1: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,  :
  algorithm did not converge
2: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,  :
  fitted probabilities numerically 0 or 1 occurred
Analysis of Deviance Table

Model 1: y ~ 1
Model 2: y ~ x
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1         9    13.8629
2         8  7.865e-10  1  13.8629    0.0002

This P-value (P = 0.0002) is valid, because the MLE does exist for the null
hypothesis.  Hence we see that we have to use the model y ~ x for which
the MLE does not exist in the conventional sense.

To obtain valid one-sided confidence intervals for the linear predictor or
mean value parameters, follow the methods in the technical report.

  
    
#
Charles Geyer wrote:
...
It may be  valid in some senses, but I can't help notice that it is off
by a factor of at least 10, since the experiment has only 1024 outcomes,
two of which are as extreme as the one observed, and where all outcomes
are equally likely under the corresponding y~1 model.

-p
#
On Wed, Jan 07, 2009 at 04:48:03PM +0100, Peter Dalgaard wrote:
It's valid in the sense that all of the P-values R produces for GLM are valid.
It's an asymptotic approximation.  As with all asymptotic approximations,
at best only the absolute error is small, not the relative error.  This
is no worse than any other P-value produced by anova.glm.  And the conclusion
that P < 0.05 or P < 0.01 is correct.

You already know all this, so why the e-mail?

Worst case, a P-value produced by anova.glm can be very questionable when
"n" is too small.  With n = 10 here, it's amazing that it does as well as it
does.  That's because the MLE in the null hypothesis says all of the
response variables have the symmetric binomial distribution, and the CLT does
work, more or less, down to n = 10 for the symmetric binomial distribution.

If you don't trust the asymptotics, then do a parametric bootstrap.  That's
trivial in R.

My point wasn't about the validity of asymptotics.  My point was that either
the asymptotic test done by anova.glm or the parametric bootstrap makes sense
only when the MLE exists for the null hypothesis.  Otherwise one has to follow
the procedures in my tech report.