Skip to content

pearson residuals in glm for binomial response (PR#1175)

3 messages · J.T.Kent@leeds.ac.uk, Peter Dalgaard, Thomas Lumley

#
R version 1.3.0
OS: SunOS 5.7, but I think the same problem occurs with Windows

An incorrect formula seems to be used to calculate the pearson residuals
for a generalized linear model with a binomial response.  Here is a
simple program which gives (a) the pearson residuals calculated directly,
(b) the pearson residuals from glm, and (c) the deviance residuals from 
glm.  The first and last columns are quite close.  The middle differs from 
the first by a factor of n (n=10 here).  This problem did not occur with
an earlier version of R I tried, version 0.64.2.

Thank you.

Professor John T. Kent         tel (direct)   (44) 113-233-5103
Department of Statistics       tel(secretary) (44) 113-233-5101
University of Leeds            fax            (44) 113-233-5090
Leeds LS2 9JT, England         e-mail: J.T.Kent@leeds.ac.uk

Input:

x_1:4 # regressor variable
y_c(2,6,7,8) # response binomial counts
n_rep(10,4) # number of binomial trials
ym_cbind(y,n-y) # response variable as a matrix
glm1_glm(ym~x,binomial) # fit a generalized linear model
f_fitted(glm1)
rp1_(y-n*f)/sqrt(n*f*(1-f)) # direct calculation of pearson residuals
rp2_residuals(glm1,type="pearson") # should be pearson residuals
rd_residuals(glm1) # deviance residuals
cbind(rp1,rp2,rd) # note second column is wrong, in this case by factor of 10

Output:

             rp1          rp2          rd
[1,] -0.56502992 -0.056500696 -0.58467936
[2,]  0.73739407  0.073739452  0.73886417
[3,]  0.05266796  0.005266533  0.05279174
[4,] -0.38314283 -0.038307944 -0.37010220
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
J.T.Kent@leeds.ac.uk writes:
I almost said that your version is old and that you forgot to check
the bug repository for prior reports (PR#1123 was also on Pearson
residuals). However, the problem persists in the development version.

The issue is that we have (in 1.4.0pre)

 pearson = (y - mu)/sqrt(wts * (object$family$variance)(mu))

However, the y and mu that we use are divided by n:
[1] 10 10 10 10
[1] 0.2802480 0.4834715 0.6923131 0.8439674
[1] 0.2017091 0.2497268 0.2130157 0.1316864
[1] 0.2 0.6 0.7 0.8

I've stuck my foot in it with this stuff before, but should the
denominator not be sqrt(object$family$variance(mu)/wts) ?? (If the
weights are large, the variance should be smaller, not larger).

Could someone please ram a wooden stake through this one and prevent
it from haunting us again?
1 day later
#
On Fri, 16 Nov 2001 J.T.Kent@leeds.ac.uk wrote:

            
Yes, it is a bug, resulting from an incorrect fix to an earlier bug that
had the sign wrong for decreasing link functions like the reciprocal.
It's already been reported and fixed.

	-thomas

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._