R version 1.3.0
OS: SunOS 5.7, but I think the same problem occurs with Windows
An incorrect formula seems to be used to calculate the pearson residuals
for a generalized linear model with a binomial response. Here is a
simple program which gives (a) the pearson residuals calculated directly,
(b) the pearson residuals from glm, and (c) the deviance residuals from
glm. The first and last columns are quite close. The middle differs from
the first by a factor of n (n=10 here). This problem did not occur with
an earlier version of R I tried, version 0.64.2.
Thank you.
Professor John T. Kent tel (direct) (44) 113-233-5103
Department of Statistics tel(secretary) (44) 113-233-5101
University of Leeds fax (44) 113-233-5090
Leeds LS2 9JT, England e-mail: J.T.Kent@leeds.ac.uk
Input:
x_1:4 # regressor variable
y_c(2,6,7,8) # response binomial counts
n_rep(10,4) # number of binomial trials
ym_cbind(y,n-y) # response variable as a matrix
glm1_glm(ym~x,binomial) # fit a generalized linear model
f_fitted(glm1)
rp1_(y-n*f)/sqrt(n*f*(1-f)) # direct calculation of pearson residuals
rp2_residuals(glm1,type="pearson") # should be pearson residuals
rd_residuals(glm1) # deviance residuals
cbind(rp1,rp2,rd) # note second column is wrong, in this case by factor of 10
Output:
rp1 rp2 rd
[1,] -0.56502992 -0.056500696 -0.58467936
[2,] 0.73739407 0.073739452 0.73886417
[3,] 0.05266796 0.005266533 0.05279174
[4,] -0.38314283 -0.038307944 -0.37010220
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
R version 1.3.0
OS: SunOS 5.7, but I think the same problem occurs with Windows
An incorrect formula seems to be used to calculate the pearson residuals
for a generalized linear model with a binomial response. Here is a
simple program which gives (a) the pearson residuals calculated directly,
(b) the pearson residuals from glm, and (c) the deviance residuals from
glm. The first and last columns are quite close. The middle differs from
the first by a factor of n (n=10 here). This problem did not occur with
an earlier version of R I tried, version 0.64.2.
I almost said that your version is old and that you forgot to check
the bug repository for prior reports (PR#1123 was also on Pearson
residuals). However, the problem persists in the development version.
The issue is that we have (in 1.4.0pre)
pearson = (y - mu)/sqrt(wts * (object$family$variance)(mu))
However, the y and mu that we use are divided by n:
glm1$prior.weights
[1] 10 10 10 10
glm1$fitted.values
[1] 0.2802480 0.4834715 0.6923131 0.8439674
glm1$family$variance(glm1$fitted.values)
[1] 0.2017091 0.2497268 0.2130157 0.1316864
glm1$y
[1] 0.2 0.6 0.7 0.8
I've stuck my foot in it with this stuff before, but should the
denominator not be sqrt(object$family$variance(mu)/wts) ?? (If the
weights are large, the variance should be smaller, not larger).
Could someone please ram a wooden stake through this one and prevent
it from haunting us again?
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
R version 1.3.0
OS: SunOS 5.7, but I think the same problem occurs with Windows
An incorrect formula seems to be used to calculate the pearson residuals
for a generalized linear model with a binomial response. Here is a
simple program which gives (a) the pearson residuals calculated directly,
(b) the pearson residuals from glm, and (c) the deviance residuals from
glm. The first and last columns are quite close. The middle differs from
the first by a factor of n (n=10 here). This problem did not occur with
an earlier version of R I tried, version 0.64.2.
Yes, it is a bug, resulting from an incorrect fix to an earlier bug that
had the sign wrong for decreasing link functions like the reciprocal.
It's already been reported and fixed.
-thomas
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._