glm start/offset bugs (PR#1422)
--fupGvOGOQM Content-Type: text/plain; charset=us-ascii Content-Description: message body and .signature Content-Transfer-Encoding: 7bit There's a simple bug in the handling of the start and offset arguments in glm and glm.fit. The bug exists in the latest development version of R (version information below), but it appears that glm.R has not been touched much lately, so the bug affects at least the most recent stable release of R. Here is a simple example:
data(ships, package=MASS) ships.glm <- glm(incidents ~ type + year + period + offset(log(service)),
+ family=poisson, data=ships, subset=service != 0)
glm(incidents ~ type + year + period + offset(log(service)),
+ family=poisson, data=ships, subset=service != 0, start=coef(ships.glm)) or more simply
update(ships.glm, start=coef(ships.glm))
The problem is caused by a bad initialization of etastart in glm.fit() when an offset is present. Fixing this and retrying the example above then reveals another bug, this time in glm(), that is tickled only when non-NULL values are given to both the offset and start arguments. I've attached to the bottom of this message a simple patch to src/library/base/R/glm.R that I believe correctly repairs these bugs. Essentially the same change is needed in Berwin Turlach's Wed, 27 Feb 2002 version of glm.fit() reported on the R bug tracking site as "Models/1331". (The change to glm() is still required as well of course.) As an aside, I'm wondering if there has been any thought given to adopting Berwin's patches. Though I must admit that I haven't taken the time to check carefully through his changes, I know Berwin to be a very careful and skillful programmer, and I suspect that he is the only person to have looked very closely at those particular details of glm.fit in recent months (years?). Also, some of the simpler changes that he made seem to be needed just to cleanup the code. Would it be useful if a number of us began using his version as an informal test? After discovering this bug earlier today, I have already begun to do so and so far have not encountered any problems.
Brett Presnell Department of Statistics University of Florida http://www.stat.ufl.edu/~presnell/ Version: platform = i686-pc-linux-gnu arch = i686 os = linux-gnu system = i686, linux-gnu status = Under development (unstable) major = 1 minor = 5.0 year = 2002 month = 03 day = 27 language = R Search Path: .GlobalEnv, package:ctest, Autoloads, package:base --fupGvOGOQM Content-Type: text/plain Content-Description: glm patch Content-Disposition: inline; filename="glm.patch" Content-Transfer-Encoding: 7bit *** glm.R Wed Feb 27 01:11:03 2002 --- glm-patched.R Thu Mar 28 21:09:24 2002 *************** *** 67,71 **** if(is.empty.model(mt)) fit$deviance else glm.fit(x=X[,"(Intercept)",drop=FALSE], y=Y, weights=weights, ! start=start, offset=offset, family=family, control=control, intercept=TRUE)$deviance } --- 67,71 ---- if(is.empty.model(mt)) fit$deviance else glm.fit(x=X[,"(Intercept)",drop=FALSE], y=Y, weights=weights, ! offset=offset, family=family, control=control, intercept=TRUE)$deviance } *************** *** 152,156 **** "and correspond to initial coefs for", deparse(xnames))) ! else as.vector(if (NCOL(x) == 1) x * start else x %*% start) else family$linkfun(mustart) mu <- linkinv(eta) --- 152,157 ---- "and correspond to initial coefs for", deparse(xnames))) ! else offset + ! as.vector(if (NCOL(x) == 1) x * start else x %*% start) else family$linkfun(mustart) mu <- linkinv(eta) --fupGvOGOQM-- -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._