In R1.7 and above (including R 1.9 alpha), 'update.formula' forgets to copy any offset(...) term in the original '.' formula:

test> df <- data.frame( x=1:4, y=sqrt( 1:4), z=c(2:4,1))
test> fit1 <- glm( y~offset(x)+z, data=df)
test> fit1$call
glm(formula = y ~ offset(x) + z, data = df)
test> fit1u <- update( fit1, ~.)
test> fit1u$call
glm(formula = y ~ z, data = df)

The problem occurs when 'update.formula' calls 'terms.formula(..., simplify=TRUE)' which defines and calls a function 'fixFormulaObject'. The first line of 'fixFormulaObject' attempts to extract the contents of the RHS of the formula via

tmp <- attr(terms(object), "term.labels")

but this omits any offsets. Replacing that line with the following, which I think pulls in everything except the response, *seems* to fix the problem without disrupting the guts of 'terms' itself:

tmp <- dimnames( attr(terms(object), "factors"))[[1]][ -attr( terms, 'response')]

The suggested line might be simpler than checking the 'offset' component of 'terms(object)', which won't always exist.

Footnote: strange things happen when there is more than one offset (OK, there probably shouldn't be, but I thought I'd experiment):

test> fit2 <- glm( y ~ offset( x) + offset( log( x)) + z, data=df)
test> fit2$call
glm(formula = y ~ offset(x) + offset(log(x)) + z, data = df)
test> fit2u <- update( fit2, ~.)
test> fit2u$call
glm(formula = y ~ offset(log(x)) + z, data = df)

Curiously, the 'term.labels' attribute of 'terms(object)' now includes the second offset, but not the first.
*******************************
Mark Bravington
CSIRO (CMIS)
PO Box 1538
Castray Esplanade
Hobart TAS 7001
phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington@csiro.au

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status =
 major = 1
 minor = 8.1
 year = 2003
 month = 11
 day = 21
 language = R

Windows 2000 Professional (build 2195) Service Pack 4.0

Search Path:
 .GlobalEnv, ROOT, package:methods, package:ctest, package:mva, package:modreg, package:nls, package:ts, package:chstuff, package:handy2, package:handy, package:debug, mvb.session.info, package:mvbutils, package:tcltk, Autoloads, package:base
update forgets about offset() (PR#6656)
3 messages · Mark Bravington, Brian Ripley
1 day later
On Tue, 9 Mar 2004 Mark.Bravington@csiro.au wrote:
In R1.7 and above (including R 1.9 alpha), 'update.formula' forgets to copy any offset(...) term in the original '.' formula:

test> df <- data.frame( x=1:4, y=sqrt( 1:4), z=c(2:4,1))
test> fit1 <- glm( y~offset(x)+z, data=df)
test> fit1$call
glm(formula = y ~ offset(x) + z, data = df)
test> fit1u <- update( fit1, ~.)
test> fit1u$call
glm(formula = y ~ z, data = df)

The problem occurs when 'update.formula' calls 'terms.formula(..., simplify=TRUE)' which defines and calls a function 'fixFormulaObject'. The first line of 'fixFormulaObject' attempts to extract the contents of the RHS of the formula via

tmp <- attr(terms(object), "term.labels")

but this omits any offsets. Replacing that line with the following, which I think pulls in everything except the response, *seems* to fix the problem without disrupting the guts of 'terms' itself:

tmp <- dimnames( attr(terms(object), "factors"))[[1]][ -attr( terms, 'response')]

The suggested line might be simpler than checking the 'offset' component of 'terms(object)', which won't always exist.
Sorry, but that is a common programming error. The possible values of attr(terms, "response") are 0 or 1 (although code should not rely on the non-existence of 2, 3, ...). foo[-0] == foo[0] is a length-0 vector. Also, in R please use rownames(): it is easier to read and safer.
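The negative-zero indexing pitfall described above can be reproduced in a few lines. This is only an illustrative sketch: 'drop_response' is a hypothetical helper, not code from 'terms.formula' or from the committed patch.

```r
# Negative indexing drops elements, but -0 is just 0,
# so foo[-0] selects nothing instead of dropping nothing.
foo <- c("y", "offset(x)", "z")
foo[-1]   # drops the first element
foo[-0]   # same as foo[0]: character(0)

# A one-sided formula has no response, so attr(, "response") is 0.
tt   <- terms(~ offset(x) + z)
resp <- attr(tt, "response")
vars <- rownames(attr(tt, "factors"))

vars[-resp]   # the pitfall: everything is lost, not just the response

# Hypothetical guard (an assumption for illustration only):
# drop the response row only when a response is present.
drop_response <- function(labels, resp) {
  if (resp > 0) labels[-resp] else labels
}
drop_response(vars, resp)
```

The guard tests 'resp > 0' rather than 'resp == 1', so it does not rely on the response index never being 2, 3, ...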
Footnote: strange things happen when there is more than one offset (OK, there probably shouldn't be, but I thought I'd experiment):
That is allowed, and works in general.
test> fit2 <- glm( y ~ offset( x) + offset( log( x)) + z, data=df)
test> fit2$call
glm(formula = y ~ offset(x) + offset(log(x)) + z, data = df)
test> fit2u <- update( fit2, ~.)
test> fit2u$call
glm(formula = y ~ offset(log(x)) + z, data = df)

Curiously, the 'term.labels' attribute of 'terms(object)' now includes the second offset, but not the first.
The issue here is that the code to remove offset terms fails if two successive terms are offsets, but not otherwise.
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
On Wed, 10 Mar 2004, Prof Brian Ripley wrote:
Sorry, but that is a common programming error. The possible values of attr(terms, "response") are 0 or 1 (although code should not rely on the non-existence of 2, 3, ...). foo[-0] == foo[0] is a length-0 vector. Also, in R please use rownames(): it is easier to read and safer.
There is a second level of problems. The rownames include all terms, even those with - signs, so that code would collapse y ~ x + z - z to y ~ x + z!
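A small check of the second problem, assuming a reasonably recent R: the row names of the "factors" attribute come from the variables of the formula, so a variable whose term was subtracted still appears there, while "term.labels" keeps only the surviving terms.

```r
# z is added and then removed, so only x survives as a term.
tt <- terms(y ~ x + z - z)

labels <- attr(tt, "term.labels")          # surviving terms only
rows   <- rownames(attr(tt, "factors"))    # one row per variable, z included

labels
rows

# Rebuilding the right-hand side from the row names (minus the response)
# would therefore bring z back into the formula.
rhs_from_rows <- rows[-attr(tt, "response")]
rhs_from_rows
```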
Footnote: strange things happen when there is more than one offset (OK, there probably shouldn't be, but I thought I'd experiment):
That is allowed, and works in general.
test> fit2 <- glm( y ~ offset( x) + offset( log( x)) + z, data=df)
test> fit2$call
glm(formula = y ~ offset(x) + offset(log(x)) + z, data = df)
test> fit2u <- update( fit2, ~.)
test> fit2u$call
glm(formula = y ~ offset(log(x)) + z, data = df)

Curiously, the 'term.labels' attribute of 'terms(object)' now includes the second offset, but not the first.
The issue here is that the code to remove offset terms fails if two successive terms are offsets, but not otherwise.
In fact, only if the two successive offsets were first or last, for two separate reasons, which made it hard to track down. I have now committed patches for both problems.
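For anyone wanting to check a given R version against both problems, a sketch along the following lines may help. It assumes a version that already contains the patches; 'offset_vars' is a hypothetical helper written for this illustration, and the result of 'update' is printed for inspection rather than asserted.

```r
f1 <- y ~ offset(x) + z
f2 <- y ~ offset(x) + offset(log(x)) + z   # two successive offsets, listed first

tt1 <- terms(f1)
tt2 <- terms(f2)

# Offsets should not appear among the term labels ...
attr(tt1, "term.labels")
attr(tt2, "term.labels")

# ... they are recorded in the "offset" attribute instead, as indices
# into the "variables" list (absent when the formula has no offset).
offset_vars <- function(tt) {
  vars <- as.character(attr(tt, "variables"))[-1]   # drop the leading "list"
  vars[attr(tt, "offset")]
}
offset_vars(tt1)
offset_vars(tt2)
offset_vars(terms(y ~ z))

# Inspect whether the offsets survive an update with '.' on the right.
print(update(f1, ~ .))
print(update(f2, ~ .))
```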