update forgets about offset() (PR#6656)

3 messages · Mark Bravington, Brian Ripley

#
In R 1.7 and above (including R 1.9 alpha), 'update.formula' forgets to copy any offset(...) term from the original '.' formula:

test> df <- data.frame( x=1:4, y=sqrt( 1:4), z=c(2:4,1))
test> fit1 <- glm( y~offset(x)+z, data=df)
test> fit1$call
glm(formula = y ~ offset(x) + z, data = df)

test> fit1u <- update( fit1, ~.)
test> fit1u$call
glm(formula = y ~ z, data = df)


The problem occurs when 'update.formula' calls 'terms.formula(..., simplify=TRUE)' which defines and calls a function 'fixFormulaObject'. The first line of 'fixFormulaObject' attempts to extract the contents of the RHS of the formula via 

tmp <- attr(terms(object), "term.labels")

but this omits any offsets. Replacing that line with the following, which I think pulls in everything except the response, *seems* to fix the problem without disrupting the guts of 'terms' itself:

tmp <- dimnames( attr(terms(object), "factors"))[[1]][ -attr( terms(object), 'response')]

The suggested line might be simpler than checking the 'offset' component of 'terms(object)', which won't always exist.
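
For reference, a small sketch of what each attribute holds for the formula in the example. The name `tt` is introduced here purely for illustration, and the commented values are what a current R release returns, not output captured from the R 1.8.1 session in the report.

```r
# Terms object for the formula used in the report (no data needed)
tt <- terms(y ~ offset(x) + z)

attr(tt, "term.labels")        # "z": the offset is not listed as a term
attr(tt, "offset")             # 2: position of offset(x) among the variables
attr(tt, "response")           # 1: the response is the first variable
rownames(attr(tt, "factors"))  # "y" "offset(x)" "z": every variable, offset included
```

The row names of "factors" are variables rather than terms, which is why they still carry the offset that "term.labels" leaves out.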

Footnote: strange things happen when there is more than one offset (OK, there probably shouldn't be, but I thought I'd experiment):

test> fit2 <- glm( y ~ offset( x) + offset( log( x)) + z, data=df)
test> fit2$call
glm(formula = y ~ offset(x) + offset(log(x)) + z, data = df)

test> fit2u <- update( fit2, ~.)
test> fit2u$call
glm(formula = y ~ offset(log(x)) + z, data = df)

Curiously, the 'term.labels' attribute of 'terms(object)' now includes the second offset, but not the first.
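
Both cases can also be reproduced on bare formulas, without fitting a model. The sketch below is one way to check a given R installation; `f1`, `f2` and the helper `keeps_offset` are hypothetical names introduced for illustration, and the result depends on whether the installed version still has the behaviour described above.

```r
# Formula-only reproduction: no data frame or glm() fit is required
f1 <- y ~ offset(x) + z
f2 <- y ~ offset(x) + offset(log(x)) + z

# Hypothetical helper: TRUE if the deparsed formula still mentions offset(
keeps_offset <- function(f) {
  any(grepl("offset(", deparse(f), fixed = TRUE))
}

keeps_offset(update(f1, ~ .))  # FALSE would indicate the reported behaviour
update(f2, ~ .)                # inspect by eye: are both offsets still present?
```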


*******************************

Mark Bravington
CSIRO (CMIS)
PO Box 1538
Castray Esplanade
Hobart
TAS 7001

phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington@csiro.au 

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = 
 major = 1
 minor = 8.1
 year = 2003
 month = 11
 day = 21
 language = R

Windows 2000 Professional (build 2195) Service Pack 4.0

Search Path:
 .GlobalEnv, ROOT, package:methods, package:ctest, package:mva, package:modreg, package:nls, package:ts, package:chstuff, package:handy2, package:handy, package:debug, mvb.session.info, package:mvbutils, package:tcltk, Autoloads, package:base
#
On Tue, 9 Mar 2004 Mark.Bravington@csiro.au wrote:

            
Sorry, but that is a common programming error.  The possible values of
attr(terms, "response") are 0 or 1 (although code should not rely on the 
non-existence of 2, 3, ...).  foo[-0] == foo[0] is a length-0 vector.
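
A short illustration of the pitfall, using plain character vectors in place of the row names. The vectors and the helper `drop_response` are hypothetical and serve only to show one possible guard; this is not the code from the patch.

```r
with_resp <- c("y", "offset(x)", "z")  # two-sided formula: response index is 1
no_resp   <- c("offset(x)", "z")       # one-sided formula: response index is 0

with_resp[-1]  # "offset(x)" "z": the response is dropped as intended
no_resp[-0]    # character(0): -0 is 0, so nothing at all is selected

# Hypothetical guard: only use a negative index when there is a response
drop_response <- function(labels, resp) {
  if (resp > 0) labels[-resp] else labels
}

drop_response(with_resp, 1)  # "offset(x)" "z"
drop_response(no_resp, 0)    # "offset(x)" "z"
```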

Also, in R please use rownames(): it is easier to read and safer.

Having more than one offset is allowed, and works in general.
The issue here is that the code to remove offset terms fails if two
successive terms are offsets, but not otherwise.
#
On Wed, 10 Mar 2004, Prof Brian Ripley wrote:

            
There is a second level of problems.  The rownames include all terms, even 
those with - signs, so that code would collapse

y ~ x + z - z

to y ~ x + z!
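
The difference between the two attributes for this formula can be seen directly. This is a sketch run against a current R release; `tt` is an illustrative name.

```r
# A term that is added and then removed again
tt <- terms(y ~ x + z - z)

attr(tt, "term.labels")        # "x": z has been cancelled out of the terms
rownames(attr(tt, "factors"))  # "y" "x" "z": z is still listed as a variable
```

Rebuilding the right-hand side from the row names would therefore put z back into the formula.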

In fact, the offset-removal code failed only if the two successive offsets
came first or last, and for two separate reasons, which made it hard to
track down.

I have now committed patches for both problems.