Skip to content

Bug in update()? (PR#6902)

4 messages · Berwin A Turlach, Brian Ripley

#
Dear all,

I noticed the following while playing around with fitting log-linear
models to contingency tables using R 1.8.1, but the problem also
exists under R 1.9.0.

A reproducible example uses the following contingency table:
+                    Lrn=levels(Lrn), Age=levels(Age)))
+       data.frame(tmp,
+       Count=as.vector(tapply(rep(1,n), list(Eth, Sex, Lrn, Age), sum))))

First fit a saturated model and see which term we can drop:
    > fm <- glm(Count ~ .^4, quine2, family=poisson)
    > drop1(fm, test="Chisq")
    Single term deletions
    
    Model:
    Count ~ .^4
                    Df  Deviance     AIC     LRT Pr(Chi)
    <none>             6.661e-16 148.916                
    Eth:Sex:Lrn:Age  2     0.422 145.338   0.422    0.81

Drop the four-way interaction and check which term we can drop next:
    > fm <- update(fm, . ~ . - Eth:Sex:Lrn:Age)
    > drop1(fm, test="Chisq")
    Single term deletions
    
    Model:
    Count ~ .
           Df Deviance     AIC     LRT  Pr(Chi)   
    <none>      35.893 142.810                    
    Eth     1   36.332 141.248   0.439 0.507811   
    Sex     1   37.238 142.154   1.345 0.246237   
    Lrn     1   37.392 142.308   1.499 0.220842   
    Age     3   49.818 150.735  13.925 0.003009 **
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

All of a sudden there are only the main effects left and all
interaction terms have been removed.  I expected to see at this point
a test for all three-way interactions.  Since I started of with the
saturated model and used update to remove the four-way interaction,
all two-way and three-way interactions should still be in the model.
Or am I missing some subtlety of the "^" and "." operators in model
formulae? 

On the other hand, if we specify the saturated model as follows:

    > fm <- glm(Count ~ Lrn*Eth*Sex*Age, quine2, family=poisson)

then the above steps have the desired effect:

    > drop1(fm, test="Chisq")
    Single term deletions
    
    Model:
    Count ~ Lrn * Eth * Sex * Age
                    Df  Deviance     AIC     LRT Pr(Chi)
    <none>             4.219e-15 148.916                
    Lrn:Eth:Sex:Age  2     0.422 145.338   0.422    0.81
    > fm <- update(fm, . ~ . - Eth:Sex:Lrn:Age)
    > drop1(fm, test="Chisq")
    Single term deletions
    
    Model:
    Count ~ Lrn + Eth + Sex + Age + Lrn:Eth + Lrn:Sex + Eth:Sex + 
        Lrn:Age + Eth:Age + Sex:Age + Lrn:Eth:Sex + Lrn:Eth:Age + 
        Lrn:Sex:Age + Eth:Sex:Age
                Df Deviance     AIC     LRT  Pr(Chi)   
    <none>            0.422 145.338                    
    Lrn:Eth:Sex  1    0.493 143.409   0.071 0.789909   
    Lrn:Eth:Age  2    0.629 141.545   0.207 0.901471   
    Lrn:Sex:Age  2   12.728 153.644  12.306 0.002127 **
    Eth:Sex:Age  3    1.019 139.935   0.597 0.897103   
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

The update command only removed the four-way interaction but left all
the two-way and three-way interaction in the model.  

I looked at update.formula to see what the problem might be, but after
reading the help file of .Internal I stopped looking around.... :)

Cheers,

        Berwin

--please do not edit the information below--

Version:
 platform = i686-pc-linux-gnu
 arch = i686
 os = linux-gnu
 system = i686, linux-gnu
 status = 
 major = 1
 minor = 9.0
 year = 2004
 month = 04
 day = 12
 language = R

Search Path:
 .GlobalEnv, package:MASS, package:methods, package:stats, package:graphics, package:utils, Autoloads, package:base
3 days later
#
Berwin,

It's fortuitous that this works at all.

The problem is not in update but in glm itself: take a look at fm$formula
after the first fit.  fm$terms has the right formula, but formula() 
extracts the $formula component first.

I am not currently sure what the right fix is, so will not try to fix this 
in R-patched/1.9.1.

Brian
On Fri, 21 May 2004 berwin@maths.uwa.edu.au wrote:

            

  
    
1 day later
#
Dear Brian,
BDR> The problem is not in update but in glm itself: take a look
    BDR> at fm$formula after the first fit.  fm$terms has the right
    BDR> formula, but formula() extracts the $formula component first.
O.k., I see what you mean.

    BDR> I am not currently sure what the right fix is, so will not
    BDR> try to fix this in R-patched/1.9.1.
This may be naive, but I noticed the following at the end of the glm
function (starting line 76, in src/library/stats/R/glm.R):

    fit <- c(fit, list(call = call, formula = formula,
		       terms = mt, data = data,
		       offset = offset, control = control, method = method,
		       contrasts = attr(X, "contrasts"),
                       xlevels = .getXlevels(mt, mf)))

That is, the component "formula" in the object that is returned is
explicitly set to the "formula" argument with which glm was called.
The function "lm" does not do this.  And if I change this `line' to

    fit <- c(fit, list(call = call, 
		       terms = mt, data = data,
		       offset = offset, control = control, method = method,
		       contrasts = attr(X, "contrasts"),
                       xlevels = .getXlevels(mt, mf)))

then fm$formula for the first fit in my example looks o.k.  

But I don't know if this will break anything else.

Cheers,

        Berwin

PS:  I noticed that both `glm' and 'lm' now do essentially the
following manipulations early on:

    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action", 
        "offset"), names(mf), 0)
    mf <- mf[c(1, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1]] <- as.name("model.frame")
    mf <- eval(mf, parent.frame())

Is this the preferred idiom to be used in R 1.9.x and later for
modelling functions?
#
On Wed, 26 May 2004, Berwin A Turlach wrote:

            
You won't have a formula component at all (lm objects do not).
What I have done is alter the formula methods to pick up the formula from 
the terms component but the environment from the formula componet (if 
present).
Rather than removing components, by setting to NULL, yes.