Skip to content

how to suppress the intercept in an lm()-like formula method?

8 messages · Michael Friendly, Kevin E. Thorpe, Ken Knoblauch +2 more

#
I'm trying to write a formula method for canonical correlation analysis, 
that could be called similarly to lm() for
a multivariate response:

cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...)
or perhaps more naturally,
cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...)

I've adapted the code from lm() to my case, but in this situation, it 
doesn't make sense to
include an intercept, since X & Y are mean centered by default in the 
computation.

In the code below, I can't see where the intercept gets included in the 
model matrix and therefore
how to suppress it.  There is a test case at the end, showing that the 
method fails when called
normally, but works if I explicitly use -1 in the formula.  I could hack 
the result of model.matrix(),
but maybe there's an easier way?

cancor <- function(x, ...) {
     UseMethod("cancor", x)
}

cancor.default <- candisc:::cancor

# TODO: make cancisc::cancor() use x, y, not X, Y
cancor.formula <- function(formula, data, subset, weights,
         na.action,
         method = "qr",
     model = TRUE,
     x = FALSE, y = FALSE, qr = TRUE,
     contrasts = NULL, ...) {

     cl <- match.call()
     mf <- match.call(expand.dots = FALSE)
     m <- match(c("formula", "data", "subset", "weights", "na.action"), 
names(mf), 0L)
     mf <- mf[c(1L, m)]

     mf[[1L]] <- as.name("model.frame")
     mf <- eval(mf, parent.frame())

     mt <- attr(mf, "terms")
     y <- model.response(mf, "numeric")
     w <- as.vector(model.weights(mf))
     if (!is.null(w) && !is.numeric(w))
         stop("'weights' must be a numeric vector")

     x <- model.matrix(mt, mf, contrasts)
     # fixup to remove intercept???
     z <- if (is.null(w))
         cancor.default(x, y,  ...)
     else stop("weights are not yet implemented")  # lm.wfit(x, y, w,  ...)

     z$call <- cl
     z$terms <- mt
         z
}

TESTME <- FALSE
if (TESTME) {

# need to get latest version, 0.6-1 from R-Forge
install.packages("candisc", repo="http://R-Forge.R-project.org")
library(candisc)

data(Rohwer)

# this bombs: needs intercept removed
cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~  n + s + ns + na + ss, 
data=Rohwer)
## Error in chol.default(Rxx) :
##  the leading minor of order 1 is not positive definite

#this works as is
cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~  -1 + n + s + ns + na + 
ss, data=Rohwer)
cc
## Canonical correlation analysis of:
##       5   X  variables:  n, s, ns, na, ss
##   with        3   Y  variables:  SAT, PPVT, Raven
##
##     CanR  CanRSQ   Eigen percent    cum
## 1 0.6703 0.44934 0.81599   77.30  77.30
## 2 0.3837 0.14719 0.17260   16.35  93.65
## 3 0.2506 0.06282 0.06704    6.35 100.00
##
## Test of H0: The canonical correlations in the
## current row and all that follow are zero
##
   ...
}
#
To partly answer my own question:  It wasn't that hard to hack the 
result of model.matrix() to remove the intercept,

remove.intercept <- function(x) {
	if (colnames(x)[1] == "(Intercept)") {
		x <- x[,-1]
		attr(x, "assign") <- attr(x, "assign")[-1]
	}
	x
}

However, the model frame and therefore the model terms stored in the 
object are wrong, still including the intercept:

Browse[1]> str(mt)
Classes 'terms', 'formula' length 3 cbind(SAT, PPVT, Raven) ~ n + s + ns 
+ na + ss
   ..- attr(*, "variables")= language list(cbind(SAT, PPVT, Raven), n, 
s, ns, na, ss)
   ..- attr(*, "factors")= int [1:6, 1:5] 0 1 0 0 0 0 0 0 1 0 ...
   .. ..- attr(*, "dimnames")=List of 2
   .. .. ..$ : chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s" "ns" ...
   .. .. ..$ : chr [1:5] "n" "s" "ns" "na" ...
   ..- attr(*, "term.labels")= chr [1:5] "n" "s" "ns" "na" ...
   ..- attr(*, "order")= int [1:5] 1 1 1 1 1
   ..- attr(*, "intercept")= int 1
   ..- attr(*, "response")= int 1
   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
   ..- attr(*, "predvars")= language list(cbind(SAT, PPVT, Raven), n, s, 
ns, na, ss)
   ..- attr(*, "dataClasses")= Named chr [1:6] "nmatrix.3" "numeric" 
"numeric" "numeric" ...
   .. ..- attr(*, "names")= chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s" 
"ns" ...
Browse[1]>
On 1/29/2013 8:44 AM, Michael Friendly wrote:

  
    
#
For lm() the intercept can be removed by adding a "- 1" to the RHS of 
the formula.  Does that not work in your case?

Kevin
On 01/29/2013 09:14 AM, Michael Friendly wrote:

  
    
#
Michael Friendly <friendly <at> yorku.ca> writes:
Have a look at the polr function in MASS where this same problem
is handled, I think, around lines 10 - 15, so right near the beginning,
starting where the variable xint is defined.

best,

Ken
#
Hi Michael,

How about,

    x <- x[, colnames(x) != "(Intercept)"]

I hope this helps,
 John
#
On 29/01/2013 9:14 AM, Michael Friendly wrote:
I think you need to do some of the low level calculations yourself, 
specifically the "terms" calculation.

For example, this forces no intercept, regardless of what the user 
specifies.  You might prefer just to change the default to no intercept 
and allow the user to use "+1" to add one, which looks harder...

# .... set formula to the user's formula ...
# Now modify it to suppress the intercept:

class(formula) <- c("nointercept", class(formula))
terms.nointercept <- function(x, ...) {
   result <- NextMethod(x, ...)
   attr(result, "intercept") <- 0
   result
}

Now lm(formula) does a fit with no intercept.

Duncan Murdoch
#
On 1/29/2013 9:25 AM, Duncan Murdoch wrote:
That's very clever.  I think I can use this locally in my function. 
However, it is still a mystery to me *where* in the process terms()
gets called.  The main incantation for model formulae in lm() is below, 
so who calls terms()?

     cl <- match.call()
     mf <- match.call(expand.dots = FALSE)
     m <- match(c("formula", "data", "subset", "weights", "na.action"), 
names(mf), 0L)
     mf <- mf[c(1L, m)]

     mf[[1L]] <- as.name("model.frame")
     mf <- eval(mf, parent.frame())

     mt <- attr(mf, "terms")

It is also a mystery to me where na.action takes its effect. Presumably 
this is somewhere before lm.fit(x, y, ...)
gets called, but I'd like to take account of this in my cancor.default 
method.

-Michael

  
    
#
On 29/01/2013 10:21 AM, Michael Friendly wrote:
This line calls model.frame, which is generic.  You're passing a formula 
to it, and there is no formula method, so the default method gets 
called.  model.frame.default() does lots of stuff, and then calls 
terms(formula).  That's where my modification sneaks in.

model.frame.default() also appears to be doing stuff with na.action, but 
I haven't traced through to see exactly how much gets done.

Duncan Murdoch