how to suppress the intercept in an lm()-like formula method?

Michael Friendly · 2013-01-29T15:21:58Z

On 1/29/2013 9:25 AM, Duncan Murdoch wrote: > On 29/01/2013 9:14 AM, Michael Friendly wrote: >> To partly answer my own question: It wasn't that hard to hack the >> result of model.matrix() to remove the intercept, >> >> remove.intercept > if (colnames(x)[1] == "(Intercept)") { >> x > attr(x, "assign") > } >> x >> } > > I think you need to do some of the low level calculations yourself, > specifically the "terms

Michael Friendly

Tue, Jan 29, 2013 7:21 AM

On 1/29/2013 9:25 AM, Duncan Murdoch wrote:

That's very clever.  I think I can use this locally in my function. 
However, it is still a mystery to me *where* in the process terms()
gets called.  The main incantation for model formulae in lm() is below, 
so who calls terms()?

     cl <- match.call()
     mf <- match.call(expand.dots = FALSE)
     m <- match(c("formula", "data", "subset", "weights", "na.action"), 
names(mf), 0L)
     mf <- mf[c(1L, m)]

     mf[[1L]] <- as.name("model.frame")
     mf <- eval(mf, parent.frame())

     mt <- attr(mf, "terms")

It is also a mystery to me where na.action takes its effect. Presumably 
this is somewhere before lm.fit(x, y, ...)
gets called, but I'd like to take account of this in my cancor.default 
method.

-Michael

Now lm(formula) does a fit with no intercept.

Duncan Murdoch

However, the model frame and therefore the model terms stored in the
object are wrong, still including the intercept:

Browse[1]> str(mt)
Classes 'terms', 'formula' length 3 cbind(SAT, PPVT, Raven) ~ n + s + ns
+ na + ss
    ..- attr(*, "variables")= language list(cbind(SAT, PPVT, Raven), n,
s, ns, na, ss)
    ..- attr(*, "factors")= int [1:6, 1:5] 0 1 0 0 0 0 0 0 1 0 ...
    .. ..- attr(*, "dimnames")=List of 2
    .. .. ..$ : chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s" "ns" ...
    .. .. ..$ : chr [1:5] "n" "s" "ns" "na" ...
    ..- attr(*, "term.labels")= chr [1:5] "n" "s" "ns" "na" ...
    ..- attr(*, "order")= int [1:5] 1 1 1 1 1
    ..- attr(*, "intercept")= int 1
    ..- attr(*, "response")= int 1
    ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
    ..- attr(*, "predvars")= language list(cbind(SAT, PPVT, Raven), 
n, s,
ns, na, ss)
    ..- attr(*, "dataClasses")= Named chr [1:6] "nmatrix.3" "numeric"
"numeric" "numeric" ...
    .. ..- attr(*, "names")= chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s"
"ns" ...
Browse[1]>


On 1/29/2013 8:44 AM, Michael Friendly wrote:

I'm trying to write a formula method for canonical correlation

analysis,

that could be called similarly to lm() for
a multivariate response:

cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...)
or perhaps more naturally,
cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...)

I've adapted the code from lm() to my case, but in this situation, it
doesn't make sense to
include an intercept, since X & Y are mean centered by default in the
computation.

In the code below, I can't see where the intercept gets included in

the

model matrix and therefore
how to suppress it.  There is a test case at the end, showing that the
method fails when called
normally, but works if I explicitly use -1 in the formula. I could

hack

the result of model.matrix(),
but maybe there's an easier way?

cancor <- function(x, ...) {
     UseMethod("cancor", x)
}

cancor.default <- candisc:::cancor

# TODO: make cancisc::cancor() use x, y, not X, Y
cancor.formula <- function(formula, data, subset, weights,
         na.action,
         method = "qr",
     model = TRUE,
     x = FALSE, y = FALSE, qr = TRUE,
     contrasts = NULL, ...) {

     cl <- match.call()
     mf <- match.call(expand.dots = FALSE)
     m <- match(c("formula", "data", "subset", "weights",

"na.action"),

names(mf), 0L)
     mf <- mf[c(1L, m)]

     mf[[1L]] <- as.name("model.frame")
     mf <- eval(mf, parent.frame())

     mt <- attr(mf, "terms")
     y <- model.response(mf, "numeric")
     w <- as.vector(model.weights(mf))
     if (!is.null(w) && !is.numeric(w))
         stop("'weights' must be a numeric vector")

     x <- model.matrix(mt, mf, contrasts)
     # fixup to remove intercept???
     z <- if (is.null(w))
         cancor.default(x, y,  ...)
     else stop("weights are not yet implemented")  # lm.wfit(x, y,

w,  ...)

     z$call <- cl
     z$terms <- mt
         z
}

TESTME <- FALSE
if (TESTME) {

# need to get latest version, 0.6-1 from R-Forge
install.packages("candisc", repo="http://R-Forge.R-project.org")
library(candisc)

data(Rohwer)

# this bombs: needs intercept removed
cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~  n + s + ns + na + ss,
data=Rohwer)
## Error in chol.default(Rxx) :
##  the leading minor of order 1 is not positive definite

#this works as is
cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~  -1 + n + s + ns + na +
ss, data=Rohwer)
cc
## Canonical correlation analysis of:
##       5   X  variables:  n, s, ns, na, ss
##   with        3   Y  variables:  SAT, PPVT, Raven
##
##     CanR  CanRSQ   Eigen percent    cum
## 1 0.6703 0.44934 0.81599   77.30  77.30
## 2 0.3837 0.14719 0.17260   16.35  93.65
## 3 0.2506 0.06282 0.06704    6.35 100.00
##
## Test of H0: The canonical correlations in the
## current row and all that follow are zero
##
   ...
}

Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

how to suppress the intercept in an lm()-like formula method?

Thread (8 messages)