Dear R-devel list members,
This is an observation about how logical variables in models are handled, followed by questions.
As a general matter, character variables and logical variables are treated as if they were factors when they appear on the RHS of a model formula; for example:
- - - - snip- - - - -
set.seed(123)
c <- sample(letters[1:3], 10, replace=TRUE)
f <- as.factor(sample(LETTERS[1:3], 10, replace=TRUE))
L <- sample(c(TRUE, FALSE), 10, replace=TRUE)
y <- rnorm(10)
options(contrasts=c("contr.sum", "contr.poly"))
mod <- lm(y ~ c + f + L)
model.matrix(mod)
mod$xlevels
$c
[1] "a" "b" "c"
$f
[1] "A" "B" "C"
- - - - snip- - - - -
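The following minimal sketch makes the discrepancy concrete without depending on the random draws. The data frame d and its hand-typed values are hypothetical and are not the seeded data above. Under the sum-to-zero contrasts set through options(), the logical L gets its own contrast column in the model matrix, just as a two-level factor would, yet it is absent from mod$xlevels.

```r
# Hypothetical illustrative data (not the seeded data above) containing the
# same three kinds of predictor: character c, factor f, and logical L.
d <- data.frame(
  y = c(1.2, -0.4, 0.8, 2.1, -1.3, 0.5, 0.9, -0.7),
  c = c("a", "b", "c", "a", "b", "c", "a", "b"),
  f = factor(c("A", "B", "C", "C", "B", "A", "A", "C")),
  L = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE  # keep c as character in R < 4.0.0 as well
)
options(contrasts = c("contr.sum", "contr.poly"))
mod <- lm(y ~ c + f + L, data = d)

X <- model.matrix(mod)
# L is coded like a two-level factor: one sum-to-zero contrast column, "L1".
print(colnames(X))
# ...but lm() records levels only for the character and factor predictors.
print(names(mod$xlevels))  # "c" "f"
```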
Why the discrepancy? It's true that the level set (i.e., TRUE, FALSE) for a logical "factor" is known, but examining the $xlevels component is a simple way to detect variables treated as factors in the model. For example, I'd argue that .getXlevels(), which lm() uses to create $xlevels, returns misleading information because it omits L:
- - - - snip- - - - -
.getXlevels(terms(mod), model.frame(mod))
$c
[1] "a" "b" "c"
$f
[1] "A" "B" "C"
- - - - snip- - - - -
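One way the inconsistency might be removed is sketched below. getXlevelsWithLogical() is a hypothetical function, not part of the stats package; it is written to mirror (as I understand it) the logic of .getXlevels(), with one added branch that gives logical predictors the fixed level set FALSE, TRUE. The example data frame d is likewise hypothetical.

```r
# Hypothetical helper, NOT part of the stats package: mirrors the logic of
# stats:::.getXlevels() but also records logical predictors, giving them the
# fixed level set c("FALSE", "TRUE").
getXlevelsWithLogical <- function(Terms, m) {
  # attr(Terms, "variables") is a call such as list(y, c, f, L); drop the
  # leading `list` and deparse each variable to match the model-frame names.
  vars <- vapply(as.list(attr(Terms, "variables"))[-1L],
                 function(v) paste(deparse(v, width.cutoff = 500L), collapse = " "),
                 character(1))
  # Drop the response, if the formula has one.
  yvar <- attr(Terms, "response")
  if (yvar > 0) vars <- vars[-yvar]
  if (length(vars) == 0L) return(NULL)
  xlev <- lapply(m[vars], function(x) {
    if (is.factor(x)) levels(x)
    else if (is.character(x)) levels(as.factor(x))
    else if (is.logical(x)) c("FALSE", "TRUE")
    else NULL
  })
  # Keep only the factor-like predictors.
  xlev[!vapply(xlev, is.null, logical(1))]
}

# Hypothetical example data with a character, a factor, and a logical predictor.
d <- data.frame(
  y = c(1.2, -0.4, 0.8, 2.1, -1.3, 0.5, 0.9, -0.7),
  c = c("a", "b", "c", "a", "b", "c", "a", "b"),
  f = factor(c("A", "B", "C", "C", "B", "A", "A", "C")),
  L = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE  # keep c as character in R < 4.0.0 as well
)
mod <- lm(y ~ c + f + L, data = d)
xl <- getXlevelsWithLogical(terms(mod), model.frame(mod))
str(xl)  # list with components c, f, and L
```

The only change relative to the character branch is that the logical levels are fixed rather than computed from the data, so a logical predictor that happens to take only one value in the model frame would still be reported with both levels.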
An alternative for detecting "factors" is to examine the 'contrasts' attribute of the model matrix, although that doesn't produce levels:
- - - - snip- - - - -
names(attr(model.matrix(mod), "contrasts"))
[1] "c" "f" "L"
- - - - snip- - - - -
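Levels can nevertheless be recovered once the 'contrasts' attribute has identified the factor-like variables. Here is a minimal sketch, assuming the model frame is available through model.frame(); factorLevelsFromContrasts() and the data frame d are hypothetical, and giving logicals the level set FALSE, TRUE is an assumption consistent with the known level set for logicals noted above.

```r
# Hypothetical helper, NOT part of the stats package: find the factor-like
# predictors from the "contrasts" attribute of the model matrix, then recover
# their levels from the model frame. Logicals are given the fixed level set
# FALSE, TRUE (an assumption); characters are converted with as.factor().
factorLevelsFromContrasts <- function(model) {
  mf <- model.frame(model)
  nms <- names(attr(model.matrix(model), "contrasts"))
  lev <- lapply(nms, function(nm) {
    x <- mf[[nm]]
    if (is.logical(x)) x <- factor(x, levels = c(FALSE, TRUE))
    levels(as.factor(x))
  })
  names(lev) <- nms
  lev
}

# Hypothetical example data with a character, a factor, and a logical predictor.
d <- data.frame(
  y = c(1.2, -0.4, 0.8, 2.1, -1.3, 0.5, 0.9, -0.7),
  c = c("a", "b", "c", "a", "b", "c", "a", "b"),
  f = factor(c("A", "B", "C", "C", "B", "A", "A", "C")),
  L = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE  # keep c as character in R < 4.0.0 as well
)
mod <- lm(y ~ c + f + L, data = d)
lev <- factorLevelsFromContrasts(mod)
str(lev)
```

This avoids re-deriving the variable list from the terms object, at the cost of building the model matrix just to read one of its attributes.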
Is there an argument against making the treatment of logical variables consistent with that of factors and character variables? Comments?
Best,
John
-------------------------------------------------
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox