logical variables in models - R-devel

John Fox

Wed, Dec 19, 2018 7:19 AM

Dear R-devel list members,

This is an observation about how logical variables in models are handled, followed by questions.

As a general matter, character variables and logical variables are treated as if they were factors when they appear on the RHS of a model formula; for example:

- - - - snip- - - - -

   (Intercept) c1 c2 f1 f2 L1
1            1  1  0 -1 -1  1
2            1 -1 -1  0  1  1
3            1  0  1 -1 -1  1
4            1 -1 -1  0  1  1
5            1 -1 -1  1  0  1
6            1  1  0 -1 -1  1
7            1  0  1  1  0  1
8            1 -1 -1  1  0  1
9            1  0  1  1  0 -1
10           1  0  1 -1 -1 -1
attr(,"assign")
[1] 0 1 1 2 2 3
attr(,"contrasts")
attr(,"contrasts")$c
[1] "contr.sum"

attr(,"contrasts")$f
[1] "contr.sum"

attr(,"contrasts")$L
[1] "contr.sum"

- - - - snip- - - - -
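The code that produced the output above was snipped from the archive. As a rough illustration only, here is a minimal sketch with made-up data (not the original example) showing a character variable, a factor, and a logical variable each being expanded into contrast columns by model.matrix(). The data frame `d` and its values are assumptions, and the default treatment contrasts are used rather than contr.sum:

```r
# Hypothetical data: a character variable, a factor, and a logical.
d <- data.frame(
  c = c("a", "b", "c", "a", "b", "c"),
  f = factor(c("A", "B", "C", "C", "A", "B")),
  L = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE  # keep 'c' as character in any R version
)

# All three variables are coded with contrasts, as if they were factors.
X <- model.matrix(~ c + f + L, data = d)
colnames(X)
```

With the default contrasts, the logical variable contributes a single column for its TRUE level, just as a two-level factor would.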

But logical variables don't appear in the $xlevels component of the objects created by lm() and similar functions:

- - - - snip- - - - -

$c
[1] "a" "b" "c"

$f
[1] "A" "B" "C"

- - - - snip- - - - -
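A comparable check can be sketched as follows, again with made-up data (the response `y` and all values are assumptions, not the snipped original). As described above, the character variable and the factor are expected to appear in $xlevels, while the logical variable is expected to be absent; this may vary with the R version:

```r
# Hypothetical data with a numeric response.
d <- data.frame(
  y = c(1.2, 0.7, 2.1, 1.9, 0.4, 1.5, 2.3, 0.9),
  c = c("a", "b", "c", "a", "b", "c", "a", "b"),
  f = factor(c("A", "B", "C", "C", "A", "B", "B", "C")),
  L = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

m <- lm(y ~ c + f + L, data = d)

# Levels recorded for the factor-like predictors in the fitted model.
m$xlevels
names(m$xlevels)
```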

Why the discrepancy? It's true that the level-set (i.e., TRUE, FALSE) for a logical "factor" is known, but examining the $xlevels component is a simple way to detect variables treated as factors in the model. For example, I'd argue that .getXlevels() returns misleading information:

- - - - snip- - - - -

$c
[1] "a" "b" "c"

$f
[1] "A" "B" "C"

- - - - snip- - - - -
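To make the point concrete, here is a sketch (made-up data, not the snipped original) that calls .getXlevels() on a model frame and compares it with a small helper that also reports logical variables. The helper `factor_like_levels()` is hypothetical and written only for this illustration; it is not part of R, and it simply assumes the level-set FALSE, TRUE for logicals:

```r
d <- data.frame(
  y = c(1.2, 0.7, 2.1, 1.9, 0.4, 1.5),
  c = c("a", "b", "c", "a", "b", "c"),
  f = factor(c("A", "B", "C", "C", "A", "B")),
  L = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)
mf <- model.frame(y ~ c + f + L, data = d)

# What R itself reports for the model frame.
xl <- .getXlevels(attr(mf, "terms"), mf)

# Hypothetical helper: like the above, but logicals are included too.
factor_like_levels <- function(mf) {
  vars <- names(mf)
  resp <- attr(attr(mf, "terms"), "response")
  if (resp > 0) vars <- vars[-resp]          # drop the response column
  out <- lapply(mf[vars], function(x) {
    if (is.factor(x)) levels(x)
    else if (is.character(x)) levels(as.factor(x))
    else if (is.logical(x)) c("FALSE", "TRUE")  # assumed level-set
    else NULL
  })
  out[!vapply(out, is.null, NA)]
}

res <- factor_like_levels(mf)
names(xl)
names(res)
```

The helper ignores extra model-frame columns such as weights or offsets, so it is only a starting point.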

An alternative for detecting "factors" is to examine the 'contrasts' attribute of the model matrix, although that doesn't produce levels:

- - - - snip- - - - -

[1] "c" "f" "L"

- - - - snip- - - - -
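A sketch of this alternative, once more with made-up data standing in for the snipped example (the names `d`, `m`, and `factor_vars` are assumptions):

```r
d <- data.frame(
  y = c(1.2, 0.7, 2.1, 1.9, 0.4, 1.5, 2.3, 0.9),
  c = c("a", "b", "c", "a", "b", "c", "a", "b"),
  f = factor(c("A", "B", "C", "C", "A", "B", "B", "C")),
  L = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
m <- lm(y ~ c + f + L, data = d)

# Variables for which model.matrix() recorded a contrast coding.
factor_vars <- names(attr(model.matrix(m), "contrasts"))
factor_vars
```

This picks up the logical variable along with the character variable and the factor, but it yields only names, not levels.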

Is there an argument against making the treatment of logical variables consistent with that of factors and character variables? Comments?

Best,
 John

  -------------------------------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http::/socserv.mcmaster.ca/jfox