inconsistent handling of factor, character, and logical predictors in lm()

John Fox

Fri, Aug 30, 2019 11:11 AM

Dear R-devel list members,

I've discovered an inconsistency in how lm() and similar functions handle logical predictors as opposed to factor or character predictors. For a model that includes factor or character predictors, the "lm" object records the levels of each factor, or the unique values of each character predictor, in its $xlevels component. It does not record the FALSE/TRUE values of a logical predictor, even though a logical predictor is treated as a factor in the fit.

For example:

------------ snip --------------

$Species
[1] "setosa"     "versicolor" "virginica"

$`as.character(Species)`
[1] "setosa"     "versicolor" "virginica"

named list()

Call:
lm(formula = Sepal.Length ~ Sepal.Width + I(Species == "setosa"), 
    data = iris)

Coefficients:
               (Intercept)                 Sepal.Width  I(Species == "setosa")TRUE  
                    3.5571                      0.9418                     -1.7797  

------------ snip --------------
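The archived message preserves only the printed output above, not the commands that produced it. A plausible reconstruction follows. It assumes that the three fits differ only in how the species contrast enters the formula; the object names mod.fac, mod.chr, and mod.lgl are hypothetical. The empty $xlevels for the logical predictor reflects the R version the message was written against, and other R versions may behave differently.

```r
## Hypothetical reconstruction of the example; object names are illustrative.
## The same species information enters as a factor, a character, and a logical.
mod.fac <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
mod.chr <- lm(Sepal.Length ~ Sepal.Width + as.character(Species), data = iris)
mod.lgl <- lm(Sepal.Length ~ Sepal.Width + I(Species == "setosa"), data = iris)

mod.fac$xlevels  # levels of the factor predictor
mod.chr$xlevels  # sorted unique values of the character predictor
mod.lgl$xlevels  # no entry for the logical predictor in the version described
mod.lgl          # yet the logical is coded like a factor: coefficient "...TRUE"
```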

I believe that the culprit is .getXlevels(), which makes provision for factor and character predictors but not for logical predictors:

------------ snip --------------

function (Terms, m) 
{
    xvars <- vapply(attr(Terms, "variables"), deparse2, 
        "")[-1L]
    if ((yvar <- attr(Terms, "response")) > 0) 
        xvars <- xvars[-yvar]
    if (length(xvars)) {
        xlev <- lapply(m[xvars], function(x) if (is.factor(x)) 
            levels(x)
        else if (is.character(x)) 
            levels(as.factor(x)))
        xlev[!vapply(xlev, is.null, NA)]
    }
}

------------ snip --------------

It would be simple to modify the last test in .getXlevels() to

	else if (is.character(x) || is.logical(x))

which would cause .getXlevels() to return c("FALSE", "TRUE") (assuming both values are present in the data). I'd find that sufficient, but alternatively there could be a separate test for logical predictors that returns c(FALSE, TRUE).
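A minimal sketch of the first proposed change, written as a standalone copy of the function so that it can be tried without modifying stats. The name getXlevels2 is hypothetical, and the stats-internal helper deparse2() is replaced by a local approximation of it (an assumption; it may differ for unusual variable names).

```r
## Hypothetical patched copy of .getXlevels(); the only functional change is
## the added is.logical() test in the last branch.
getXlevels2 <- function(Terms, m) {
  # local stand-in for the deparse2() helper internal to stats (assumption)
  deparse2 <- function(x)
    paste(deparse(x, width.cutoff = 500L,
                  backtick = !is.symbol(x) && is.language(x)),
          collapse = " ")
  xvars <- vapply(attr(Terms, "variables"), deparse2, "")[-1L]
  if ((yvar <- attr(Terms, "response")) > 0)
    xvars <- xvars[-yvar]  # drop the response
  if (length(xvars)) {
    xlev <- lapply(m[xvars], function(x)
      if (is.factor(x)) levels(x)
      else if (is.character(x) || is.logical(x)) levels(as.factor(x)))
    xlev[!vapply(xlev, is.null, NA)]  # keep only factor-like predictors
  }
}

mod <- lm(Sepal.Length ~ Sepal.Width + I(Species == "setosa"), data = iris)
getXlevels2(terms(mod), model.frame(mod))
# returns a one-element list named `I(Species == "setosa")` holding c("FALSE", "TRUE")
```

Because as.factor() is applied to the logical, the stored values are the character strings "FALSE" and "TRUE", matching the first option above rather than the c(FALSE, TRUE) alternative.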

I discovered this issue when a function in the effects package failed for a model with a logical predictor. Although it's possible to program around the problem, I think that it would be better to handle factors, character predictors, and logical predictors consistently.
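Programming around the problem in client code could look roughly like the following sketch. It assumes that the fitted model retains its model frame (as lm() does by default) and that the response, if present, is the column of the model frame indexed by the "response" attribute of the terms. The function name xlevels_with_logicals is hypothetical and is not part of stats or the effects package.

```r
## Hypothetical helper: return a model's $xlevels, adding the observed values
## of any logical predictors that .getXlevels() did not record.
xlevels_with_logicals <- function(model) {
  mf <- model.frame(model)
  xlev <- model$xlevels
  if (is.null(xlev)) xlev <- list()  # model with no factor-like predictors
  resp <- attr(terms(model), "response")
  # predictor columns of the model frame (drop the response if there is one)
  vars <- if (resp > 0) names(mf)[-resp] else names(mf)
  for (v in vars) {
    # add only logicals not already handled, so a future fix is respected
    if (is.logical(mf[[v]]) && is.null(xlev[[v]]))
      xlev[[v]] <- levels(as.factor(mf[[v]]))
  }
  xlev
}

mod <- lm(Sepal.Length ~ Sepal.Width + I(Species == "setosa"), data = iris)
xlevels_with_logicals(mod)
```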

Best,
 John

--------------------------------------
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: socialsciences.mcmaster.ca/jfox/
