
droplevels: drops contrasts as well

3 messages · Thaler, Thorn, LAUSANNE, Applied Mathematics, Thomas Lumley

#
Dear all,

Today I discovered a neat function called droplevels, which drops
unused levels in a data frame. I tried the function on some of my data
sets and found that it dropped not only the unused levels but also the
contrasts I had set via "C". I had a look at the code: this behaviour
arises because droplevels simply calls factor to drop the unused levels,
and the rebuilt factor falls back to the default contrasts as set by
options("contrasts").
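
The effect can be reproduced with a small example. This is only a minimal sketch: the factor f and its levels are made up for illustration, and the contrasts are stored by function name so that the attribute is easy to inspect.

```r
# Illustrative data: level "c" is declared but never observed
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))
contrasts(f) <- "contr.sum"        # store non-default contrasts by name

before <- attr(f, "contrasts")     # contrasts specification on the original
g <- droplevels(f)                 # drops the unused level "c"
after <- attr(g, "contrasts")      # contrasts specification on the result

print(before)
print(levels(g))
print(after)
```

Comparing before and after shows whether the contrasts survived the call.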

I think this behaviour is annoying, because if one does not look
carefully enough, one loses the contrasts silently. Hence, may I suggest
changing the code of droplevels to something like the following:

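droplevels <- function (x, except = NULL, ...) {
    # logical index of the factor columns of the data frame
    ix <- vapply(x, is.factor, NA)
    if (!is.null(except))
        ix[except] <- FALSE
    # remember each factor's contrasts before factor() discards them
    co <- lapply(x[ix], function(fa) attr(fa, "contrasts"))
    x[ix] <- mapply(function(fa, co) {
      if (nlevels(factor(fa)) == 1) {
        # C() needs at least 2 levels, so only drop the unused levels
        factor(fa)
      } else {
        # drop the unused levels, then re-apply the saved contrasts
        C(factor(fa), co)
      }
    }, x[ix], co, SIMPLIFY = FALSE)
    x
}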

which keeps the original contrasts AND drops the unused levels?
Similarly, droplevels.factor should be changed to

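droplevels.factor <- function (x, ...) {
  # save the contrasts before factor() discards them
  co <- attr(x, "contrasts")
  if (nlevels(factor(x)) == 1) {
    factor(x)
  } else {
    # drop the unused levels, then re-apply the saved contrasts
    C(factor(x), co)
  }
}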

The nlevels check is necessary since C does not work if there are
fewer than 2 levels.
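
To illustrate why the guard is needed, here is a small sketch with a made-up single-level factor. The error is caught with tryCatch so the snippet runs to the end; the exact wording of the message may differ between R versions.

```r
one <- factor(c("a", "a"))   # illustrative factor with a single level

# Setting contrasts on a factor with fewer than 2 levels raises an error
res <- tryCatch(C(one, "contr.sum"), error = function(e) e)

print(inherits(res, "error"))
if (inherits(res, "error")) print(conditionMessage(res))
```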

Any comments appreciated.


KR,

-Thorn
3 days later
#
On Fri, Oct 21, 2011 at 5:57 AM, Thaler, Thorn, LAUSANNE, Applied
Mathematics <Thorn.Thaler at rdls.nestle.com> wrote:
This silently changes the contrasts -- e.g., if the first level of the
factor is one of the empty levels, the reference level used by
contr.treatment() will change.  Also, if the contrasts are a matrix
rather than specifying a contrast function, the matrix will be invalid
for the new factor.
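
Both objections can be illustrated with a short sketch. The factor below is made up; its first level "a" is the unused one, and contr.sum(3) merely stands in for any user-supplied contrast matrix.

```r
# Illustrative factor whose first level "a" is never observed
f <- factor(c("b", "c", "b", "c"), levels = c("a", "b", "c"))

# 1) Treatment contrasts: the reference level is the all-zero row
full <- contr.treatment(levels(f))              # built on levels a, b, c
reduced <- contr.treatment(levels(factor(f)))   # built on levels b, c
print(full)
print(reduced)

# 2) A contrast matrix built for 3 levels does not fit a 2-level factor
m <- contr.sum(3)
g <- factor(f)
res <- tryCatch({ contrasts(g) <- m; g }, error = function(e) e)
print(inherits(res, "error"))
```

In the first part the all-zero row moves from "a" to "b", which is the silent change of baseline; in the second part the assignment is rejected because the matrix has the wrong number of rows.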

I think just having a warning would be better -- in general it's not
clear what (if anything) it means to have the same contrasts on
factors with different numbers of levels.

   -thomas
#
Well, you are right. I'm not so much concerned about the first issue
you've outlined (the change in the baseline: if I decide to drop unused
levels, I'm aware that a non-existent level cannot be the baseline any
more), but the second point is clearly an issue I've overlooked.
A warning would be an option. I think this should be the minimum. Still,
I think a behaviour like:
1.) if the contrasts are defined as a matrix, issue a warning and use
the default contrasts (that is, nothing changes compared to now, except
that a warning is issued)
2.) if the contrasts are defined as a function, use the function to
re-compute the contrasts.

would be more desirable, as contrasts can also be seen as a general
setting for how coefficients should be interpreted (e.g. for a balanced
data set with sum contrasts, the intercept corresponds to the overall
mean, beta1 to the difference between group 1 and the overall mean, and
so on), rather than literally (e.g. "I want to compare level A vs.
levels B & C"). From the latter point of view I agree that the same
contrasts on factors with different numbers of levels are not really
meaningful. From the former, I still see the benefit: if I drop a level,
I may still be interested in comparing the overall mean with the group
means, bearing in mind that some groups may no longer be present in the
data set.
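
The interpretation of sum contrasts mentioned above can be checked on a toy example. The data frame d and its values are invented for illustration; the design is balanced with two observations per group.

```r
# Illustrative balanced data: three groups, two observations each
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 2)),
  y = c(1, 3, 4, 6, 8, 14)
)
contrasts(d$g) <- "contr.sum"

fit <- lm(y ~ g, data = d)

print(coef(fit))                                # intercept and group effects
print(mean(d$y))                                # overall mean
print(mean(d$y[d$g == "a"]) - mean(d$y))        # group "a" minus overall mean
```

With these contrasts the intercept matches the overall mean and the first coefficient matches the deviation of group "a" from it.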

Do you see my point? However, it is not the biggest issue, as one can
change the contrasts rather easily oneself, but I think at least some
information/warning should be issued that the old contrasts are not used
any more.
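
A rough sketch of the behaviour described in points 1.) and 2.) might look as follows. The name droplevels_keep_contrasts is hypothetical and not part of base R. One assumption to flag: as far as I can tell, the "contrasts" attribute holds either a function name (character) or a matrix, so contrasts that were set by passing a function object are already stored as a matrix and fall under case 1.).

```r
# Hypothetical helper, NOT part of base R: drop unused levels of a factor
# and keep contrasts that were stored by function name.
droplevels_keep_contrasts <- function(x) {
  co <- attr(x, "contrasts")   # NULL, a function name, or a matrix
  out <- factor(x)             # drops unused levels and the contrasts
  if (is.null(co)) return(out)
  if (is.character(co)) {
    # 2.) a contrast function name can be re-applied to the reduced factor
    if (nlevels(out) >= 2L) contrasts(out) <- co
    return(out)
  }
  # 1.) a matrix no longer fits, so warn and fall back to the defaults
  warning("matrix contrasts dropped; default contrasts will be used")
  out
}

# Illustrative usage with a made-up factor
f <- factor(c("b", "c", "b"), levels = c("a", "b", "c"))
contrasts(f) <- "contr.sum"
g <- droplevels_keep_contrasts(f)
print(attr(g, "contrasts"))

contrasts(f) <- contr.sum(3)   # now stored as a matrix
w <- tryCatch(droplevels_keep_contrasts(f), warning = function(w) w)
print(inherits(w, "warning"))
```

Whether a single-level result should also trigger a warning is left open here; the sketch simply returns the factor without contrasts in that case.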


KR,

-Thorn