Handling of factors

Thomas Lumley wrote:
It might be worth noting here that in the second variation, the set will
have to be ordered for pragmatic reasons (order of entries in tables,
contrast matrices, etc.) even for non-ordered factors. So you can always
_define_ the integer codes. In that light, you could say that it is only
a matter of making the conventions consistent as to whether factors are
character-like or integer-like.
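
A minimal sketch of this point in R, using a made-up vector `sex` that is not from the original post: even an unordered factor carries an ordered level set, and that order fixes both the integer codes and the order of table entries.

```r
# Hypothetical data for illustration only.
sex <- factor(c("male", "female", "male"))

lev   <- levels(sex)      # level set, sorted by default: "female" "male"
codes <- as.integer(sex)  # integer codes defined by that order: 2 1 2
tab   <- table(sex)       # table entries follow the level order

print(lev)
print(codes)
print(names(tab))
```
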
S3-style object orientation and coercion rules also played their part.
It was easy to code a group method for "==" so that sex=="male" works
and sex==1 does not (unless levels(sex) includes "1"). In the "["
operator, however, we have automatic unclass() of the index (with S3,
you can dispatch on the class of the object you index, but not on the
class of what you index with), so that

plot(x,y, col=c(male="lightblue", female="pink")[sex])

will _not_ do character indexing, and may well give the opposite of the
result it appears to ask for. We could change the convention here (coerce
the factor to character), but there are a couple of demons: what if the
object you are indexing has no names, or has incompatible names? And
would there not be a performance hit? There is also the law of inertia:
the existing conventions have been in use for quite a while, so changing
them could break code in unexpected places.
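
The difference between "==" and "[" can be seen in a small sketch. The data below are hypothetical, and the explicit as.character() call is just one way to get label-based indexing under the current conventions, not a proposal for how the convention should change.

```r
# Hypothetical data for illustration only.
sex  <- factor(c("male", "female"))            # levels: "female", "male"
cols <- c(male = "lightblue", female = "pink")

# "==" goes through the factor method and compares level labels.
eq_label <- sex == "male"   # TRUE FALSE
eq_code  <- sex == 1        # FALSE FALSE, since no level is labelled "1"

# "[" uses the integer codes (2, 1), not the labels.
by_code  <- cols[sex]                 # picks cols[2], then cols[1]
by_label <- cols[as.character(sex)]   # explicit character indexing

print(by_code)
print(by_label)
```

Here by_code assigns "pink" to the male observation and "lightblue" to the female one, while by_label gives the colours the names suggest.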

Notice, by the way, that in comparison operations between an (ordered)
factor and a character string, it is the character that is coerced to a
factor, not the other way around: cooked <= "medium" should include "rare"...
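
A short sketch of that comparison, assuming a hypothetical ordered factor `cooked` with levels rare < medium < well done (the level set is an assumption; the post only names "rare" and "medium"):

```r
# Hypothetical ordered factor for illustration only.
cooked <- factor(c("rare", "medium", "well done"),
                 levels = c("rare", "medium", "well done"),
                 ordered = TRUE)

# The string "medium" is interpreted in terms of the levels of cooked,
# so the comparison follows the level order rare < medium < well done.
by_level <- cooked <= "medium"

# Comparing the labels as plain strings uses alphabetical order instead.
by_string <- as.character(cooked) <= "medium"

print(by_level)
print(by_string)
```

With level-based comparison "rare" is included; with plain string comparison it is not, because "rare" sorts after "medium" alphabetically.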