factor : how does it work ?

On 10/6/2005 9:14 AM, Florence Combes wrote:
This is described in the ?data.frame man page:  "Character variables 
passed to 'data.frame' are converted to factor columns unless protected 
by 'I'."
levels() is not the conversion you want.  That lists all the levels, but 
it doesn't tell you how they correspond to individual observations.  For 
example,

 > df <- data.frame(x=1:3, y=c('a','b','a'))
 > df
   x y
1 1 a
2 2 b
3 3 a
 > levels(df$y)
[1] "a" "b"
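
As a minimal sketch of the 'I' protection quoted from ?data.frame: the
names df_factor and df_char are only illustrative. The stringsAsFactors
argument is written out because, from R 4.0.0 on, data.frame() no longer
converts character columns to factors by default.

```r
# Spell out stringsAsFactors = TRUE so the conversion happens in any R version
df_factor <- data.frame(x = 1:3, y = c("a", "b", "a"), stringsAsFactors = TRUE)

# I() protects the character vector from being converted
df_char <- data.frame(x = 1:3, y = I(c("a", "b", "a")))

is.factor(df_factor$y)      # TRUE
is.character(df_char$y)     # TRUE
```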

If you need to convert back to character values, use as.character():

 > as.character(df$y)
[1] "a" "b" "a"
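
A small sketch of the difference, using a standalone factor f (an
illustrative name) rather than the data frame column: levels() lists each
distinct value once, while as.character() gives one label per
observation. Indexing the levels by the factor itself should give the
same result.

```r
f <- factor(c("a", "b", "a"))

levels(f)          # the distinct values, once each
as.character(f)    # one label per observation
levels(f)[f]       # same labels, obtained by indexing levels with the codes
```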

For many purposes, you can ignore the fact that your data is stored as a 
factor instead of a character vector.  There are a few differences:

  1. You can't compare factor values with < or > unless you declared the 
factor to be ordered:

 > df$y[1] > df$y[2]
[1] NA
Warning message:
 > not meaningful for factors in: Ops.factor(df$y[1], df$y[2])

but

 > df$y <- ordered(df$y)
 > df$y[1] > df$y[2]
[1] FALSE

However, you need to watch out here: the comparison is done by the order 
of the levels, not by an alphabetic comparison of their names:

 > levels(df$y) <- c("before", "after")
 > df
   x      y
1 1 before
2 2  after
3 3 before
 > df$y[1] > df$y[2]
[1] FALSE
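
If you want the order to be explicit rather than inherited from an
earlier factor, one option is to pass the levels yourself when creating
the ordered factor. This is only a sketch; the variable name when is
made up for the example.

```r
# Levels are given in the intended order: "before" comes first
when <- factor(c("before", "after", "before"),
               levels = c("before", "after"), ordered = TRUE)

when[1] < when[2]     # TRUE: "before" precedes "after" in the level order
"before" < "after"    # FALSE: plain strings compare alphabetically
```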


  2. as.integer() works differently on factors:  it returns the position 
of each value in the levels vector, not a number parsed from its label.  
For example,

 > as.integer(df$y)
[1] 1 2 1
 > as.integer(as.character(df$y))
[1] NA NA NA
Warning message:
NAs introduced by coercion
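
The same difference matters when the labels look like numbers. A short
sketch with a made-up factor f: as.integer() alone returns the level
positions, and going through as.character() first recovers the numbers.

```r
f <- factor(c("10", "5", "10"))

levels(f)                      # "10" "5" (sorted as strings)
as.integer(f)                  # 1 2 1: positions in levels(f)
as.integer(as.character(f))    # 10 5 10: the numbers in the labels
```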

There are other differences, but these are the two main ones that are 
likely to cause you trouble.

Duncan Murdoch