
Characters vs. factors

2 messages · Hadley Wickham, David Smith

#
It seems like a recent trend in R has been to make character vectors
and factors almost equivalent (apart from the way that factors always
remember their original range).  There are a few exceptions:

 * summary.character != summary.factor
 * table(x, exclude = NULL) != table(factor(x), exclude=NULL) when x
includes missing values

 * strsplit on a factor
Error in strsplit(factor(c("a", "a b")), " ") : non-character argument

 * nchar on a factor:
[1] 1 1 1

 * : with two character strings
Error in "a":"b" : NA/NaN argument
In addition: Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
whereas with two factors, e.g. factor("a"):factor("b"):
[1] a:b
Levels: a:b
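
A minimal sketch of a few of the exceptions above, for anyone who wants to try them. The vector x and the variable names are illustrative only; the nchar and table cases are left out because their results may depend on the R version.

```r
# Illustrative data only: a character vector and the equivalent factor.
x <- c("a", "a b")
f <- factor(x)

# A factor remembers its original range: subsetting keeps every level.
kept_levels <- levels(f[1])

# strsplit() refuses a factor; converting with as.character() first works.
split_err <- tryCatch(strsplit(f, " "), error = function(e) conditionMessage(e))
split_ok  <- strsplit(as.character(f), " ")

# ":" fails on two character strings (after coercion warnings) ...
colon_err <- tryCatch(suppressWarnings("a":"b"),
                      error = function(e) conditionMessage(e))

# ... but on two factors it builds their interaction.
colon_fac <- factor("a"):factor("b")

print(kept_levels)
print(split_err)
print(split_ok)
print(colon_err)
print(colon_fac)
```

The as.character() conversion is the usual workaround wherever a function insists on a character vector.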

Regards,

Hadley
#
On Mon, Oct 5, 2009 at 4:33 PM, hadley wickham <h.wickham at gmail.com> wrote:
A related issue is that modeling functions throw a warning when
character objects are used in place of factors:
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'Payment' converted to a factor

The warning doesn't affect R's behaviour, of course, but it does make
it difficult to sanction the otherwise sensible advice to R beginners
to read in data files with as.is=TRUE. (The warning leads to
difficult-to-answer questions.) For similar reasons I deleted the
warning from this post:
http://blog.revolution-computing.com/2009/09/is-the-express-line-really-faster-1.html
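
As a rough sketch of the situation described above: the data, the Amount column and the object names below are made up for illustration (only the variable name 'Payment' comes from the warning). The code reads a small table with as.is=TRUE, fits the same model with a character and a factor predictor, and records any warnings rather than assuming one is raised, since that may depend on the R version.

```r
# Hypothetical data; only the column name 'Payment' is taken from the warning.
csv_text <- "Payment,Amount
Cash,10
Card,25
Cash,12
Card,30
Cheque,18
Cheque,22"

# as.is = TRUE keeps text columns as character vectors.
purchases <- read.csv(text = csv_text, as.is = TRUE)

# Fit with the character column, collecting any warnings that are raised.
warnings_seen <- character(0)
fit_chr <- withCallingHandlers(
  lm(Amount ~ Payment, data = purchases),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Fit again after converting the column to a factor explicitly.
purchases_fac <- transform(purchases, Payment = factor(Payment))
fit_fac <- lm(Amount ~ Payment, data = purchases_fac)

print(class(purchases$Payment))
print(warnings_seen)
print(all.equal(coef(fit_chr), coef(fit_fac)))
```

The coefficients should agree either way, which is the sense in which the warning doesn't affect the fit.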

In general the trend towards equivalence of factors and character
vectors is welcome, though.

# David
--
David M Smith <david at revolution-computing.com>
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)

Check out our upcoming events schedule at www.revolution-computing.com/events