Error occurred during mean calculation of a column of a data frame, which apparently contains numeric data
On 29/02/2012 13:41, Duncan Murdoch wrote:
On 12-02-29 8:16 AM, R. Michael Weylandt wrote:
Factors are stored internally as integers (like enums, if you have used other programming languages) together with a set of labels, the levels. This is more memory efficient than storing the whole string over and over.
That was one of the original justifications, but character vectors are just as memory efficient these days.
No, not really. Character vectors (STRSXPs) store a pointer for each string entry, and factors store an integer. On most current systems pointers are twice the size of integers, so on a 64-bit system:

> a <- rep(letters[1:10], each = 1000)
> object.size(a)
80520 bytes
> object.size(as.factor(a))
41008 bytes
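The integer storage discussed above is also why a column that "looks numeric" can misbehave when it has been read in as a factor. A minimal sketch, assuming a hypothetical factor f whose labels happen to be numbers: unclass() exposes the integer codes, as.numeric() returns those codes rather than the labels, and going through levels() recovers the intended values (the approach given in R FAQ 7.10).

```r
# Hypothetical factor whose labels look like numbers,
# e.g. a numeric column that was read in as text
f <- factor(c("10", "20", "10", "30"))

# The underlying storage is an integer vector of level codes
typeof(f)      # "integer"
unclass(f)     # codes 1 2 1 3, with a "levels" attribute "10" "20" "30"

# as.numeric() on a factor gives the codes, not the labels
codes <- as.numeric(f)

# Convert the levels once, then index them by the factor's codes
x <- as.numeric(levels(f))[f]
mean(x)        # mean of 10, 20, 10, 30
```

as.numeric(as.character(f)) gives the same values; indexing the converted levels avoids converting every element's string separately.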
The other justifications are still valid: sometimes you have a vector
which only takes on a subset of the possible values it could take, and
when you tabulate it, you'd like to see those zero counts. You may also
want to control the display order, and a factor allows that.
For example:
x <- c("a", "a", "b")
table(x)   # counts only the values present: a = 2, b = 1
x <- factor(x, levels=c("c", "b", "a"))
table(x)   # every level, in the given order, including c = 0
Duncan Murdoch
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595