Opinion: Why I find factors convenient to use
On Fri, Aug 17, 2012 at 07:34:35PM +0100, Rui Barradas wrote:
Hello, No, factors may use less memory. System dependent?
x <- sample(c("small", "medium", "large"), 1e4, replace = TRUE)
y <- factor(x)
object.size(x)
80184 bytes
object.size(y)
40576 bytes
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252  LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rcapture_1.2-0 xts_0.8-0      zoo_1.7-7

loaded via a namespace (and not attached):
[1] chron_2.3-39   fortunes_1.4-2 grid_2.15.1    lattice_0.20-6 tools_2.15.1

And I agree with what Steve said, stringsAsFactors = FALSE saves hours of debugging time.
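One possible reading of the "system dependent?" question, offered as a sketch rather than a definitive explanation: a factor stores one integer code per element, while a character vector stores one pointer per element, so the saving should depend on the pointer size of the build. The seed and the use of .Machine$sizeof.pointer below are additions for illustration, not part of the original session.

```r
# Assumption: the seed is arbitrary and only makes the sample reproducible.
set.seed(1)
x <- sample(c("small", "medium", "large"), 1e4, replace = TRUE)
y <- factor(x)

# A factor is an integer vector of codes plus a "levels" attribute.
typeof(x)                 # storage type of the character vector
typeof(y)                 # storage type of the factor
.Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit build

# On a 64-bit build each element of x costs an 8-byte pointer,
# while each element of y costs a 4-byte integer code.
size_x <- object.size(x)
size_y <- object.size(y)
print(size_x)
print(size_y)
```

With only three distinct values the levels attribute is tiny, so the per-element cost dominates. With many distinct long strings the comparison could look different.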
Hi.

I use stringsAsFactors = FALSE quite frequently. If there is a discussion on R-devel about whether this should be the default, I would support it. Factors are very useful and sometimes necessary, but they are hard to manipulate. As Jeff Newmiller said, it is a good strategy to prepare the data as character type and convert them to a factor once they are complete. Users should know how to use factors; however, the strategy "convert to a factor eventually" is more consistent with not having stringsAsFactors = TRUE as the default.

Petr Savicky.
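The "prepare as character, convert to a factor eventually" strategy could look roughly like the sketch below. The data frame, its column names, and the recoded values are hypothetical and chosen only for illustration.

```r
# Hypothetical data; stringsAsFactors is set explicitly so the result
# does not depend on the default of the R version in use.
d <- data.frame(id = 1:5,
                size = c("small", "medium", "large", "small", "medium"),
                stringsAsFactors = FALSE)

# While the data are still being prepared, ordinary character operations work.
d$size[d$size == "medium"] <- "mid"
d <- rbind(d, data.frame(id = 6L, size = "huge", stringsAsFactors = FALSE))

# Convert only once the set of values is complete, with an explicit level order.
d$size <- factor(d$size, levels = c("small", "mid", "large", "huge"))
str(d)

# For contrast: assigning an unknown value to an existing factor yields NA
# (R also issues a warning, suppressed here to keep the output short).
f <- factor(c("small", "large"))
f[1] <- suppressWarnings(replace(f, 1, "huge"))[1]
is.na(f[1])
```

Fixing the levels at the end also makes the level order explicit instead of leaving it to the alphabetical default.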