Efficiency of factor objects
On Friday, 4 November 2011 at 19:19 -0400, Stavros Macrakis wrote:
R factors are the natural way to represent factors -- and should be efficient since they use small integers. But in fact, for many (but not all) operations, R factors are considerably slower than integers, or even character strings. This appears to be because whenever a factor vector is subsetted, the entire levels vector is copied.
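A rough way to check the claim above is to time repeated subsetting of a factor against the same operation on its integer codes. This is only a sketch: the sizes and the names `n_levels`, `lev`, `codes` and `idx` are made up for illustration, and whether the levels vector is actually copied (rather than shared) may depend on the R version, so the timings should be read as indicative at best.

```r
# Sketch: compare single-element subsetting of a factor with many levels
# against subsetting its underlying integer codes.
set.seed(1)

n_levels <- 1e4                               # assumed: "many" levels
lev <- as.character(seq_len(n_levels))
f <- factor(sample(lev, 1e5, replace = TRUE), levels = lev)
codes <- as.integer(f)                        # the small integers behind the factor

idx <- sample(length(f), 1000)                # positions to extract one at a time

# Each f[i] returns a factor that carries the full levels attribute.
t_factor <- system.time(for (i in idx) f[i])["elapsed"]

# Each codes[i] returns a plain integer with no attributes.
t_int <- system.time(for (i in idx) codes[i])["elapsed"]

print(c(factor = unname(t_factor), integer = unname(t_int)))
```

If the factor timing grows with `n_levels` while the integer timing does not, that would be consistent with the levels being handled on every subset.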
Is it really that common for a factor to have so many levels? One can probably argue that, in that case, a numeric or character vector is preferable: factors are no longer the "natural way" of representing this kind of data.

Adding code to fix a completely theoretical problem is generally not a good idea. I think you would have to come up with a real use case to have any hope of convincing the developers that a change is needed. There are probably many more interesting areas where speedups can be gained.

Regards