Or, better still, extend R through the mechanisms already in place: something akin
to a fast-factor package. Any change to R itself causes downstream issues in
(hundreds of?) millions of lines of deployed code.
It's hard to believe that a package for this doesn't already
exist. Have you searched CRAN?
Jeff
On Sat, Nov 5, 2011 at 11:30 AM, Milan Bouchet-Valat<nalimilan at club.fr> wrote:
On Friday, November 4, 2011 at 19:19 -0400, Stavros Macrakis wrote:
R factors are the natural way to represent factors -- and should be
efficient since they use small integers. But in fact, for many (but
not all) operations, R factors are considerably slower than integers,
or even character strings. This appears to be because whenever a
factor vector is subsetted, the entire levels vector is copied.
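A minimal R sketch of how one might check this claim, comparing element-wise subsetting of a many-level factor against its integer codes and an equivalent character vector. The names `f`, `i`, `s` and `reps`, and the data sizes, are illustrative assumptions; timings depend on the R version and machine, so no numbers are shown here.

```r
# Illustrative setup (assumed): a factor with many levels, plus the same
# data as integer codes and as character strings.
set.seed(1)
n <- 1e5
f <- factor(sample(sprintf("id%06d", seq_len(n)), n, replace = TRUE))
i <- as.integer(f)    # the small-integer codes stored inside the factor
s <- as.character(f)  # the same values as character strings

reps <- 1e4
# Each f[k] dispatches to the R-level S3 method `[.factor`, which
# re-attaches the levels attribute to the result.
t_factor <- system.time(for (k in seq_len(reps)) f[k])[["elapsed"]]
t_int    <- system.time(for (k in seq_len(reps)) i[k])[["elapsed"]]
t_chr    <- system.time(for (k in seq_len(reps)) s[k])[["elapsed"]]
print(c(factor = t_factor, integer = t_int, character = t_chr))

# A one-element subset still carries the full levels vector (drop = FALSE).
stopifnot(length(levels(f[1])) == nlevels(f))
```

Part of any gap measured this way may come from S3 dispatch overhead rather than from the levels vector alone, so the sketch does not by itself isolate the cause.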
Is it really so common for a factor to have that many levels? One could
argue that, in that case, a numeric or character vector is preferable:
factors are no longer the "natural way" of representing this kind of
data.
Adding code to fix a purely theoretical problem is generally not a
good idea. I think you'd have to come up with a real use case to have any
hope of convincing the developers that a change is needed. There are probably
many more promising areas for speedups than this one.
Regards