table() and as.character() performance for logical values

On 21.03.25 at 15:42, Aidan Lakshman via R-devel wrote:
Yes, I also think 'factor' could do a bit better for unclassed integers 
(such as when called from 'cut') as well as for logical input (such as 
from 'summary' -> 'table').

Note that 'as.factor' already has a "fast track" for plain integers 
(originally for 'split.default' from 'tapply'), so can be used instead 
of 'factor' when there is no need for custom 'levels', 'labels', or 
'exclude'. (Thanks for already mentioning 'tabulate'.)
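
As a minimal sketch of the point above (the data here are made up for illustration and are not from the original post), 'as.factor' and 'factor' give the same result for a plain integer vector, and 'tabulate' returns the same counts as 'table' when the values are known to be 0/1:

```r
## Illustrative data only: a plain (unclassed) integer vector of 0s and 1s.
set.seed(1)
I <- sample(0:1, 1e5, replace = TRUE)

## Without custom 'levels', 'labels' or 'exclude', both calls should agree.
f_slow <- factor(I)
f_fast <- as.factor(I)
stopifnot(identical(f_slow, f_fast))

## tabulate() counts positive integers, so shift 0/1 to 1/2.
counts_table    <- as.integer(table(I))
counts_tabulate <- tabulate(I + 1L, nbins = 2L)
stopifnot(identical(counts_table, counts_tabulate))
```

Note that 'tabulate' drops the level names, so it is only a replacement when the caller already knows which values the bins stand for.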

A 'factor' patch would apply more broadly, e.g.:

===================================================================
--- src/library/base/R/factor.R	(revision 88042)
+++ src/library/base/R/factor.R	(working copy)
@@ -20,14 +20,18 @@
                     exclude = NA, ordered = is.ordered(x), nmax = NA)
  {
      if(is.null(x)) x <- character()
+    directmatch <- !is.object(x) &&
+        (is.character(x) || is.integer(x) || is.logical(x))
      nx <- names(x)
      if (missing(levels)) {
  	y <- unique(x, nmax = nmax)
  	ind <- order(y)
-	levels <- unique(as.character(y)[ind])
+        if (!directmatch)
+            y <- as.character(y)
+	levels <- unique(y[ind])
      }
      force(ordered) # check if original x is an ordered factor
-    if(!is.character(x))
+    if(!directmatch)
  	x <- as.character(x)
      ## levels could be a long vector, but match will not handle that.
      levels <- levels[is.na(match(levels, exclude))]
      f <- match(x, levels)
===================================================================

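This also skips as.character() for integer and logical 'x', so only the 
few unique levels are converted to character instead of the whole 
vector. It would indeed bring the table() runtimes into a sensible order 
(the "not" values are the timings without the patch):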

     set.seed(1)
     C <- sample(c("no", "yes"), 10^7, replace = TRUE)
     F <- as.factor(C)
     L <- F == "yes"
     I <- as.integer(L)
     N <- as.numeric(I)

     ## Median system.time(table(.)) in ms:
     ## table(F)   256
     ## table(I)   384   # not  696
     ## table(L)   409   # not 1159
     ## table(C)   591
     ## table(N)  3324
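
The post does not show how the medians were obtained. One way to collect comparable numbers might look like the following sketch, where the helper 'median_ms', the use of the "elapsed" component, the number of repetitions, and the reduced vector length are all assumptions made here to keep the run short:

```r
## Hypothetical helper: median elapsed time of fun() in milliseconds.
median_ms <- function(fun, reps = 3L) {
    times <- vapply(seq_len(reps),
                    function(i) system.time(fun())[["elapsed"]],
                    numeric(1))
    1000 * median(times)
}

set.seed(1)
n <- 10^6   # assumption: smaller than the 10^7 used in the post
C <- sample(c("no", "yes"), n, replace = TRUE)
F <- as.factor(C)
L <- F == "yes"
I <- as.integer(L)
N <- as.numeric(I)

## Time table() on each representation of the same data.
inputs  <- list(F = F, I = I, L = L, C = C, N = N)
timings <- vapply(inputs,
                  function(v) median_ms(function() table(v)),
                  numeric(1))
print(round(sort(timings)))
```

Absolute values will differ by machine and R version; only the relative order of the five inputs is of interest.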

The (seemingly) small patch passes check-all, but maybe it overlooks 
some edge cases. I'd test it on a subset of CRAN/BIOC packages.
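
A quick equivalence check for a few simple inputs could look like the sketch below. The helper 'levels_and_codes' is hypothetical: it mirrors only the lines touched by the patch, with the default 'exclude = NA' hard-coded, and is not the full 'factor' function. It is no substitute for check-all or reverse-dependency checks.

```r
## Hypothetical helper mirroring only the patched lines of factor():
## direct = TRUE  matches on the raw values (patched behaviour),
## direct = FALSE converts everything to character first (current behaviour).
levels_and_codes <- function(x, direct) {
    y <- unique(x)
    ind <- order(y)
    if (!direct) y <- as.character(y)
    levels <- unique(y[ind])
    if (!direct) x <- as.character(x)
    levels <- levels[is.na(match(levels, NA))]   # default exclude = NA
    list(codes = match(x, levels), levels = as.character(levels))
}

## A few unclassed inputs with NAs, negative values, and zero length.
inputs <- list(
    logical = c(TRUE, NA, FALSE, TRUE),
    integer = c(10L, -1L, NA, 2L, 10L),
    empty   = integer(0)
)

same <- vapply(inputs,
               function(x) identical(levels_and_codes(x, TRUE),
                                     levels_and_codes(x, FALSE)),
               logical(1))
stopifnot(all(same))
```

Inputs with a class attribute are deliberately left out, since the patch keeps the as.character() path for them via '!is.object(x)'.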

Best,

	Sebastian Meyer