match function causing bad performance when using table function on factors with multibyte characters on Windows

Karl Ove Hufthammer wrote:

            
Some additional notes: ‘table’ uses ‘factor’ directly, but also indirectly, 
in ‘addNA’. The definition of ‘addNA’ ends with:

    if (!any(is.na(ll))) 
        ll <- c(ll, NA)
    factor(x, levels = ll, exclude = NULL)

This is slow for non-ASCII levels. One *could* fix this by replacing the 
last line with

    attr(x, "levels") <- ll
    x

But one soon ends up changing every function that uses ‘factor’ in this way, 
which seems like the wrong approach. The problem lies inside ‘factor’,
and that’s where it should be fixed, if feasible.
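
For anyone who wants to compare the two routes on their own machine, here is a rough timing sketch. The data are made up (the level strings, sizes and variable names are my own choices, not from the original report), and the two routes are only interchangeable when ‘x’ has no missing values, since assigning the levels attribute leaves NA codes as NA instead of mapping them to the new NA level.

```r
# Hypothetical test data: a factor with many non-ASCII (UTF-8) levels.
set.seed(1)
lev <- paste0("\u00e6\u00f8\u00e5", seq_len(2000))
x <- factor(sample(lev, 1e5, replace = TRUE))
ll <- c(levels(x), NA)

# Route used at the end of addNA: rebuild the factor from character data,
# which goes through match() on the (non-ASCII) strings.
t_factor <- system.time(f1 <- factor(x, levels = ll, exclude = NULL))

# Alternative route: keep the integer codes and overwrite the levels only.
t_attr <- system.time({
    f2 <- x
    attr(f2, "levels") <- ll
})

# Elapsed seconds for each route; the numbers depend on platform and R version.
print(c(factor = t_factor[["elapsed"]], attr = t_attr[["elapsed"]]))

# With no missing values in x, both routes give the same factor.
stopifnot(identical(f1, f2))
```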

BTW, the definition of ‘addNA’ looks suboptimal in a different way. The last 
line is always executed, even if the factor *does* contain NA values (and of 
course NA levels). In this case, it is basically doing nothing, just taking 
a very long time to do it (at least on Windows). Moving the last line inside 
the ‘if’ clause, and adding an ‘else return(x)’, would fix this (correct me if 
I’m wrong).
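
A minimal sketch of that change, under some assumptions: ‘addNA2’ is a hypothetical name, and the lines before the ‘if’ are my guess at the rest of the function, modelled on the current base definition. Only the ‘if’/‘else’ part reflects the suggestion above.

```r
# Hypothetical variant of addNA: skip the factor() call when the levels
# already contain NA.
addNA2 <- function(x, ifany = FALSE) {
    # Assumed preamble, modelled on base::addNA.
    if (!is.factor(x))
        x <- factor(x)
    if (ifany && !anyNA(x))
        return(x)
    ll <- levels(x)
    if (!any(is.na(ll))) {
        # No NA level yet: add one and rebuild, as addNA does.
        ll <- c(ll, NA)
        factor(x, levels = ll, exclude = NULL)
    } else {
        # NA level already present: return the factor unchanged.
        return(x)
    }
}

# Quick check against base::addNA on two small factors.
x1 <- factor(c("a", "b", NA))              # no NA level yet
x2 <- factor(c("a", NA), exclude = NULL)   # NA is already a level
stopifnot(identical(addNA2(x1), addNA(x1)))
stopifnot(identical(addNA2(x2), addNA(x2)))
```

One caveat: a factor can have an NA level and still carry NA codes (for instance after ‘is.na(x) <- ...’). For such a factor ‘addNA’ maps those codes to the NA level, while the early return leaves them alone, so the shortcut is not strictly equivalent in that case.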