
c.factor

I noticed that a new feature in R 2.4 is that unlist of a list of factors 
already does the operation that I proposed:
[1] a b c d e d e f g h
Levels: a b c d e f g h
Therefore, does it not make sense that c(x,y) should return the same as 
unlist(list(x,y))  ?
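
For readers who want to reproduce the output quoted above, here is a 
minimal sketch. The definitions of x and y are an assumption, chosen only 
so that the result matches the printed values and levels; the original 
inputs are not shown in this message.

```r
# Assumed inputs (not from the original message): two factors whose
# level sets overlap in "d" and "e".
x <- factor(letters[1:5])   # levels a, b, c, d, e
y <- factor(letters[4:8])   # levels d, e, f, g, h

# unlist() on a list made up only of factors returns a factor whose
# levels are the union of the input levels.
z <- unlist(list(x, y))
print(z)
```
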

Also, the factor-specific "if" branch inside the definition of unlist, not 
surprisingly, uses a method very similar to those previously posted. 
However, it first coerces each factor with as.character before matching 
against the new level set, which is inefficient. Here again is the c.factor 
method that I proposed; it avoids as.character by matching only the levels, 
and is therefore more efficient.  Leaving aside the discussion about 
c.factor, or concat, or whatever, could 'unlist' be changed to use this 
method instead?  After all, one of the key advantages of factors is that 
they save main memory, and anything which coerces back to character 
defeats that benefit.
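c.factor <- function(...) {
    args <- list(...)
    if (!all(sapply(args, is.factor))) stop("all arguments must be factor")
    # Union of all levels, in order of first appearance
    newlevels = unique(unlist(lapply(args, levels)))
    # Remap each factor's integer codes onto newlevels; no as.character needed
    ans = unlist(lapply(args, function(x) {
        m = match(levels(x), newlevels)   # old level index -> new level index
        m[as.integer(x)]                  # apply the mapping to every element
    }))
    levels(ans) = newlevels
    class(ans) = "factor"
    ans
}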
[1] TRUE
_
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          4.0
year           2006
month          10
day            03
svn rev        39566
language       R
version.string R version 2.4.0 (2006-10-03)
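
To illustrate the level-remapping step used in the function above, here is 
a small worked sketch. The factor y and the combined level set newlevels 
are made-up values for illustration only. The point is that match() is 
called on the (short) vector of levels rather than on every element, and 
that the result agrees with the as.character route.

```r
# Hypothetical inputs for illustration.
y <- factor(c("e", "d", "e", "h"))   # levels: d, e, h; codes: 2 1 2 3
newlevels <- letters[1:8]            # combined level set a..h

# Map each old level to its position in the new level set.
m <- match(levels(y), newlevels)     # 4 5 8

# Index the mapping by the integer codes: one lookup per element,
# with no character vector of length(y) created.
codes_int <- m[as.integer(y)]        # 5 4 5 8

# The as.character route gives the same codes, but matches every element.
codes_chr <- match(as.character(y), newlevels)

identical(codes_int, codes_chr)      # TRUE
```
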
"Brian Ripley" <ripley at stats.ox.ac.uk> wrote in message 
news:Pine.LNX.4.64.0611150926070.19618 at auk.stats...