correction to the previously asked question (about merging factors)

Peter Dalgaard · 2004-02-05T22:42:08Z



Spencer Graves <spencer.graves at pdf.com> writes:

Actually, Sundar's solution is exactly equivalent to the

factor(c(as.character(F1),as.character(F2)))

that several have suggested, and which may actually be good enough for
the vast majority of cases. It is in fact the same thing that goes on
inside rbind.data.frame (which uses as.vector, an equivalent conversion).
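
For concreteness, here is a small sketch of that simple approach. The
sample values are made up for illustration; only the names F1 and F2
come from the discussion above.

```r
# Two factors with overlapping but different level sets (made-up data)
F1 <- factor(c("a", "b", "a"))
F2 <- factor(c("c", "b"))

# Convert both to character, concatenate, and let factor() rebuild
# the level set from the combined strings
merged <- factor(c(as.character(F1), as.character(F2)))

levels(merged)
as.character(merged)
```

Note that this rebuilds the levels from the data, so any level that is
declared but never used in F1 or F2 is dropped from the result.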

If you really want something optimal, in the sense of not allocating a
large number of character strings and comparing them individually to
a joint level set, I think you need something like this:

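l1 <- levels(F1)
l2 <- levels(F2)
ll <- sort(unique(c(l1, l2)))   # joint level set
m1 <- match(l1, ll)             # old codes of F1 -> codes in ll
m2 <- match(l2, ll)             # old codes of F2 -> codes in ll
# levels= is given explicitly so that levels unused in the data do not
# make length(labels) disagree with the number of distinct codes
factor(c(m1[F1], m2[F2]), levels=seq_along(ll), labels=ll)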
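
As a rough check, the code-remapping idea can be compared with the
as.character() version on made-up data. The sample values, including
the NA and the unused level "d", are assumptions for illustration. The
sketch passes levels = seq_along(ll) explicitly so that the unused
level does not cause a labels-length mismatch.

```r
# Made-up inputs: F1 contains an NA, F2 declares an unused level "d"
F1 <- factor(c("b", "a", "b", NA))
F2 <- factor(c("c", "b"), levels = c("b", "c", "d"))

l1 <- levels(F1)
l2 <- levels(F2)
ll <- sort(unique(c(l1, l2)))
m1 <- match(l1, ll)
m2 <- match(l2, ll)

# Indexing by a factor uses its integer codes, so m1[F1] translates
# each old code of F1 into the corresponding code in the joint set
merged <- factor(c(m1[F1], m2[F2]), levels = seq_along(ll), labels = ll)

# The simple character-based version, for comparison
simple <- factor(c(as.character(F1), as.character(F2)))

# Both give the same values element by element
stopifnot(identical(as.character(merged), as.character(simple)))

levels(merged)  # joint level set, keeps the unused level
levels(simple)  # rebuilt from the data, drops the unused level
```

The two results agree on the values. They differ only in that the
remapped version keeps declared-but-unused levels.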

or if you want to be really hardcore, bypass the inefficiencies inside
factor() and do

structure(c(m1[F1], m2[F2]), levels=ll, class="factor")

(People have been known to regret coding with explicit calls to
structure(), though...)
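
One way to reduce that risk is to wrap the structure() call in a small
function that checks the codes first, since structure() itself does no
validation. The following is only a sketch: the name merge_factors and
the particular checks are hypothetical, not from this thread, and
ordered factors are not handled.

```r
# Hypothetical helper: merge two factors by remapping integer codes,
# with sanity checks before the unvalidated structure() call
merge_factors <- function(F1, F2) {
  stopifnot(is.factor(F1), is.factor(F2))
  l1 <- levels(F1)
  l2 <- levels(F2)
  ll <- sort(unique(c(l1, l2)))
  codes <- c(match(l1, ll)[F1], match(l2, ll)[F2])
  # A factor needs integer codes that are NA or within 1..length(levels)
  stopifnot(
    is.integer(codes),
    all(is.na(codes) | codes >= 1L & codes <= length(ll))
  )
  structure(codes, levels = ll, class = "factor")
}

# Made-up data for illustration
F1 <- factor(c("b", "a", "b"))
F2 <- factor(c("c", "b"))
merged <- merge_factors(F1, F2)
levels(merged)
as.character(merged)
```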

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
