correction to the previously asked question (about merging factors)
Spencer Graves <spencer.graves at pdf.com> writes:
Sundar: Your solution is not only more elegant than mine, it's also
faster, at least with this tiny example:

> start.time <- proc.time()
> k1 <- length(F1)
> k2 <- length(F2)
> F12.lvls <- unique(c(levels(F1), levels(F2)))
> F. <- factor(rep(F12.lvls[1], k1+k2), levels=F12.lvls)
> F.[1:k1] <- F1
> F.[-(1:k1)] <- F2
> proc.time()-start.time
[1] 0.00 0.00 0.42 NA NA
>
> start.time <- proc.time()
> F1 <- factor(c("b", "a"))
> F2 <- factor(c("c", "b"))
> F3 <- factor(c(levels(F1)[F1], levels(F2)[F2]))
> proc.time()-start.time
[1] 0.00 0.00 0.24 NA NA
>
With longer vectors, mine may be faster -- but yours is still
more elegant.

Best Wishes,
spencer graves
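
A single call on two-element factors is too short for differences in proc.time() to mean much. Below is a minimal sketch of how the comparison could be repeated on longer vectors with system.time(). The vector length, the seed, and the helper names merge_prealloc and merge_levels are assumptions made for illustration and are not part of the original exchange. No timings are quoted, since they depend on the machine and the R version.

```r
# Hypothetical helpers wrapping the two approaches compared above.
merge_prealloc <- function(F1, F2) {
  k1 <- length(F1)
  k2 <- length(F2)
  lvls <- unique(c(levels(F1), levels(F2)))
  # Preallocate a factor with the joint level set, then fill both halves.
  out <- factor(rep(lvls[1], k1 + k2), levels = lvls)
  out[seq_len(k1)] <- F1
  out[k1 + seq_len(k2)] <- F2
  out
}

merge_levels <- function(F1, F2) {
  # Index the level vectors by the integer codes, then rebuild a factor.
  factor(c(levels(F1)[F1], levels(F2)[F2]))
}

set.seed(1)   # assumed seed, for reproducibility only
n <- 1e5      # assumed length; adjust to taste
F1 <- factor(sample(letters[1:10], n, replace = TRUE))
F2 <- factor(sample(letters[5:15], n, replace = TRUE))

t_prealloc <- system.time(r1 <- merge_prealloc(F1, F2))
t_levels   <- system.time(r2 <- merge_levels(F1, F2))
print(rbind(prealloc = t_prealloc, levels = t_levels))

# Both routes hold the same values; only the order of the levels may differ.
same_values <- identical(as.character(r1), as.character(r2))
print(same_values)
```

Wrapping each approach in a function keeps the timed expression down to a single call, so the two measurements cover comparable work.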
Actually, Sundar's solution is exactly equivalent to the

    factor(c(as.character(F1), as.character(F2)))

that several have suggested, and which may actually be good enough for the vast majority of cases. It is in fact the same thing that goes on inside rbind.data.frame (which uses as.vector, which is equivalent).

If you really want something optimal, in the sense of not allocating a large number of character strings and comparing them individually to a joint level set, I think you need something like this:

    l1 <- levels(F1)
    l2 <- levels(F2)
    ll <- sort(unique(c(l1, l2)))
    m1 <- match(l1, ll)
    m2 <- match(l2, ll)
    factor(c(m1[F1], m2[F2]), labels=ll)

or, if you want to be really hardcore, bypass the inefficiencies inside factor() and do

    structure(c(m1[F1], m2[F2]), levels=ll, class="factor")

(People have been known to regret coding with explicit calls to structure(), though...)
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
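
The level-remapping idea in the message above can be collected into one function and checked against the as.character() route. This is a sketch only: the function name merge_factors and the variable names via_match and via_char are hypothetical and do not appear in the thread. It uses the two-element factors from earlier in the discussion.

```r
# Hypothetical helper; the name merge_factors is not from the thread.
merge_factors <- function(F1, F2) {
  l1 <- levels(F1)
  l2 <- levels(F2)
  ll <- sort(unique(c(l1, l2)))   # joint, sorted level set
  m1 <- match(l1, ll)             # position of each old level in ll
  m2 <- match(l2, ll)
  # Remap the integer codes; no per-element string comparison is needed.
  structure(c(m1[F1], m2[F2]), levels = ll, class = "factor")
}

F1 <- factor(c("b", "a"))
F2 <- factor(c("c", "b"))

via_match <- merge_factors(F1, F2)
via_char  <- factor(c(as.character(F1), as.character(F2)))

print(via_match)
print(identical(via_match, via_char))
```

The two results agree for these inputs. They can differ when an input carries unused levels, because the character route drops such levels while the remapping route keeps them in the joint level set.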