correction to the previously asked question (about merging factors)

I have two factors l1, l2, and I'd like to merge them.

(Remark: the factors cannot be converted to characters.)

Function c() does not give me the result I want:
l1 = factor(c('aaaa', 'bbbb'))
l2 = factor(c('ccc', 'dd'))
lMerge = factor(c(l1, l2))
lMerge
[1] 1 2 1 2
Levels: 1 2

I'd like to merge l1 and l2 and to get lMerge 
----------------------------------------------

[1] aaaa bbbb ccc dd
Levels: aaaa bbbb ccc dd

instead of 
----------

[1] 1 2 1 2
Levels: 1 2

How should I do that without converting the factors back to strings?
-------------------------------------------------------------------

-- 
Svetlana Eden        Biostatistician II            School of Medicine
                     Department of Biostatistics   Vanderbilt University
What about the following: 

 > F1 <- factor(c("b", "a"))
 > F2 <- factor(c("c", "b"))
 > k1 <- length(F1)
 > k2 <- length(F2)
 > F12.lvls <- unique(c(levels(F1), levels(F2)))
 > F. <- factor(rep(F12.lvls[1], k1+k2), levels=F12.lvls)
 > F.[1:k1] <- F1
 > F.[-(1:k1)] <- F2
 > F.
[1] b a c b
Levels: a b c

      This saves converting the factors to characters, which might save 
computer time at the expense of your time. 
      hope this helps. 
      spencer graves

How about simply

F1 <- factor(c("b", "a"))
F2 <- factor(c("c", "b"))
F3 <- factor(c(levels(F1)[F1], levels(F2)[F2]))

-sundar
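
Applied to the factors from the original question, the one-liner above can be checked as follows. This is only a sketch: the names via_levels and via_char are made up for illustration. Note that indexing levels(l1) by l1 recovers the labels as character strings, so in effect it does the same work as as.character(). (Newer R releases, 4.1.0 and later if I recall the release notes correctly, also give c() a factor method, so c(l1, l2) may already return a factor there; the code below does not rely on that.)

```r
l1 <- factor(c("aaaa", "bbbb"))
l2 <- factor(c("ccc", "dd"))

# Index each level vector by the factor's integer codes to recover the labels,
# then rebuild a factor from the combined labels.
via_levels <- factor(c(levels(l1)[l1], levels(l2)[l2]))

# The same thing written with as.character().
via_char <- factor(c(as.character(l1), as.character(l2)))

print(via_levels)                          # values and levels: aaaa, bbbb, ccc, dd
print(identical(via_levels, via_char))     # both routes build the same factor
```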

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
Sundar:  Your solution is not only more elegant than mine, it's 
also faster, at least with this tiny example: 

 > start.time <- proc.time()
 > k1 <- length(F1)
 > k2 <- length(F2)
 > F12.lvls <- unique(c(levels(F1), levels(F2)))
 > F. <- factor(rep(F12.lvls[1], k1+k2), levels=F12.lvls)
 > F.[1:k1] <- F1
 > F.[-(1:k1)] <- F2
 > proc.time()-start.time
[1] 0.00 0.00 0.42   NA   NA
 >
 > start.time <- proc.time()
 > F1 <- factor(c("b", "a"))
 > F2 <- factor(c("c", "b"))
 > F3 <- factor(c(levels(F1)[F1], levels(F2)[F2]))
 > proc.time()-start.time
[1] 0.00 0.00 0.24   NA   NA
 >
      With longer vectors, mine may be faster -- but yours is still more 
elegant. 

      Best Wishes,
      spencer graves
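
On four-element factors, a difference of proc.time() values is mostly noise, so the comparison above says little about longer vectors. A rough sketch of the same comparison with system.time() follows. The vector length, the level sets and the seed are arbitrary choices for illustration, and no timings are quoted because they depend on the machine and the R version.

```r
set.seed(1)      # arbitrary seed, only so the example is reproducible
n  <- 1e5        # arbitrary length chosen for illustration
F1 <- factor(sample(letters[1:10], n, replace = TRUE))
F2 <- factor(sample(letters[5:15], n, replace = TRUE))

# Approach 1: preallocate a factor with the joint levels, then fill it in.
t_fill <- system.time({
  k1 <- length(F1)
  k2 <- length(F2)
  lv <- unique(c(levels(F1), levels(F2)))
  A  <- factor(rep(lv[1], k1 + k2), levels = lv)
  A[seq_len(k1)]      <- F1
  A[k1 + seq_len(k2)] <- F2
})

# Approach 2: expand both factors to their labels and rebuild the factor.
t_char <- system.time(
  B <- factor(c(levels(F1)[F1], levels(F2)[F2]))
)

print(t_fill)
print(t_char)

# Whatever the timings, both approaches should produce the same values.
stopifnot(identical(as.character(A), as.character(B)))
```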

Actually, Sundar's solution is exactly equivalent to the 

factor(c(as.character(F1),as.character(F2)))

that several have suggested, and which may actually be good enough for
the vast majority of cases. It is in fact the same thing that goes on
inside rbind.data.frame (that uses as.vector, which is equivalent).

If you really want something optimal, in the sense of not allocating a
large number of character strings and comparing them individually to
a joint level set, I think you need something like this:

l1 <- levels(F1)
l2 <- levels(F2)
ll <- sort(unique(c(l1, l2)))
m1 <- match(l1, ll)
m2 <- match(l2, ll)
factor(c(m1[F1], m2[F2]), labels=ll)

or if you want to be really hardcore, bypass the inefficiencies inside
factor() and do

structure(c(m1[F1], m2[F2]), levels=ll, class="factor")

(People have been known to regret coding with explicit calls to
structure(), though...)
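
The integer-code approach above can be wrapped in a small helper. This is a sketch, not a tested library routine: the name merge_factors is hypothetical. One deliberate change from the code above is that levels = seq_along(ll) is passed explicitly. With labels=ll alone, factor() expects exactly one label per distinct code present in the data, so it would stop with an error if one of the inputs carried a level that never occurs.

```r
merge_factors <- function(f1, f2) {
  ll <- sort(unique(c(levels(f1), levels(f2))))   # joint level set
  m1 <- match(levels(f1), ll)                     # old code -> new code for f1
  m2 <- match(levels(f2), ll)                     # old code -> new code for f2
  # Remap the integer codes only; no per-element strings are created.
  codes <- c(m1[as.integer(f1)], m2[as.integer(f2)])
  # levels = seq_along(ll) keeps codes and labels aligned even when some
  # level in ll never occurs in the data.
  factor(codes, levels = seq_along(ll), labels = ll)
}

F1 <- factor(c("b", "a"), levels = c("a", "b", "z"))   # "z" is an unused level
F2 <- factor(c("c", "b"))
merged <- merge_factors(F1, F2)
print(merged)            # values b, a, c, b; levels a, b, c, z
```

Only the level vectors are compared as strings here; the data themselves are handled through their integer codes.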
O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
Thanks, Peter. 

      So Sundar's more elegant solution is equivalent to my initial 
response to this question -- which shows how much one can lose trying to 
be too clever. 

      Best Wishes,
      spencer graves
