how to concatenate factor vectors?

How do I concatenate two vectors of factors?
--8<---------------cut here---------------start------------->8---
a <- factor(5:1,levels=1:9)
b <- factor(9:1,levels=1:9)
str(c(a,b))
int [1:14] 5 4 3 2 1 9 8 7 6 5 ...
str(unlist(list(a,b),use.names=FALSE))
Factor w/ 9 levels "1","2","3","4",..: 5 4 3 2 1 9 8 7 6 5 ...
--8<---------------cut here---------------end--------------->8---
So unlist(list()) works.
Is there a better way, or is this how it is supposed to be done?
Thanks!
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://honestreporting.com
http://think-israel.org http://thereligionofpeace.com http://mideasttruth.com
(lisp programmers do it better)
No. You need to test more carefully.
a <- factor(c(1,3,5))
b <- factor(c(5,7))
c(a,b)
[1] 1 2 3 1 2
lev <- sort(unique(f <- c(a,b)))
f <- factor(f,levels=lev)
f
[1] 1 2 3 1 2
Levels: 1 2 3

## but
unlist(list(a,b),use.names=FALSE)
[1] 1 3 5 5 7
Levels: 1 3 5 7

However, is level "5" in 'a' the same as level "5" in 'b'? The OP
fails to specify, and there is no reason to assume so. So I would say
clarification is required before any answer can be given.

-- Bert
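
A minimal sketch of the point above: the integer codes of a factor index into that factor's own levels, so re-factoring the combined codes loses the labels, while combining the labels does not. It takes the codes with as.integer() rather than c(a,b) because, as far as I know, c() on factors returns a factor since R 4.1.0, so the c(a,b) output shown above reflects older R versions. The names codes, labs and f are only illustrative.

```r
a <- factor(c(1, 3, 5))
b <- factor(c(5, 7))

# The integer codes index into each factor's own levels, so they are
# not comparable across factors with different level sets.
codes <- c(as.integer(a), as.integer(b))
codes                      # 1 2 3 1 2

# Combining the labels instead keeps the values intact.
labs <- c(as.character(a), as.character(b))
f <- factor(labs, levels = union(levels(a), levels(b)))
f                          # values 1 3 5 5 7, levels 1 3 5 7
```

This still assumes that a label shared by both factors (here "5") means the same thing in each, which is exactly the question raised above.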

On Wed, Oct 17, 2012 at 10:43 PM, Jorge I Velez wrote:
Hi Sam,

Perhaps the following?

a <- factor(5:1,levels=1:9)
b <- factor(9:1,levels=1:9)
lev <- sort(unique(f <- c(a, b)))
f <- factor(f, levels = lev)
str(f)
 Factor w/ 9 levels "1","2","3","4",..: 5 4 3 2 1 9 8 7 6 5 ...

HTH,
Jorge.-

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Hi,
Maybe this also works:
a <- factor(c(1,3,5))
b <- factor(c(5,7))
f1 <- as.numeric(c(as.character(a),as.character(b)))
lev <- as.numeric(c(levels(a),setdiff(levels(b),levels(a))))
f2 <- factor(f1,levels=lev)
f2
#[1] 1 3 5 5 7
#Levels: 1 3 5 7

a1 <- factor(5:1,levels=1:9)
b1 <- factor(9:1,levels=1:9)
f <- as.numeric(c(as.character(a1),as.character(b1)))
lev1 <- as.numeric(c(levels(a1),setdiff(levels(b1),levels(a1))))
f3 <- factor(f,levels=lev1)
f3
# [1] 5 4 3 2 1 9 8 7 6 5 4 3 2 1
#Levels: 1 2 3 4 5 6 7 8 9

A.K.

* Bert Gunter [2012-10-17 23:21:44 -0700]:

> However, Is level "5" in 'a' the same as level "5" in 'b' ?

yes, of course.
would anyone want to _different_ factors with identical string representations?!
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
mood <- factor(c("blue", "sunny"))
skycolor <- factor(c("azure","blue","teal"))

If factors are not defined with levels specifications, automatic merging should never be allowed. The fact that read.table automatically generates factors using default levels is why I nearly always import using as.is=TRUE, perform QC and combining of sources using strings, and only then convert to factor.

If you HAVE defined your factors using explicit levels definitions, you should have no trouble combining them.
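
A minimal sketch of the strings-first workflow described above, assuming two hypothetical sources (src1, src2) and a hypothetical set of allowed values; none of these names come from the thread.

```r
# Hypothetical raw data from two sources, kept as character vectors
# (e.g. what read.table(..., as.is = TRUE) gives for text columns).
src1 <- c("low", "high", "low")
src2 <- c("medium", "low")

# Combine and check while the data are still strings.
all_values <- c(src1, src2)
allowed <- c("low", "medium", "high")
stopifnot(all(all_values %in% allowed))

# Convert to a factor only at the end, with explicit levels.
f <- factor(all_values, levels = allowed)
```

Because the levels are stated explicitly, the result does not depend on which values happen to occur in either source.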
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
hi Jorge,
* Jorge I Velez [2012-10-18 16:43:58 +1100]:

> a <- factor(5:1,levels=1:9)
> b <- factor(9:1,levels=1:9)
> lev <- sort(unique(f <- c(a, b)))
> f <- factor(f, levels = lev)
> str(f)
>  Factor w/ 9 levels "1","2","3","4",..: 5 4 3 2 1 9 8 7 6 5 ...

is sort(unique()) really necessary?
I think
lev <- levels(a)
should be enough.

However, this does not quite do what I want.
I want a function which will _NOT_ have a non-factor vector as an
intermediate value because that would waste a LOT of memory in my case.
I want a function which will check that a and b have identical levels
(in Lisp lingo, the levels are EQ, not just EQUALP).

--8<---------------cut here---------------start------------->8---
a <- factor(letters[sample(1:10,20,replace=TRUE)],levels=letters)
[1] e e a b c e j d a b h i a e e g j a c e
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
b <- factor(letters[sample(1:10,30,replace=TRUE)],levels=letters)
[1] d d f c j b d e j j g i g j j g g a j a b e d c b i i a b f
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
c(a,b)
[1]  5  5  1  2  3  5 10  4  1  2  8  9  1  5  5  7 10  1  3  5  4  4  6  3 10
[26]  2  4  5 10 10  7  9  7 10 10  7  7  1 10  1  2  5  4  3  2  9  9  1  2  6
factor(letters[c(a,b)],levels=letters)
[1] e e a b c e j d a b h i a e e g j a c e d d f c j b d e j j g i g j j g g a
[39] j a b e d c b i i a b f
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
--8<---------------cut here---------------end--------------->8---

however, this is not a "direct" way (unlike my unlist(list(...))):
there is an intermediate integer vector c(a,b) which is mapped to a
character vector via letters, which is converted back to integers
(==factors).

IIUC, a factor is an integer vector which knows that the integers refer
to levels.

c(a,b) creates such an integer vector.
How do I tell it that it is a factor?
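
One possible answer to the question above, as a sketch rather than an established idiom: combine the integer codes and re-attach the levels and class by hand. It only makes sense when both factors have exactly the same levels, and it uses unclass() so that it behaves the same whether or not c() on factors returns a factor in the R version at hand. The names codes and f are illustrative.

```r
a <- factor(c("e", "a", "b"), levels = letters)
b <- factor(c("d", "j"), levels = letters)

# Only valid when both factors share exactly the same levels.
stopifnot(identical(levels(a), levels(b)))

# unclass() exposes the integer codes; c() on them drops the leftover
# levels attribute and gives a plain integer vector.
codes <- c(unclass(a), unclass(b))

# Re-attach the levels and the class to turn the codes into a factor.
f <- structure(codes, levels = levels(a), class = "factor")
```

No character vector of the data is built along the way; the only new allocation is the combined integer vector. Note that setting class = "factor" would drop the "ordered" class of an ordered factor.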
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
* R. Michael Weylandt [2012-10-18 16:01:37 +0100]:

> On Thursday, October 18, 2012, Sam Steingold wrote:
>
>> * Bert Gunter [2012-10-17 23:21:44 -0700]:
>>
>>> However, Is level "5" in 'a' the same as level "5" in 'b' ?
>> yes, of course.
>> would anyone want to _different_ factors with identical string
>> representations?!
> Off the cuff, studying education and grades: F could be a grade or a
> gender.

would you ever want to concatenate a vector of grades with a vector of
genders?
as I said elsewhere, the function which concatenates factors must check
that the levels are identical before proceeding.
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
* Jeff Newmiller [2012-10-18 07:53:24 -0700]:

> If you HAVE defined your factors using explicit levels definitions, you
> should have no trouble combining them.

http://article.gmane.org/gmane.comp.lang.r.general:277719
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
c() has an unfortunate history.  Originally, c(x) stripped the attributes,
except names but including  dim, dimnames, and class, from x.
Also, c(x,y) stripped the attributes from both x and y and concatenated
them.  Also, c(nameA=1,nameB=2) constructed a vector with a names attribute.

Then c() became a generic function and people wrote methods for certain
classes, typically newer classes without the weight of history on them, that kept
at least the class and would combine 2 or more items of that class.  Adding
a c.factor became tricky because old code used c(factor(...)) to strip the class
and levels attributes to get the integer codes.

You can make a c() that does what you want for your factors by subclassing
factor and writing a c.<yourFactor> that does what you want.  This will not
break old code.  E.g.,
   myFactor <- function(...) {
      tmp <- factor(...)
      # prepend the new class so that methods for "myFactor" are found first
      class(tmp) <- c("myFactor", class(tmp))
      tmp }
   c.myFactor <- function(...) {
      # ... compare levels of inputs with identical() and do what you want ...
      # ... return something with the right class ...
   }
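
One way the placeholder body of c.myFactor might be filled in, as a sketch only: it assumes that "do what you want" means "stop unless all inputs have identical levels", which is a choice made here, not something the outline above prescribes. The variable names are illustrative.

```r
myFactor <- function(...) {
  tmp <- factor(...)
  class(tmp) <- c("myFactor", class(tmp))
  tmp
}

c.myFactor <- function(...) {
  args <- list(...)
  lev <- levels(args[[1]])
  # Assumed policy: refuse to combine inputs whose levels differ.
  same <- vapply(args, function(x) identical(levels(x), lev), logical(1))
  if (!all(same)) stop("cannot combine: levels differ")
  # Concatenate the integer codes, then restore levels and class.
  codes <- unlist(lapply(args, unclass), use.names = FALSE)
  structure(codes, levels = lev, class = class(args[[1]]))
}

x <- myFactor(c("a", "b"), levels = c("a", "b", "c"))
y <- myFactor(c("c", "a"), levels = c("a", "b", "c"))
z <- c(x, y)   # dispatches to c.myFactor because x is a myFactor
```

Plain factors are unaffected, so old code that relies on c(factor(...)) keeps whatever behaviour it had.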

Or, you can decide to write  a new concatenation function
and stop using c().

As for EQ vs. EQUALP, don't even think of EQ in R: it doesn't make sense there.
identical() is a pretty quick way to check that two objects have identical contents.
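
A short illustration of what identical() checks when applied to levels, with setequal() shown for contrast; the factors a, b and d are made up for the example.

```r
a <- factor(c("x", "y"), levels = c("x", "y"))
b <- factor("y", levels = c("x", "y"))
d <- factor("y", levels = c("y", "x"))

same_ab <- identical(levels(a), levels(b))   # TRUE: same labels, same order
same_ad <- identical(levels(a), levels(d))   # FALSE: same labels, different order
set_ad  <- setequal(levels(a), levels(d))    # TRUE: order ignored
```

For concatenation the stricter identical() test is the relevant one, since the integer codes depend on the order of the levels.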

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
* William Dunlap [2012-10-18 15:33:38 +0000]:

> c() has an unfortunate history.

:-)
ISTR reading in the R manual ~15(?) years ago that the language was in
flux and one could not expect code written for the current release to
work in the next release.  I was considering R as the graphing back end
at that time, so this note turned me off.
Now it turns out that R has a legacy it cannot shake. :-)

> Or, you can decide to write  a new concatenation function
> and stop using c().
>
> As for EQ vs. EQUALP, don't even think of EQ in R: it doesn't make
> sense there.  identical() is a pretty quick way to check that two
> objects have identical contents.

Good! That's what I was looking for!

concatenate.factors <- function (x, y) {
  stopifnot(identical(levels(x),levels(y)))
  unlist(list(x,y), use.names=FALSE)
}

This seems to do what I need.

I see that
identical(levels(concatenate.factors(a,b)),levels(a))
==> TRUE
DIUC that concatenate.factors does NOT create an intermediate vector and
then re-factor it?

Thank you very much for your insight!
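
A hedged variant of the function above that accepts any number of factors; the name concat_factors is hypothetical. On the intermediate-vector question: as far as I can tell from the R-level source of unlist(), a list of factors is handled by converting each element to character and matching against the combined levels, so an intermediate character vector is probably still created. Treat that as my reading of the source, not a guarantee for every R version.

```r
concat_factors <- function(...) {
  fs <- list(...)
  lev <- levels(fs[[1]])
  # Every argument must be a factor with exactly the same levels.
  ok <- vapply(fs, function(f) is.factor(f) && identical(levels(f), lev),
               logical(1))
  if (!all(ok)) stop("all arguments must be factors with identical levels")
  unlist(fs, use.names = FALSE)
}

a <- factor(c("e", "a"), levels = letters)
b <- factor(c("d", "j"), levels = letters)
d <- factor("b", levels = letters)
f <- concat_factors(a, b, d)
```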
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
On Thursday, October 18, 2012, Sam Steingold wrote:

> would you ever want to concatenate a vector of grades with a vector of
> genders?

You said nothing about having good reasons to do manipulations
thereupon. As I read the question (changing "to" to "two"), you were
just asking if two distinct factors might share string representations
;-)

Perhaps even better: grades on two different exams: sharing the same
level set entirely, but from a statistical point of view, distinct
(but likely confounded)

Cheers,
M