
how to concatenate factor vectors?

14 messages · Bert Gunter, Jorge I Velez, arun +4 more

#
How do I concatenate two vectors of factors?
--8<---------------cut here---------------start------------->8---
int [1:14] 5 4 3 2 1 9 8 7 6 5 ...
Factor w/ 9 levels "1","2","3","4",..: 5 4 3 2 1 9 8 7 6 5 ...
--8<---------------cut here---------------end--------------->8---
so, unlist(list()) works.
is there a better way or is this how this is supposed to be done?
Thanks!
#
No. You need to test more carefully.
[1] 1 2 3 1 2
[1] 1 2 3 1 2
Levels: 1 2 3

## but
[1] 1 3 5 5 7
Levels: 1 3 5 7

However, is level "5" in 'a' the same as level "5" in 'b'? The OP
fails to specify, and there's no reason to assume so. So I would say
clarification is required before any answer can be given.
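
For illustration, a minimal sketch of the distinction being drawn here. The commands that produced the outputs above were not preserved in this archive, so the inputs `a` and `b` below are assumptions chosen to match the second output. The sketch shows that the same label can sit behind different integer codes in two factors, and that `unlist(list(...))` matches by label rather than by code.

```r
# Assumed inputs, chosen to match the "1 3 5 5 7" output above.
a <- factor(c(1, 3, 5))   # levels "1" "3" "5"
b <- factor(c(5, 7))      # levels "5" "7"

# unlist() treats a list of factors specially: the result is a factor
# whose levels are the union of the inputs' levels, matched by label.
ab <- unlist(list(a, b))
print(ab)

# The raw integer codes are positions in each factor's own level set,
# so label "5" is code 3 in a but code 1 in b.
raw_codes <- c(as.integer(a), as.integer(b))
print(raw_codes)   # 1 2 3 1 2
```

Whether merging by label is the right thing to do still depends on whether the two level sets mean the same thing, which is the point of the question above.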

-- Bert

On Wed, Oct 17, 2012 at 10:43 PM, Jorge I Velez
<jorgeivanvelez at gmail.com> wrote:

  
    
#
Hi,
Maybe this also works:
a <- factor(c(1,3,5))
b <- factor(c(5,7))
f1<-as.numeric(c(as.character(a),as.character(b)))
lev<-as.numeric(c(levels(a),setdiff(levels(b),levels(a))))
f2<-factor(f1,levels=lev)
f2
#[1] 1 3 5 5 7
#Levels: 1 3 5 7

a1<-factor(5:1,levels=1:9)
b1<-factor(9:1,levels=1:9)
f<-as.numeric(c(as.character(a1),as.character(b1)))
lev1<-as.numeric(c(levels(a1),setdiff(levels(b1),levels(a1))))
f3<-factor(f,levels=lev1)
f3
# [1] 5 4 3 2 1 9 8 7 6 5 4 3 2 1
#Levels: 1 2 3 4 5 6 7 8 9

A.K.




----- Original Message -----
From: Bert Gunter <gunter.berton at gene.com>
To: Jorge I Velez <jorgeivanvelez at gmail.com>
Cc: r-help at r-project.org; sds at gnu.org
Sent: Thursday, October 18, 2012 2:21 AM
Subject: Re: [R] how to concatenate factor vectors?

#
yes, of course.
would anyone want to _different_ factors with identical string representations?!
#
mood <- factor(c("blue", "sunny"))
skycolor <- factor(c("azure","blue","teal"))

If factors are not defined with levels specifications, automatic merging should never be allowed. The fact that read.table automatically generates factors using default levels is why I nearly always import using as.is=TRUE, perform QC and combining of sources using strings, and only then convert to factor.

If you HAVE defined your factors using explicit levels definitions, you should have no trouble combining them.
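
A minimal sketch of that workflow, assuming the raw values arrive as plain character vectors. The names `src1`, `src2`, and `sky_levels` are hypothetical and only serve the illustration.

```r
# Hypothetical raw data from two sources, kept as plain strings.
src1 <- c("blue", "azure")
src2 <- c("teal", "blue", "azure")

# One explicit level set, defined once and shared by every source.
sky_levels <- c("azure", "blue", "teal")

# Combine and check while the data are still character,
# then convert to factor a single time.
all_sky <- c(src1, src2)
stopifnot(all(all_sky %in% sky_levels))   # simple QC step
skycolor_all <- factor(all_sky, levels = sky_levels)
print(skycolor_all)
```

Because the levels are stated explicitly, the codes do not depend on which values happen to appear in each source.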
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
Sam Steingold <sds at gnu.org> wrote:

            
#
hi Jorge,
is sort(unique()) really necessary?
I think
lev <- levels(a)
should be enough.

However, this does not quite do what I want.
I want a function which will _NOT_ have a non-factor vector as an
intermediate value because that would waste a LOT of memory in my case.
I want a function which will check that a and b have identical levels
(in Lisp lingo, the levels are EQ, not just EQUALP).

--8<---------------cut here---------------start------------->8---
[1] e e a b c e j d a b h i a e e g j a c e
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
[1] d d f c j b d e j j g i g j j g g a j a b e d c b i i a b f
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
[1]  5  5  1  2  3  5 10  4  1  2  8  9  1  5  5  7 10  1  3  5  4  4  6  3 10
[26]  2  4  5 10 10  7  9  7 10 10  7  7  1 10  1  2  5  4  3  2  9  9  1  2  6
[1] e e a b c e j d a b h i a e e g j a c e d d f c j b d e j j g i g j j g g a
[39] j a b e d c b i i a b f
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
--8<---------------cut here---------------end--------------->8---

however, this is not a "direct" way (unlike my unlist(list(...))):
there is an intermediate integer vector c(a,b) which is mapped to a
character vector via letters, which is converted back to integers
(==factors).

IIUC, a factor is an integer vector which knows that the integers refer
to levels.

c(a,b) creates such an integer vector.
How do I tell it that it is a factor?
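
One possible answer, as a sketch only: a factor is an integer vector with a `levels` attribute and class `"factor"`, so both can be set directly on the concatenated codes once the level sets have been checked. The names `a`, `b`, `codes`, and `ab` are illustrative. `as.integer()` is used instead of `c(a, b)` because newer versions of R return a factor from `c()` on factors, so the plain-integer behaviour described above should not be relied on.

```r
a <- factor(c("e", "a", "b"), levels = letters)
b <- factor(c("d", "f"), levels = letters)

# Only safe when both factors share exactly the same level set.
stopifnot(identical(levels(a), levels(b)))

# Concatenate the integer codes, then attach levels and class.
codes <- c(as.integer(a), as.integer(b))
ab <- structure(codes, levels = levels(a), class = "factor")
print(ab)
```

No character vector is built along the way; only the integer codes are copied.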
#
would you ever want to concatenate a vector of grades with a vector of
genders?
as I said elsewhere, the function which concatenates factors must check
that the levels are identical before proceeding.
#
http://article.gmane.org/gmane.comp.lang.r.general:277719
#
c() has an unfortunate history.  Originally, c(x) stripped the attributes,
except names but including  dim, dimnames, and class, from x.
Also, c(x,y) stripped the attributes from both x and y and concatenated
them.  Also, c(nameA=1,nameB=2) constructed a vector with a names attribute.

Then c() became a generic function and people wrote methods for certain
classes, typically newer classes without the weight of history on them, that kept
at least the class and would combine 2 or more items of that class.  Adding
a c.factor became tricky because old code used c(factor(...)) to strip the class
and levels attributes to get the integer codes.

You can make a c() that does what you want for your factors by subclassing
factor and writing a c.<yourFactor> that does what you want.  This will not
break old code.  E.g.,
   myFactor <- function(...) {
      tmp <- factor(...)
      class(tmp) <- c("myFactor", class(tmp))  # prepend the subclass
      tmp }
   c.myFactor <- function(...) {
      # ... compare levels of inputs with identical() and do what you want ...
      # ... return something with the right class ...
   }
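
One way the outline above could be filled in, as a sketch under assumptions rather than a definitive implementation: here the method insists that every input is a factor with exactly the same levels as the first, and it rebuilds the result from the integer codes. The choice to stop on a mismatch (rather than merge levels) is an assumption made for this example.

```r
myFactor <- function(...) {
  tmp <- factor(...)
  class(tmp) <- c("myFactor", class(tmp))
  tmp
}

c.myFactor <- function(...) {
  args <- list(...)
  lev <- levels(args[[1]])
  # Every input must be a factor with the same levels, in the same order.
  ok <- vapply(args,
               function(x) is.factor(x) && identical(levels(x), lev),
               logical(1))
  stopifnot(all(ok))
  # Concatenate the integer codes and restore levels and class.
  codes <- unlist(lapply(args, as.integer), use.names = FALSE)
  structure(codes, levels = lev, class = class(args[[1]]))
}

x <- myFactor(c("a", "b"), levels = c("a", "b", "c"))
y <- myFactor(c("c", "a"), levels = c("a", "b", "c"))
z <- c(x, y)   # dispatches to c.myFactor because x is a myFactor
print(z)
```

Plain factors are unaffected, since the method is only reached when the first argument carries the `myFactor` class.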

Or, you can decide to write  a new concatenation function
and stop using c().

As for EQ vs. EQUALP, don't even think of EQ in R: it doesn't make sense there.
identical() is a pretty quick way to check that two objects have identical contents.
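
A small sketch of what that check does and does not accept when applied to level sets. The factors `p` and `q` are made up for the illustration.

```r
p <- factor(c("x", "y"), levels = c("x", "y"))
q <- factor(c("y", "x"), levels = c("y", "x"))

same_exact <- identical(levels(p), levels(q))  # FALSE: same labels, different order
same_set   <- setequal(levels(p), levels(q))   # TRUE: order is ignored
```

For concatenating integer codes, the order-sensitive `identical()` is the relevant test, because a code only has meaning as a position in the level vector.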

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
:-)
ISTR reading in the R manual ~15(?) years ago that the language was in a
flux and one could not expect code written for the current release to
work in the next release.  I was considering R as the graphing back end
at that time, so this note turned me off.
Now it turns out that R has a legacy it cannot shake. :-)
Good! That's what I was looking for!

concatenate.factors <- function (x, y) {
  stopifnot(identical(levels(x),levels(y)))
  unlist(list(x,y), use.names=FALSE)
}

This seems to do what I need.
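
A short usage sketch of the function above, with made-up data (`grades1`, `grades2`, and `gender` are hypothetical). It shows the accepted case and the rejected case.

```r
concatenate.factors <- function (x, y) {
  stopifnot(identical(levels(x), levels(y)))
  unlist(list(x, y), use.names = FALSE)
}

grades1 <- factor(c("A", "C"), levels = c("A", "B", "C"))
grades2 <- factor(c("B", "B"), levels = c("A", "B", "C"))
both <- concatenate.factors(grades1, grades2)
print(both)

# Factors with different level sets are refused by the stopifnot() check.
gender <- factor(c("F", "M"))
res <- tryCatch(concatenate.factors(grades1, gender),
                error = function(e) "levels differ")
print(res)
```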

I see that
identical(levels(concatenate.factors(a,b)),levels(a))
==> TRUE
Do I understand correctly that concatenate.factors does NOT create an
intermediate vector and then re-factor it?

Thank you very much for your insight!
#
On Thu, Oct 18, 2012 at 4:20 PM, Sam Steingold <sds at gnu.org> wrote:
You said nothing about having good reasons to do manipulations
thereupon. As I read the question (changing "to" to "two"), you were
just asking if two distinct factors might share string representations
;-)

Perhaps even better: grades on two different exams: sharing the same
level set entirely, but from a statistical point of view, distinct
(but likely confounded)

Cheers,
M