Building factors across two columns, is this possible?

5 messages · Brian Feeny, Rui Barradas, David Winsemius

#
I am trying to make two columns with similar data use the same internal integer codes for the same factor levels. Here is the example:
V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
'data.frame':	6 obs. of  3 variables:
 $ V1: Factor w/ 6 levels "1000","bird",..: 6 5 3 4 2 1
 $ V2: Factor w/ 3 levels "dog","moon","plane": 2 2 1 2 3 1
 $ V3: Factor w/ 5 levels "2000","catdog",..: 3 4 2 4 5 1
[1] 6 5 3 4 2 1
[1] 2 2 1 2 3 1
[1] sun   stars cat   dog   bird  1000 
Levels: 1000 bird cat dog stars sun
[1] moon  moon  dog   moon  plane dog  
Levels: dog moon plane


So notice that "dog" is 4 in V1, yet it's 1 in V2. Is there a way, either on import or afterwards, to have the factor levels computed across both columns so that the same value gets the same internal code in each?

Brian
#
To clarify on my previous post, here is a representation of what I am trying to accomplish:

I would like every unique value in any column to be assigned a number, like so:

    V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000

Level         Value
sun       ->  0
stars     ->  1
cat       ->  2
dog       ->  3
bird      ->  4
1000      ->  5
moon      ->  6
plane     ->  7
catdog    ->  8
superman  ->  9
2000      ->  10
etc.

so internally it is represented as:

    V1    V2       V3
1    0     6        1
2    1     6        0
3    2     3        8
4    3     6        0
5    4     7        9
6    5     3       10

Does this make sense? I am hoping there is a way to accomplish this.

Brian
On Nov 23, 2012, at 11:42 PM, Brian Feeny <bfeeny at mac.com> wrote:

            
#
Hello,

You can do what you want, but the coding of factors starts at 1 not at 0.


dat <- read.table(text="
V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
", header = TRUE)

levs <- unique(unlist(dat))

dat$V1 <- factor(dat$V1, levels = levs)
dat$V2 <- factor(dat$V2, levels = levs)
dat$V3 <- factor(dat$V3, levels = levs)

str(dat)
'data.frame':   6 obs. of  3 variables:
  $ V1: Factor w/ 11 levels "sun","stars",..: 1 2 3 4 5 6
  $ V2: Factor w/ 11 levels "sun","stars",..: 7 7 4 7 8 4
  $ V3: Factor w/ 11 levels "sun","stars",..: 2 1 9 1 10 11
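
Since the question asked for numbering that starts at 0, here is a minimal sketch of reading the shared integer codes back out and shifting them down by one. It assumes the same data as above; `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version, and the name `codes0` is only illustrative.

```r
dat <- read.table(text = "
V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
", header = TRUE, stringsAsFactors = FALSE)

# Shared levels in order of first appearance (column by column)
levs <- unique(unlist(dat, use.names = FALSE))

# Give every column the same level set
dat[] <- lapply(dat, factor, levels = levs)

# Factor codes are 1-based; subtract 1 to get the 0-based numbering
# from the question (codes0 is an illustrative name)
codes0 <- sapply(dat, function(x) as.integer(x) - 1L)
codes0
```

The factors themselves still use 1-based codes internally; `codes0` is just a derived integer matrix, so it should be kept separate if the factor columns are needed later.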


Hope this helps,

Rui Barradas
Em 24-11-2012 07:33, Brian Feeny escreveu:
#
Hello,

If you want the factor levels sorted, you'll have to do it manually.

levs <- sort(unique(as.character(unlist(dat))))
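
As a rough sketch of how the sorted variant could be applied to every column at once: the small data frame below is a cut-down stand-in for the data in this thread, and `stringsAsFactors = TRUE` is assumed so the columns start out as factors, as they did in the original post.

```r
# Cut-down stand-in for the data in this thread
dat <- data.frame(V1 = c("sun", "dog", "1000"),
                  V2 = c("moon", "moon", "dog"),
                  V3 = c("stars", "sun", "2000"),
                  stringsAsFactors = TRUE)

# as.character() matters here: for factor columns unlist() returns one
# combined factor, and sort() on a factor orders by level position,
# not alphabetically
levs <- sort(unique(as.character(unlist(dat))))

# Rebuild every column against the shared, sorted level set
dat[] <- lapply(dat, function(x) factor(as.character(x), levels = levs))

# Every column now reports the same levels
sapply(dat, function(x) identical(levels(x), levs))
```

Note that this is a character sort, so the exact order (digits versus letters, upper versus lower case) depends on the locale's collation rules.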

Rui Barradas
Em 24-11-2012 12:57, Rui Barradas escreveu:
#
On Nov 23, 2012, at 8:42 PM, Brian Feeny wrote:

            
> dat[] <- lapply(dat, function(x) factor(as.character(x),
+                                         levels = levels(unlist(dat))))
 > dat
      V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
 > levels(dat[[1]])
  [1] "1000"     "bird"     "cat"      "dog"      "stars"    "sun"
  [7] "moon"     "plane"    "2000"     "catdog"   "superman"

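I see your "clarification". Reordering the representation can be done by rebuilding each column with the levels in the order you want:

dat[] <- lapply(dat, factor, levels = <character vector>)

(Assigning with levels(x) <- <character vector> relabels the existing levels by position rather than reordering them.)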
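
One caveat that may be worth a small sketch: in base R, assigning to `levels()` renames the levels by position, whereas calling `factor(x, levels = ...)` changes the level order and keeps the values. The toy factor `f` below is purely illustrative.

```r
f <- factor(c("sun", "dog", "sun"), levels = c("dog", "sun"))

# Reorder: the values stay the same, only the internal codes change
g <- factor(f, levels = c("sun", "dog"))
as.character(g)  # "sun" "dog" "sun"
as.integer(g)    # 1 2 1

# Relabel: levels<- renames levels by position, so the values change
h <- f
levels(h) <- c("sun", "dog")
as.character(h)  # "dog" "sun" "dog"
```

So for changing which integer a given value maps to, rebuilding the factor with an explicit `levels =` argument is usually the safer route.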