Building factors across two columns, is this possible?
To clarify my previous post, here is a representation of what I am trying to accomplish.
Given the data below, I would like every unique value in any of the columns to be assigned a number, like so:
     V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
Level Value
sun -> 0
stars -> 1
cat -> 2
dog -> 3
bird -> 4
1000 -> 5
moon -> 6
plane -> 7
catdog -> 8
superman -> 9
2000 -> 10
etc
etc
so that internally it is represented as:
  V1 V2 V3
1  0  6  1
2  1  6  0
3  2  3  8
4  3  6  0
5  4  7  9
6  5  3 10
Does this make sense? I am hoping there is a way to accomplish this.
Brian
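One way the target representation above might be produced is sketched below. It is only an illustration under a few assumptions: the data frame is built inline rather than read from test.csv, the name `shared_levels` is made up for the example, and the zero-based numbers are obtained by subtracting 1 from R's one-based factor codes.

```r
# Inline copy of the example data (stands in for test.csv).
data <- data.frame(
  V1 = c("sun", "stars", "cat", "dog", "bird", "1000"),
  V2 = c("moon", "moon", "dog", "moon", "plane", "dog"),
  V3 = c("stars", "sun", "catdog", "sun", "superman", "2000"),
  stringsAsFactors = TRUE
)

# Collect every distinct value, column by column, in order of first appearance.
shared_levels <- unique(unlist(lapply(data, as.character)))

# Rebuild each column as a factor over the same level set.
data[] <- lapply(data, function(col) {
  factor(as.character(col), levels = shared_levels)
})

# R's internal factor codes start at 1; subtract 1 for zero-based numbers.
codes <- sapply(data, function(col) as.integer(col) - 1L)
print(codes)
```

Because every column shares one level vector, the same string maps to the same code wherever it appears. The ordering of `shared_levels` here follows first appearance; sorting it instead would give different numbers but the same consistency.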
On Nov 23, 2012, at 11:42 PM, Brian Feeny <bfeeny at mac.com> wrote:
I am trying to make it so that two columns with similar data use the same internal numbers for the same factor levels. Here is the example:
read.csv("test.csv",header =FALSE,sep=",")
     V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
data <- read.csv("test.csv",header =FALSE,sep=",")
str(data)
'data.frame': 6 obs. of 3 variables:
 $ V1: Factor w/ 6 levels "1000","bird",..: 6 5 3 4 2 1
 $ V2: Factor w/ 3 levels "dog","moon","plane": 2 2 1 2 3 1
 $ V3: Factor w/ 5 levels "2000","catdog",..: 3 4 2 4 5 1
as.numeric(data$V1)
[1] 6 5 3 4 2 1
as.numeric(data$V2)
[1] 2 2 1 2 3 1
as.factor(data$V1)
[1] sun   stars cat   dog   bird  1000
Levels: 1000 bird cat dog stars sun
as.factor(data$V2)
[1] moon  moon  dog   moon  plane dog
Levels: dog moon plane

So notice that "dog" is 4 in V1, yet it is 1 in V2. Is there a way, either on import or afterwards, to have factors computed for both columns and assigned the same internal values?

Brian
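For the "on import" half of the question, a possible sketch is to read the columns as plain character strings and only then convert V1 and V2 to factors over a common level set. The `text =` argument is used here so the example runs without test.csv, and the names `csv_text` and `shared` are hypothetical, chosen for illustration.

```r
# Stand-in for the contents of test.csv.
csv_text <- "sun,moon,stars
stars,moon,sun
cat,dog,catdog
dog,moon,sun
bird,plane,superman
1000,dog,2000"

# Read everything as character so no per-column factors are created on import.
data <- read.csv(text = csv_text, header = FALSE, colClasses = "character")

# One sorted level set covering both columns of interest.
shared <- sort(unique(c(data$V1, data$V2)))

# Convert both columns using the same levels.
data$V1 <- factor(data$V1, levels = shared)
data$V2 <- factor(data$V2, levels = shared)

# "dog" now carries the same internal code in V1 and in V2.
dog_v1 <- unique(as.integer(data$V1)[data$V1 == "dog"])
dog_v2 <- unique(as.integer(data$V2)[data$V2 == "dog"])
print(identical(dog_v1, dog_v2))
```

The same idea should extend to V3 by including `data$V3` when building `shared`.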