Assigning cases to groupings based on the values of several variables

5 messages · Duncan Murdoch, arun, Dimitri Liakhovitski

#
On 12-12-07 7:27 AM, Dimitri Liakhovitski wrote:
Since your groups are so regular, you can compute the groups directly. 
Convert each column to a factor (this might have happened automatically, 
depending on your data and options), then use as.integer to convert to a 
numeric value.

So a simple solution would be

mydata$mygroup.m4 <- with(mydata,
                              4*(2-as.integer(factor(sex)))
                              + as.integer(factor(age)))

It would be a little simpler if you wanted the sex factor in alphabetical 
order; then you wouldn't need to subtract from 2.
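
To illustrate the remark about level order: factor() sorts its levels 
alphabetically by default, so "f" is coded 1 and "m" is coded 2. The sketch 
below rebuilds mydata from the original post and numbers the female groups 
first; the column name mygroup.alpha is made up for this example and is not 
part of the original solution.

```r
# Example data from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))

# Default level order is alphabetical: "f" is level 1, "m" is level 2
levels(factor(mydata$sex))

# Alphabetical numbering: f gets groups 1-4, m gets groups 5-8,
# so no subtraction from 2 is needed
# (mygroup.alpha is a hypothetical column name)
mydata$mygroup.alpha <- with(mydata,
                             4 * (as.integer(factor(sex)) - 1)
                             + as.integer(factor(age)))
```

Note that this changes which sex gets the low group numbers, so it only 
helps if that ordering is acceptable.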

If your real data wasn't so regular, another approach would be to set up 
a matrix, indexed by sex and age, that gives the desired group number. 
That is somewhat like your "groupings" solution; I'm not sure it would 
be preferable to what you did.
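
A minimal sketch of the matrix approach, assuming the same example data as 
in the original post. The names groupmat and mygroup.mat are invented for 
illustration. The lookup relies on R's ability to index a matrix with a 
two-column character matrix of (row name, column name) pairs.

```r
# Example data from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))

# Lookup matrix: rows are sex, columns are age, entries are group numbers.
# Any irregular assignment could be typed in here instead of 1:8.
groupmat <- matrix(1:8, nrow = 2, byrow = TRUE,
                   dimnames = list(c("m", "f"), as.character(1:4)))

# One (row name, column name) pair per case, looked up in a single step
mydata$mygroup.mat <- groupmat[cbind(as.character(mydata$sex),
                                     as.character(mydata$age))]
```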

Duncan Murdoch
#
Hi,

In your method2 and method3, you are using the groupings data. If that is the case, is it possible for you to use merge() or join() from library(plyr)?
join(mydata,groupings,by=c("sex","age"),type="inner")
#   sex age mygroup
#1    m   1       1
#2    m   2       2
#3    m   3       3
#4    m   4       4
#5    f   1       5
#6    f   2       6
#7    f   3       7
#8    f   4       8
#9    m   1       1
#10   m   2       2
#11   m   3       3
#12   m   4       4
#13   f   1       5
#14   f   2       6
#15   f   3       7
#16   f   4       8
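
For comparison, a sketch of the base R merge() alternative mentioned above, 
which needs no extra package. Unlike join(), merge() does not keep the rows 
of mydata in their original order, so this example adds a helper column 
(row.id, a made-up name) and sorts on it afterwards.

```r
# Example data and lookup table from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))
groupings <- data.frame(sex = c(rep("m", 4), rep("f", 4)),
                        age = c(1:4, 1:4), mygroup = 1:8)

# Remember the original row order (row.id is a hypothetical helper column)
mydata$row.id <- seq_len(nrow(mydata))

# Join on both key columns, then restore the original order
merged <- merge(mydata, groupings, by = c("sex", "age"))
merged <- merged[order(merged$row.id), ]
```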
A.K.



----- Original Message -----
From: Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com>
To: r-help <r-help at r-project.org>
Cc: 
Sent: Friday, December 7, 2012 7:27 AM
Subject: [R] Assigning cases to groupings based on the values of several variables

Dear R-ers,

my task is simple: to assign cases to desired groupings based on the
combined values on 2 variables. I can think of 3 methods of doing it.
Method 1 seems to me pretty r-like, but it requires a lot of lines of code
- onerous.
Method 2 is a loop, so not very good - as it loops through all rows of
mydata.
Method 3 is a loop but loops through fewer lines, so it seems to me more
efficient.
Can you please tell me:
1. Which of my methods is more efficient?
2. Is there maybe an even more efficient r-like way of doing it?
Imagine - "mydata" is actually a very tall data frame.
Thanks a lot!
Dimitri

### My Data:
mydata<-data.frame(sex=rep(c(rep("m",4),rep("f",4)),2),age=rep(c(1:4,1:4),2))
(mydata)

### My desired assignments (in column "mygroup")
groupings<-data.frame(sex=c(rep("m",4),rep("f",4)),age=c(1:4,1:4),mygroup=1:8)
(groupings)

# No, I don't need a solution where the last column of "groupings" is
# stacked twice and bound to "mydata"

# Method 1 of assigning to groups - requires a lot of lines of code:
mydata$mygroup.m1<-NA
mydata[(mydata$sex %in% "m")&(mydata$age %in% 1),"mygroup.m1"]<-1
mydata[(mydata$sex %in% "m")&(mydata$age %in% 2),"mygroup.m1"]<-2
mydata[(mydata$sex %in% "m")&(mydata$age %in% 3),"mygroup.m1"]<-3
mydata[(mydata$sex %in% "m")&(mydata$age %in% 4),"mygroup.m1"]<-4
mydata[(mydata$sex %in% "f")&(mydata$age %in% 1),"mygroup.m1"]<-5
mydata[(mydata$sex %in% "f")&(mydata$age %in% 2),"mygroup.m1"]<-6
mydata[(mydata$sex %in% "f")&(mydata$age %in% 3),"mygroup.m1"]<-7
mydata[(mydata$sex %in% "f")&(mydata$age %in% 4),"mygroup.m1"]<-8
(mydata)

# Method 2 of assigning to groups - very "loopy":
mydata$mygroup.m2<-NA
for(i in 1:nrow(mydata)){  # i<-1
  mysex<-mydata[i,"sex"]
  myage<-mydata[i,"age"]
  mydata[i,"mygroup.m2"]<-groupings[(groupings$sex %in% mysex)&(groupings$age %in% myage),"mygroup"]
}
(mydata)

# Method 3 of assigning to groups - also "loopy", but less than Method 2:
mydata$mygroup.m3<-NA
for(i in 1:nrow(groupings)){  # i<-1
  mysex<-groupings[i,"sex"]
  myage<-groupings[i,"age"]
  mydata[(mydata$sex %in% mysex)&(mydata$age %in% myage),"mygroup.m3"]<-groupings[i,"mygroup"]
}
(mydata)
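
One possible loop-free variant of Methods 2 and 3, offered only as a sketch 
and not as something proposed in the thread: build a single key from the two 
variables and look every case up at once with match(). The names key.data, 
key.groups and mygroup.match are invented for this example, and it assumes 
every sex/age combination in mydata also appears in groupings (unmatched 
cases would get NA).

```r
# Example data and lookup table from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))
groupings <- data.frame(sex = c(rep("m", 4), rep("f", 4)),
                        age = c(1:4, 1:4), mygroup = 1:8)

# One text key per row, e.g. "m 1" (hypothetical helper vectors)
key.data   <- paste(mydata$sex, mydata$age)
key.groups <- paste(groupings$sex, groupings$age)

# match() returns, for each case, the row of groupings with the same key
mydata$mygroup.match <- groupings$mygroup[match(key.data, key.groups)]
```

Because match() is vectorized, no explicit loop over the rows of either 
data frame is required.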