Creating a factor from a combination of vectors

3 messages · Yves Brostaux, Eric Lecoutre, Gabor Grothendieck

#
Dear list,

Here's a little problem I have already solved in my own coding style. I 
feel there must be a cleaner and more efficient way to write it, but I 
had no success finding the "clever" solution.

I want to produce a factor from a subset of the combination of two 
vectors. I have the vectors a and b in a data frame:

 > df <- expand.grid(a=c(0, 5, 10, 25, 50), b=c(0, 25, 50, 100, 200))
 > df
    a   b
1   0   0
2   5   0
3  10   0
4  25   0
5  50   0
6   0  25
7   5  25
<snip>

and want to create a factor whose levels correspond to particular 
combinations of a and b (say Low for a=0 & b=0, Medium for a=10 & 
b=50, High for a=50 & b=200, all other combinations set to NA), reading 
them from a data frame which describes the desired subset and the 
corresponding levels.

Here's my own solution (inputs are data-frames df and cas, output is the 
sub factor):

 > cas <- as.data.frame(matrix(c(0, 10, 50, 0, 50, 200), 3, 2,
+   dimnames=list(c("Low", "Medium", "High"), c("a", "b"))))
 > cas
        a   b
Low     0   0
Medium 10  50
High   50 200

 > sub <- character(length(df$a))
 > for (i in 1:length(df$a)) {
+   temp <- rownames(cas)[cas$a==df$a[i] & cas$b==df$b[i]]
+   sub[i] <- ifelse(length(temp)>0, temp, NA)
+ }
 > sub <- ordered(sub, levels=c("Low", "Medium", "High"))
 > sub
 [1] Low    <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   Medium <NA>   <NA>   <NA>   <NA>
[18] <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   High 
Levels: Low < Medium < High

I was looking for a vectorized solution (apply style) matching 
data frames df and cas, but didn't succeed in avoiding the for loop. 
Could anybody bring some light into the darkness of my ignorance? Thank 
you very much in advance.
#
Hi Yves,

Using your objects, here is a way:


 > cascombo=do.call("paste",c(cas,sep="."))
 > factor(do.call("paste",c(df,sep=".")),levels=cascombo,labels=rownames(cas))
 [1] Low    <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   Medium <NA>   <NA>
[16] <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   High
Levels: Low Medium High


It uses:
- paste (sep=".") to create the combinations, i.e. 0.0, 10.50, etc.
- do.call to invoke paste on the columns of the data frames
- factor, specifying the existing levels (only those defined by the cas 
  data frame) and labels
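
The steps listed above can be sketched end to end as follows. This is only an illustration, assuming the df and cas objects from the original post; the names key_df, key_cas and res are hypothetical and not part of the thread.

```r
# Rebuild the inputs from the original post.
df  <- expand.grid(a = c(0, 5, 10, 25, 50), b = c(0, 25, 50, 100, 200))
cas <- data.frame(a = c(0, 10, 50), b = c(0, 50, 200),
                  row.names = c("Low", "Medium", "High"))

# One key string per row: do.call() hands each column (plus sep) to paste().
key_df  <- do.call("paste", c(df,  sep = "."))   # hypothetical name
key_cas <- do.call("paste", c(cas, sep = "."))   # hypothetical name

# Keys that are not listed in 'levels' become NA;
# 'labels' renames the levels that are kept.
res <- factor(key_df, levels = key_cas, labels = rownames(cas))

# Positions of the rows that received a level.
which(!is.na(res))
```

One caveat with pasted keys: with non-integer values, different pairs can produce the same string (a=1.5, b=2 and a=1, b=5.2 both give "1.5.2"), so a separator that cannot occur in the data, such as "_", is a safer choice in that case.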

Eric
At 10:12 30/11/2004, Yves Brostaux wrote:
Eric Lecoutre
UCL /  Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
lecoutre at stat.ucl.ac.be
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward 
Tufte
#
Use interaction() and factor() like this:

factor(interaction(df), levels = c("0.0", "10.50", "50.200"),
  labels = c("Low", "Medium", "High"), ordered = TRUE)
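
Since the original question asked for the levels to be read from the cas data frame, the same idea can probably be written without hard-coding the level strings. The following is a sketch under that assumption, using the df and cas objects from the original post; lev, res and loop are hypothetical names, and the loop is included only to check the result against the original approach.

```r
# Rebuild the inputs from the original post.
df  <- expand.grid(a = c(0, 5, 10, 25, 50), b = c(0, 25, 50, 100, 200))
cas <- data.frame(a = c(0, 10, 50), b = c(0, 50, 200),
                  row.names = c("Low", "Medium", "High"))

# interaction() joins the columns with "." by default, so the keys built
# from cas have the same form as the levels of interaction(df).
lev <- as.character(interaction(cas))
res <- factor(interaction(df), levels = lev, labels = rownames(cas),
              ordered = TRUE)

# Reference result: the explicit loop from the original post.
loop <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  hit <- rownames(cas)[cas$a == df$a[i] & cas$b == df$b[i]]
  loop[i] <- if (length(hit) > 0) hit else NA
}
loop <- ordered(loop, levels = rownames(cas))

# Both approaches should assign the same label (or NA) to every row.
stopifnot(identical(as.character(res), as.character(loop)))
```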