
Subsetting by more than one factor...

3 messages · Fernando Henrique Ferraz P. da Rosa, Sundar Dorai-Raj, Douglas Bates

#
Is it possible in R to subset a dataframe by more than one factor, all at
once?
     For instance, I have the dataframe: 
 >data 
   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
  1    0  1  0  0  0  0  0  0  0   0 0.5862069
  4    0  0  0  0  0  0  0  0  0   1 0.5862069
  5    0  0  0  0  0  0  1  0  0   0 0.5862069
  6    0  0  0  0  0  0  0  1  0   0 0.5862069
  7    0  0  1  0  0  0  0  0  0   0 0.5862069
  9    0  0  0  0  1  0  0  0  0   0 0.5862069
  20   0  1  1  0  0  0  0  0  0   0 0.5862069
  22   0  1  0  0  1  0  0  0  0   0 0.5862069
  24   0  1  0  0  0  0  1  0  0   0 0.5862069
  25   0  1  0  0  0  0  0  1  0   0 0.5862069
  27   0  1  0  0  0  0  0  0  0   1 0.5862069

  If I want to subset only those points that have p4 = 1, I do:
   > subset(data,p4 == 1)
  And that's fine. Now suppose I want to subset those that not only have p4
= 1, but also p6 = 1.
   I tried subset(data,p4 == 1 && p6 == 1) or subset(data,p4==1 & p6==1).
But it didn't work.
   Then I found a clumsy way to do it:
    subset(subset(data,p4==1),p6==1)
    Which works. But it soon gets very clumsy as the number of conditions
increases (I end up with a really large number of nested subsets). Is there a
simpler way to do that?


--
#
Fernando Henrique Ferraz Pereira da Rosa wrote:
It didn't? It does for me:

R> subset(z, p4 == 1 & p6 == 1)
  [1] p1   p2   p3   p4   p5   p6   p7   p8   p9   p10  pred
<0 rows> (or 0-length row.names)
R> subset(z, p2 == 1 & p8 == 1)
    p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
10  0  1  0  0  0  0  0  1  0   0 0.5862069
R> subset(z, (p2 == 1 & p3 == 0) | p5 == 1)
    p1 p2 p3 p4 p5 p6 p7 p8 p9 p10      pred
1   0  1  0  0  0  0  0  0  0   0 0.5862069
6   0  0  0  0  1  0  0  0  0   0 0.5862069
8   0  1  0  0  1  0  0  0  0   0 0.5862069
9   0  1  0  0  0  0  1  0  0   0 0.5862069
10  0  1  0  0  0  0  0  1  0   0 0.5862069
11  0  1  0  0  0  0  0  0  0   1 0.5862069
R> version
          _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    7.0
year     2003
month    04
day      16
language R
R>


[snip]

Regards,
Sundar
#
Fernando Henrique Ferraz Pereira da Rosa <mentus at gmx.de> writes:
As Sundar pointed out, it is the second form that you want.  When
intersecting conditions in subset(), use &, not &&.
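
To illustrate the difference, here is a minimal sketch on a made-up data frame (`d` and its values are hypothetical, not the data from the original post). `&` works element by element and returns one TRUE/FALSE per row, which is what subset() needs. `&&` is meant for a single TRUE/FALSE: as far as I know, older versions of R silently used only the first element of each side, and R 4.3.0 and later raise an error for longer vectors, so it is left out of the code.

```r
# Hypothetical toy data, not the poster's data frame
d <- data.frame(p2 = c(1, 0, 1, 1),
                p3 = c(0, 0, 1, 1))

# `&` compares element by element, giving one TRUE/FALSE per row
keep <- d$p2 == 1 & d$p3 == 1

# subset() keeps the rows where the condition is TRUE
both <- subset(d, p2 == 1 & p3 == 1)

# The same selection written with plain logical indexing
same <- d[keep, ]
```

Here `keep` is FALSE, FALSE, TRUE, TRUE, so both `both` and `same` contain rows 3 and 4.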

In the output you pasted in your message, the column names did not
align with the columns.  I changed this in the part that I
quoted above.  This shows that you chose the wrong example, I think,
because that intersection is empty.  Try

 subset(data, p2 == 1 & p3 == 1)

instead.
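
For anyone who wants to check this, the sketch below rebuilds the data frame from the first message and runs both conditions. The helper list `ones` and the matrix `m` are names introduced here for illustration only. The values were transcribed by hand from the posted printout, so treat them as an assumption.

```r
# For each row name in the posted printout, the column numbers that hold a 1
ones <- list("1" = 2, "4" = 10, "5" = 7, "6" = 8, "7" = 3, "9" = 5,
             "20" = c(2, 3), "22" = c(2, 5), "24" = c(2, 7),
             "25" = c(2, 8), "27" = c(2, 10))

# Start from an all-zero matrix with columns p1..p10, then fill in the 1s
m <- matrix(0, nrow = length(ones), ncol = 10,
            dimnames = list(names(ones), paste0("p", 1:10)))
for (r in names(ones)) m[r, ones[[r]]] <- 1

# pred is the same constant in every posted row
data <- data.frame(m, pred = 0.5862069)

# No row has both p4 and p6 equal to 1, so this intersection is empty
empty <- subset(data, p4 == 1 & p6 == 1)

# Row 20 is the only one with both p2 and p3 equal to 1
hit <- subset(data, p2 == 1 & p3 == 1)
```

Further conditions can be chained with more `&` terms in the same call, which avoids nesting subset() calls.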