subsetting with condition

5 messages · kristina p, David Winsemius, Henrique Dallazuanna +1 more

#
Dear R Team, 

I am a new R user and I am currently trying to subset my data under a
special condition. I have gone through several pages of the subsetting
section here on the forum, but I was not able to find an answer.

My data is as follows:

ID   NAME       MS   Pol. Party
1    John       x    F
2    Mary       s    S
3    Katie      x    O
4    Sarah      p    L
5    Martin     x    O
6    Angelika   x    F
7    Smith      x    O
....

I am interested only in those observations where there are at least three
members of one political party. That is, I need to throw out all cases in the
example above, except for members of party "O". 

Would really appreciate your help.
K

--
View this message in context: http://r.789695.n4.nabble.com/subsetting-with-condition-tp3567193p3567193.html
Sent from the R help mailing list archive at Nabble.com.
#
On Jun 1, 2011, at 7:00 PM, kristina p wrote:

            
Assume this is in a data frame, 'pol', and that you have corrected the
error in colnames, so that it is Pol_Party. The ave function is
particularly useful when you need a vector that "lines up
alongside" the other columns:

pol[ave(seq_along(pol$ID), pol$Pol_Party, FUN=length) >= 3 , ]
   ID   NAME MS Pol_Party
3  3  Katie  x         O
5  5 Martin  x         O
7  7  Smith  x         O

(The use of seq_along ensures you will get duplicates of ID that are
in any qualifying parties.)

Another way to generate the values would be to table()-ulate and pick  
out the names of qualifying Parties:

 > tabl.party <- table(pol$Pol_Party)
 > pol[ pol$Pol_Party %in% names(tabl.party)[tabl.party >= 3], ]
   ID   NAME MS Pol_Party
3  3  Katie  x         O
5  5 Martin  x         O
7  7  Smith  x         O

Both methods use logical indexing with the "[.data.frame" function.
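
For anyone who wants to run the two approaches side by side, here is a minimal sketch. The data frame is a hypothetical reconstruction of the seven rows shown in the question (with the column renamed to Pol_Party, as assumed above), not the poster's actual data.

```r
# Hypothetical reconstruction of the poster's data; column renamed to Pol_Party.
pol <- data.frame(
  ID = 1:7,
  NAME = c("John", "Mary", "Katie", "Sarah", "Martin", "Angelika", "Smith"),
  MS = c("x", "s", "x", "p", "x", "x", "x"),
  Pol_Party = c("F", "S", "O", "L", "O", "F", "O"),
  stringsAsFactors = FALSE
)

# Approach 1: ave() returns, for each row, the size of that row's party.
party_size <- ave(seq_along(pol$ID), pol$Pol_Party, FUN = length)
by_ave <- pol[party_size >= 3, ]

# Approach 2: count parties with table(), then keep the qualifying names.
tabl.party <- table(pol$Pol_Party)
by_table <- pol[pol$Pol_Party %in% names(tabl.party)[tabl.party >= 3], ]

print(by_ave)

# Both approaches should select the same rows.
stopifnot(identical(by_ave, by_table))
```

The ave() version needs no intermediate object, while the table() version makes the per-party counts available for inspection.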

  
    
#
Try this:

subset(x, ave(x$ID, x$Pol., FUN = length) >= 3)
On Wed, Jun 1, 2011 at 8:00 PM, kristina p <puzarina.k at gmail.com> wrote:

  
    
#
Kristina:

You posed your question nicely, but it would help R HelperRs if you
used dput() to post your data for us to more easily copy and paste
into R in future.
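
As a rough illustration of the dput() suggestion, assuming a small stand-in data frame (the object and its values are hypothetical):

```r
# Hypothetical small data frame standing in for the poster's data.
pol <- data.frame(
  ID = 1:3,
  Pol_Party = c("F", "S", "O"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; paste that output into a post.
dput(pol)

# Round trip: capture the printed text and evaluate it to rebuild the object.
txt <- paste(capture.output(dput(pol)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
stopifnot(isTRUE(all.equal(pol, rebuilt)))
```

Readers can then assign the pasted text to a name and get the same object, column types included.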

Anyway, there are probably about a million ways to do this (see
especially ddply() in the plyr package for organizing data), but one basic
approach is to use table() to count Pol. parties (a bad name for a
variable, btw, as the space requires backtick quoting) and then use
the names attribute of the result to identify the parties you want.
i.e.

tbl <- table(polParties)
names(tbl[tbl >= 3])  ## gives the names of polParties with at least 3 entries.

Then use subset (or indexing) on these with %in%

etc.
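
One way the steps above might be spelled out in full, as a sketch only. The data frame x is a hypothetical reconstruction of the posted rows, and it deliberately keeps the original "Pol. Party" column name to show the backtick quoting mentioned above.

```r
# Hypothetical data; check.names = FALSE keeps the space in "Pol. Party".
x <- data.frame(
  ID = 1:7,
  NAME = c("John", "Mary", "Katie", "Sarah", "Martin", "Angelika", "Smith"),
  `Pol. Party` = c("F", "S", "O", "L", "O", "F", "O"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Count members per party; the names of the table are the party labels.
tbl <- table(x$`Pol. Party`)
keep <- names(tbl[tbl >= 3])   # parties with at least three members

# A column name containing a space must be backtick-quoted inside subset().
result <- subset(x, `Pol. Party` %in% keep)
print(result)
```

Renaming the column to something like Pol_Party would avoid the backticks altogether.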

-- Bert
On Wed, Jun 1, 2011 at 4:00 PM, kristina p <puzarina.k at gmail.com> wrote: