
How to count rows with a condition

5 messages · fxen3k, David Winsemius, arun +1 more

#
Hi,

I have a dataset called "data". There is one column called "ac_name". Some
names in this column appear very often, others less often.
What I want is to filter this dataset with the following condition:

Exclude the names that appear more than five times. (Example: House A
appears 8 times ==> exclude it; House B appears 5 times ==> include it, etc.)

In the end, I want the old "data" dataset without the rows that meet
the above condition, plus a separate list of all the names that have
been excluded.


I think for one of the professionals amongst you this is pretty easy to
solve. ;-)

Thanks dudes!

Cheerio,
Felix



--
View this message in context: http://r.789695.n4.nabble.com/How-to-count-rows-with-a-condition-tp4646454.html
Sent from the R help mailing list archive at Nabble.com.
#
One way is:
  ac_name_count <- ave(integer(nrow(data)), data[["ac_name"]], FUN=length)
  data[ac_name_count <= 5, ,drop=FALSE] # rows whose ac_name entry is rare
  data[ac_name_count > 5, ,drop=FALSE]  # rows whose ac_name entry is common
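
A minimal sketch of the full request (the filtered data plus the list of excluded names), assuming a small made-up data frame in place of Felix's real "data"; the names `counts`, `kept` and `excluded_names` are illustrative, not from the thread.

```r
# Made-up stand-in for Felix's "data" (HouseA: 8 rows, HouseB: 5, HouseC: 2)
data <- data.frame(
  ac_name = rep(c("HouseA", "HouseB", "HouseC"), times = c(8, 5, 2)),
  val = seq_len(15),
  stringsAsFactors = FALSE
)

# Group size for each row's ac_name; the integer first argument keeps the result integer
counts <- ave(integer(nrow(data)), data[["ac_name"]], FUN = length)

# The filtered data set: rows whose name occurs at most 5 times
kept <- data[counts <= 5, , drop = FALSE]

# The names that were dropped because they occur more than 5 times
excluded_names <- unique(data[["ac_name"]][counts > 5])
```

Here `kept` holds the 7 HouseB and HouseC rows and `excluded_names` is "HouseA".
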
Use
  ac_name_seqno <- ave(integer(nrow(data)), data[["ac_name"]], FUN=seq_along)
to assign a within-group sequence number, so that for the big groups you can
pick out the first or last n items in each group.
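
A sketch of the sequence-number idea on made-up data: it keeps at most the first (or last) 5 rows of every ac_name group, so large groups are truncated rather than dropped. The cap of 5 and the names `first5`/`last5` are assumptions for illustration.

```r
data <- data.frame(
  ac_name = rep(c("HouseA", "HouseB", "HouseC"), times = c(8, 5, 2)),
  val = seq_len(15),
  stringsAsFactors = FALSE
)

# 1, 2, 3, ... within each ac_name group, in the rows' current order
ac_name_seqno <- ave(integer(nrow(data)), data[["ac_name"]], FUN = seq_along)

# First 5 rows of each group (HouseA loses its last 3 rows)
first5 <- data[ac_name_seqno <= 5, , drop = FALSE]

# Last 5 rows of each group: also needs the group size
ac_name_count <- ave(integer(nrow(data)), data[["ac_name"]], FUN = length)
last5 <- data[ac_name_count - ac_name_seqno < 5, , drop = FALSE]
```

Sort the data first (for example by date) if "first" and "last" should mean something other than the current row order.
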

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
On Oct 17, 2012, at 5:44 AM, fxen3k wrote:

            
data[ ave(data$ac_name, data$ac_name, length) <= 5, ]  # all with 5 or fewer entries
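
For comparison, a sketch of the same filter built on table() and %in% instead of ave(). This is a swapped-in base R technique, not what was posted in the thread; because the counts live in a separate integer table, the type of ac_name (character or factor) does not affect the comparison. The data are made up.

```r
data <- data.frame(
  ac_name = rep(c("HouseA", "HouseB", "HouseC"), times = c(8, 5, 2)),
  val = seq_len(15),
  stringsAsFactors = TRUE   # behaves the same with a character ac_name
)

# One integer count per distinct name
name_counts <- table(data$ac_name)

# Names occurring more than 5 times, and the rows that remain
excluded_names <- names(name_counts)[name_counts > 5]
kept <- data[!(data$ac_name %in% excluded_names), , drop = FALSE]
```
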
#
Hi David,

I tried your code:
set.seed(1)
dat1<-data.frame(ac_name=rep(c("HouseA","HouseB","HouseC","HouseD","HouseE"),times=c(8,5,4,6,3)),val=rnorm(26,15))
dat2<-within(dat1,{ac_name<-as.character(ac_name)})
dat2<-dat2[order(dat2[,1]),]

dat2[ave(dat2$ac_name,dat2$ac_name,length)<=5,]
#Error in unique.default(x) : unique() applies only to vectors
#With "FUN" added
head(dat2[ave(dat2$ac_name,dat2$ac_name,FUN=length)<=5,])
#   ac_name      val
#9   HouseB 15.57578
#10  HouseB 14.69461
#11  HouseB 16.51178
#12  HouseB 15.38984
#13  HouseB 14.37876
#14  HouseC 12.78530
A.K.
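
A small sketch of why the unnamed `length` fails: ave()'s signature is ave(x, ..., FUN = mean), so a positional `length` lands in `...` and is treated as a second grouping variable, which cannot be turned into a factor. The three-element vector is made up.

```r
x <- c("HouseA", "HouseA", "HouseB")

# Unnamed: `length` is swallowed by ... and used as a grouping factor -> error
res_unnamed <- tryCatch(ave(x, x, length),
                        error = function(e) conditionMessage(e))

# Named: FUN = length is applied within each group; the result keeps x's
# character type, so the counts come back as "2" "2" "1"
res_named <- ave(x, x, FUN = length)
```
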






----- Original Message -----
From: David Winsemius <dwinsemius at comcast.net>
To: fxen3k <f.sehardt at gmail.com>
Cc: r-help at r-project.org
Sent: Wednesday, October 17, 2012 4:25 PM
Subject: Re: [R] How to count rows with a condition
On Oct 17, 2012, at 5:44 AM, fxen3k wrote:

            
data[ ave(data$ac_name, data$ac_name, length) <= 5, ]  # all with 5 or fewer entries

--
David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
data[ ave(data$ac_name, data$ac_name, length) <= 5, ]
fails for two reasons:
  a) you need to label the FUN argument, FUN=length, since there
      is a ... in the middle of ave's argument list to catch all the grouping arguments;
  b) the type of the first argument to ave() needs to be compatible with
      the type of the return value of FUN().  If ac_name is a factor,
      you get NAs and warnings; if it is character, the "<= 5" comparison uses
      character order instead of numerical order, leading to incorrect results
      because "11"<"5":
With ac_name stored as character:
     ac_name   n
1       Amos 101
2       Amos 102
3       Amos 103
12 Charlotte 112
13 Charlotte 113
... [ rows elided ] ...
22 Charlotte 122

With ac_name stored as a factor:
      ac_name  n
NA       <NA> NA
NA.1     <NA> NA
NA.2     <NA> NA
... [rows elided] ...
NA.21    <NA> NA
Warning messages:
1: In `[<-.factor`(`*tmp*`, i, value = 3L) :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, i, value = 8L) :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, i, value = 11L) :
  invalid factor level, NAs generated
4: In Ops.factor(ave(data$ac_name, data$ac_name, FUN = length), 5) :
  <= not meaningful for factors

That is why I made the first argument integer:
  ac_name   n
1    Amos 101
2    Amos 102
3    Amos 103
  

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
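
A sketch that reproduces the three cases above on made-up data shaped like Bill's example (groups of 3, 8 and 11 rows, n = 101:122). The middle name "Bob" is a placeholder, since the thread does not show it.

```r
ac <- rep(c("Amos", "Bob", "Charlotte"), times = c(3, 8, 11))
data_chr <- data.frame(ac_name = ac, n = 101:122, stringsAsFactors = FALSE)
data_fac <- data.frame(ac_name = ac, n = 101:122, stringsAsFactors = TRUE)

# Character first argument: counts become "3", "8", "11" and are compared as
# strings; "11" <= "5" is TRUE, so Charlotte's 11 rows are wrongly kept
keep_chr <- ave(data_chr$ac_name, data_chr$ac_name, FUN = length) <= 5

# Factor first argument: the counts are not valid levels, so every entry
# becomes NA (with warnings) and the comparison is NA throughout
keep_fac <- suppressWarnings(
  ave(data_fac$ac_name, data_fac$ac_name, FUN = length) <= 5
)

# Integer first argument: counts stay integer and compare numerically
keep_int <- ave(integer(nrow(data_chr)), data_chr$ac_name, FUN = length) <= 5

data_chr[keep_int, , drop = FALSE]  # only the 3 Amos rows
```
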