How to count rows with a condition
data[ ave(data$ac_name, data$ac_name, length) <= 5, ]
fails for two reasons:
a) You need to name the FUN argument, FUN=length, because there
is a ... in the middle of ave's argument list that catches all the grouping arguments.
b) The type of the first argument to ave needs to be compatible with
the type of the return value of FUN(). If ac_name is a factor
you get NAs and warnings; if it is character, the "<= 5" comparison uses
character order instead of numerical order, leading to incorrect results
because "11" < "5" as strings:
data <- data.frame(ac_name=rep(c("Amos","Boris","Charlotte"),c(3,8,11)), n=101:122, stringsAsFactors=FALSE)
data[ ave(data$ac_name, data$ac_name, FUN=length) <= 5, ]
     ac_name   n
1       Amos 101
2       Amos 102
3       Amos 103
12 Charlotte 112
13 Charlotte 113
... [ rows elided ] ...
22 Charlotte 122
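The string-ordering effect can be reproduced in isolation. This is a minimal sketch with a made-up grouping vector g (not from the original post) in which one group has 11 members:

```r
# Hypothetical grouping vector: "x" occurs 11 times, "y" occurs twice.
g <- rep(c("x", "y"), c(11, 2))

# With a character first argument, ave() stores the integer counts
# back into a character vector, so they become strings.
counts_chr <- ave(g, g, FUN = length)
class(counts_chr)        # "character"

# Comparing against 5 coerces 5 to "5", so this is a string comparison:
# both "11" and "2" sort before "5".
all(counts_chr <= 5)     # TRUE

# The same comparison on numbers gives the intended answer.
11 <= 5                  # FALSE
```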
data <- data.frame(ac_name=rep(c("Amos","Boris","Charlotte"),c(3,8,11)), n=101:122, stringsAsFactors=TRUE)
data[ ave(data$ac_name, data$ac_name, FUN=length) <= 5, ]
      ac_name  n
NA       <NA> NA
NA.1     <NA> NA
NA.2     <NA> NA
... [rows elided] ...
NA.21    <NA> NA
Warning messages:
1: In `[<-.factor`(`*tmp*`, i, value = 3L) :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, i, value = 8L) :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, i, value = 11L) :
  invalid factor level, NAs generated
4: In Ops.factor(ave(data$ac_name, data$ac_name, FUN = length), 5) :
  <= not meaningful for factors

That is why I made the first argument integer:
data[ ave(integer(nrow(data)), data$ac_name, FUN=length) <= 5, ]
  ac_name   n
1    Amos 101
2    Amos 102
3    Amos 103

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
Sent: Wednesday, October 17, 2012 1:25 PM
To: fxen3k
Cc: r-help at r-project.org
Subject: Re: [R] How to count rows with a condition

On Oct 17, 2012, at 5:44 AM, fxen3k wrote:

Hi,

I have a dataset called "data". There is one row called "ac_name". Some names in this column appear very often, some less.

What I want is to filter this dataset with the following condition: Exclude the names, which appear more than five times. (example: House A appears 8 times ==> exclude it; House B appears 5 times ==> include it etc.)

In the end, I want to have the old "data" dataset excluding the rows with the above mentioned condition and another list with all the names which have been excluded.

data[ ave(data$ac_name, data$ac_name, length) <= 5, ] # all with 5 or fewer entries

--
David Winsemius, MD
Alameda, CA, USA
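
The question also asks for a list of the excluded names, which the ave() one-liner does not produce. One possible sketch uses table() instead of ave(); this is an alternative, not the approach taken in the thread. It reuses the example data from the reply, and the names counts, excluded and kept are chosen here for illustration:

```r
# Example data from the reply: Amos x3, Boris x8, Charlotte x11.
data <- data.frame(ac_name = rep(c("Amos", "Boris", "Charlotte"), c(3, 8, 11)),
                   n = 101:122, stringsAsFactors = FALSE)

# Count how often each name occurs.
counts <- table(data$ac_name)

# Names that occur more than five times are the ones to exclude.
excluded <- names(counts)[counts > 5]

# Keep only the rows whose name is not in the excluded set.
kept <- data[!(data$ac_name %in% excluded), ]

excluded   # "Boris" "Charlotte"
kept       # the three Amos rows
```

Because the counts come from table() and are compared as numbers, this avoids the character-ordering problem, and %in% matches on the names whether ac_name is character or factor.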