select duplicate identifier with higher mean across sample columns
Is this what you want?
mdf <- read.table(text = "  id samp1 samp2 samp2a
1  A   100   110    110
2  A   120   130    150
3  C   101   131    151
4  D   110   150    130
5  E   132   122    122
6  F   123   143    143", header = TRUE)
result <- do.call(rbind, lapply(split(mdf, mdf$id), function(.id){
    maxIndx <- which.max(rowMeans(.id[, -1L]))
    .id[maxIndx, ]
}))
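The question also mentions the inter-quartile range as a possible criterion. A minimal sketch of the same split-and-pick idea with a swappable score function follows. It assumes the IQR is meant row-wise across the sample columns, and pick_by, result_iqr and result_mean are illustrative names, not anything from the original post. Ties go to the first row, since which.max returns the first maximum.

```r
# Rebuild the example data so the sketch is self-contained.
mdf <- data.frame(
  id     = c("A", "A", "C", "D", "E", "F"),
  samp1  = c(100, 120, 101, 110, 132, 123),
  samp2  = c(110, 130, 131, 150, 122, 143),
  samp2a = c(110, 150, 151, 130, 122, 143)
)

# pick_by() is a hypothetical helper: within each id, keep the row whose
# score (computed across the sample columns) is largest.
pick_by <- function(df, score) {
  pieces <- lapply(split(df, df$id), function(.id) {
    # Drop the id column and score each remaining row.
    scores <- apply(as.matrix(.id[, -1L, drop = FALSE]), 1, score)
    .id[which.max(scores), , drop = FALSE]
  })
  do.call(rbind, pieces)
}

result_iqr  <- pick_by(mdf, IQR)   # row-wise inter-quartile range
result_mean <- pick_by(mdf, mean)  # same selection as the rowMeans() version
```

For this data set both criteria keep the second A row, so the two results agree; on other data they may not.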
result
  id samp1 samp2 samp2a
A  A   120   130    150
C  C   101   131    151
D  D   110   150    130
E  E   132   122    122
F  F   123   143    143

On Sun, Nov 4, 2012 at 2:25 PM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:
Hi Group:
I searched R groups before posting this question. I could not find the
appropriate answer and I do not have clear understanding how to do
this in R.
I have a data frame with duplicated row identifiers but with different
values across columns. I want to select the identifier with higher
inter-quartile range or mean.
id <- c("A", "A", "C", "D", "E", "F")
year <- c(2000, 2001, 2001, 2002, 2003, 2004)
samp1 <- c(100, 120, 101, 110, 132, 123)
samp2 <- c(110, 130, 131, 150, 122, 143)
samp2a <- c(110, 150, 151, 130, 122, 143)
mdf <- data.frame(id, samp1, samp2, samp2a)
mdf
  id samp1 samp2 samp2a
1  A   100   110    110
2  A   120   130    150
3  C   101   131    151
4  D   110   150    130
5  E   132   122    122
6  F   123   143    143

There are two A ids in this df. I want to select the row with higher mean. How can I do this.

Thanks
Adrian
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.