select duplicate identifier with higher mean across sample columns

Hi,
Try this:
mdf[unlist(tapply(rowMeans(mdf[,-1]),mdf$id,FUN=function(x) x%in%max(x))),]
#  id samp1 samp2 samp2a
#2  A   120   130    150
#3  C   101   131    151
#4  D   110   150    130
#5  E   132   122    122
#6  F   123   143    143
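
One caveat with the tapply() one-liner above: tapply() returns its results grouped by sorted id level, so the unlisted logical vector lines up with the rows only when the data frame is already ordered by id (as it is here). A minimal sketch of an order-independent variant follows, using ave() to get the per-id maximum aligned with the original rows. The helper name keep_top_row and its arguments id_col and stat are made up for illustration, and the samp2a values are copied from the printed table in the question. Like the one-liner, it keeps all tied rows.

```r
# Rebuild the example data; samp2a values are taken from the printed table.
mdf <- data.frame(
  id     = c("A", "A", "C", "D", "E", "F"),
  samp1  = c(100, 120, 101, 110, 132, 123),
  samp2  = c(110, 130, 131, 150, 122, 143),
  samp2a = c(110, 150, 151, 130, 122, 143)
)

# Hypothetical helper: for each id, keep the row(s) whose per-row statistic
# (mean by default) is the largest within that id.
keep_top_row <- function(df, id_col = "id", stat = mean) {
  # Score every row across all non-id columns
  score <- apply(df[, setdiff(names(df), id_col), drop = FALSE], 1, stat)
  # ave() returns the group maximum in the original row order
  df[score == ave(score, df[[id_col]], FUN = max), ]
}

keep_top_row(mdf)              # highest row mean per id
keep_top_row(mdf, stat = IQR)  # highest inter-quartile range per id
```

Passing stat = IQR covers the inter-quartile range case from the question; on this data both calls should drop the first A row.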
A.K.




----- Original Message -----
From: Adrian Johnson <oriolebaltimore at gmail.com>
To: r-help <r-help at r-project.org>
Cc: 
Sent: Sunday, November 4, 2012 2:25 PM
Subject: [R] select duplicate identifier with higher mean across sample columns

Hi Group:
I searched the R groups before posting this question. I could not find
an appropriate answer, and I do not have a clear understanding of how
to do this in R.

I have a data frame with duplicated row identifiers but with different
values across columns. I want to select the identifier with higher
inter-quartile range or mean.


id <- c("A", "A", "C", "D", "E", "F")
year <- c(2000, 2001, 2001, 2002, 2003, 2004)
samp1 <- c(100, 120, 101, 110, 132, 123)
samp2 <- c(110, 130, 131, 150, 122, 143)
samp2a <- c(110, 150, 151, 130, 122, 143)
mdf <- data.frame(id, samp1, samp2, samp2a)
  id samp1 samp2 samp2a
1  A   100   110    110
2  A   120   130    150
3  C   101   131    151
4  D   110   150    130
5  E   132   122    122
6  F   123   143    143


There are two A ids in this df. I want to select the row with higher mean.

How can I do this?
Thanks
Adrian

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.