select duplicate identifier with higher mean across sample columns

Is this what you want:
mdf <- read.table(text = "  id samp1 samp2 samp2a
1  A   100   110    110
2  A   120   130    150
3  C   101   131    151
4  D   110   150    130
5  E   132   122    122
6  F   123   143    143", header = TRUE)
result <- do.call(rbind, lapply(split(mdf, mdf$id), function(.id) {
    maxIndx <- which.max(rowMeans(.id[, -1L]))
    .id[maxIndx, ]
}))
result
  id samp1 samp2 samp2a
A  A   120   130    150
C  C   101   131    151
D  D   110   150    130
E  E   132   122    122
F  F   123   143    143
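
The original question also mentions the inter-quartile range as a possible criterion, which the reply above does not cover. The following is only a sketch of one way to generalize the same idea: pick_by_score() is a hypothetical helper name (not part of base R), and it assumes the first column is the id and all remaining columns are numeric samples.

```r
# Example data copied from the thread.
mdf <- data.frame(
  id     = c("A", "A", "C", "D", "E", "F"),
  samp1  = c(100, 120, 101, 110, 132, 123),
  samp2  = c(110, 130, 131, 150, 122, 143),
  samp2a = c(110, 150, 151, 130, 122, 143)
)

# pick_by_score() is a hypothetical helper, not a base R function.
# For each id it keeps the row whose sample columns give the largest score.
# Assumption: column 1 is the id, every other column is a numeric sample.
pick_by_score <- function(df, score = mean) {
  # One score per row, computed over every column except the first.
  s <- apply(as.matrix(df[, -1L]), 1L, score)
  # Within each id group, find the original row index with the top score
  # (which.max keeps the first row on ties).
  keep <- tapply(seq_len(nrow(df)), df$id, function(i) i[which.max(s[i])])
  # Return the kept rows in their original order.
  df[sort(as.vector(keep)), ]
}

by_mean <- pick_by_score(mdf, mean)  # same selection rule as the reply above
by_iqr  <- pick_by_score(mdf, IQR)   # inter-quartile range instead of the mean
```

With the data from the thread, both criteria keep the second A row (120, 130, 150), so the two results happen to agree here; on other data they may differ.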

On Sun, Nov 4, 2012 at 2:25 PM, Adrian Johnson wrote:
Hi Group:
I searched the R groups before posting this question. I could not find an
appropriate answer, and I do not have a clear understanding of how to do
this in R.

I have a data frame with duplicated row identifiers but with different
values across columns. I want to select the identifier with higher
inter-quartile range or mean.

id <- c("A", "A", "C", "D", "E", "F")
year <- c(2000, 2001, 2001, 2002, 2003, 2004)
samp1 <- c(100, 120, 101, 110, 132, 123)
samp2 <- c(110, 130, 131, 150, 122, 143)
samp2a <- c(110, 150, 151, 130, 122, 143)  # values taken from the printed mdf
mdf <- data.frame(id, samp1, samp2, samp2a)

mdf
  id samp1 samp2 samp2a
1  A   100   110    110
2  A   120   130    150
3  C   101   131    151
4  D   110   150    130
5  E   132   122    122
6  F   123   143    143

There are two A ids in this df. I want to select the row with higher mean.

How can I do this?
Thanks
Adrian

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
