I have a large data-frame with measurements such as:
id i v1  v2  v3
1  1 1.1 1.2 1.3
1  2 1.4 1.5 1.6
1  3 1.5 1.7 1.8
2  1 2.1 2.2 2.3
2  2 2.7 2.5 2.6
2  3 2.4 2.8 2.9
For each unique value of 'id' (which in the real data set is a combination of
three variables) I want to compute the median of v1 within each group ('i'
distinguishes measurements within a group) and copy the values of the remaining
columns (v2 and v3) from the row in which that median occurs. Thus, the desired
result for this small example is:
id i v1  v2  v3
1  2 1.4 1.5 1.6
2  3 2.4 2.8 2.9
I have written a (rather clumsy, in my opinion) function to perform this task
(see below). Is there a more "standard" way of achieving this?
The function is:
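agg.column <- function(df, key, groups, FUN)
{
  # Coerce every grouping variable to a factor.
  for(i in seq_along(groups))
    groups[[i]] <- as.factor(groups[[i]])
  # One data frame per combination of the grouping factors.
  groups <- split(df, interaction(groups, lex.order=TRUE))
  ret <- data.frame()
  for(g in groups) {
    # Keep the first row whose key value equals FUN(key), e.g. the median.
    key.fun <- FUN(g[[key]])
    row.idx <- match(key.fun, g[[key]])
    ret <- rbind(ret, g[row.idx,])
  }
  ret
}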
Selection and aggregation in one operation?
2 messages · Zeljko Vrba, Gabor Grothendieck
If, as in this example, i is always 1, 2, ... and has an odd length in each group, then:

  do.call(rbind, by(DF, DF$id, function(x) x[median(x$i), ]))
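
Note that the one-liner in this reply indexes each group by median(x$i), so it returns the middle row by position; for the second group of the example that is the row with v1 = 2.7 rather than 2.4. A minimal sketch of a variant that selects by the median of v1 instead is given below. The helper name pick_median_row and its key argument are hypothetical, and for groups with an even number of rows it assumes the lower of the two middle rows is acceptable.

```r
# Example data from the question.
DF <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  i  = c(1, 2, 3, 1, 2, 3),
  v1 = c(1.1, 1.4, 1.5, 2.1, 2.7, 2.4),
  v2 = c(1.2, 1.5, 1.7, 2.2, 2.5, 2.8),
  v3 = c(1.3, 1.6, 1.8, 2.3, 2.6, 2.9)
)

# Hypothetical helper: return the row of x whose `key` value is the
# median of that column. Sorting positions avoids comparing floating
# point values for equality; with an even number of rows the lower
# middle row is taken (an assumption, not part of the question).
pick_median_row <- function(x, key = "v1") {
  ord <- order(x[[key]])
  x[ord[(nrow(x) + 1) %/% 2], , drop = FALSE]
}

# Apply the helper to each id group and stack the selected rows.
res <- do.call(rbind, by(DF, DF$id, pick_median_row))
print(res)
```

On the example data this picks the rows with i = 2 for id 1 and i = 3 for id 2, matching the desired result in the question. Since by() accepts a list of factors as its INDICES argument, the same pattern should carry over to an id made up of several variables.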
On Tue, May 26, 2009 at 8:13 AM, Zeljko Vrba <zvrba at ifi.uio.no> wrote:
> [original message quoted in full, as above]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.