Dear expeRts,
The question is rather simple: Why does aggregate() (or, similarly, tapply()) not keep the order of the grouping variable(s)?
Here is an example:
x <- data.frame(group = rep(LETTERS[1:2], each=10),
                year = rep(rep(2001:2005, each=2), 2),
                value = rep(1:10, each=2))
## => sorted according to group, then year
aggregate(value ~ group + year, data=x, FUN=function(z) z[1])
## => sorted according to year, then group
I rather expected this to be the default:
aggregate(value ~ year + group, data=x, FUN=function(z) z[1])[,c(2,1,3)]
## => same order as input (grouping) variables
Same with tapply:
as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=function(z) z[1])))
Cheers,
Marius
aggregate(), tapply(): Why is the order of the grouping variables not kept?
2 messages · Marius Hofert, Peter Ehlers
On 2013-03-11 13:52, Marius Hofert wrote:
> The question is rather simple: Why does aggregate (or similarly tapply()) not keep the order of the grouping variable(s)?
> [...]
I'm no expeRt, but suppose that we change the setup slightly:
xx <- x[sample(nrow(x)), ]
Now what would you like
aggregate(value ~ group + year, data=xx, FUN=function(z) z[1])
to return?
Personally, I prefer to have R return the same thing regardless of how the input dataframe is sorted, i.e. the result should depend only on the formula. You just have to know that the order is to have the first factor vary most rapidly, then the next, etc. I think that's documented somewhere, but I don't know where.
Peter Ehlers
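To make the point above concrete, here is a small sketch (not part of the original exchange) that shuffles the rows and compares the two aggregate() results, then reorders the result by group and year using order(). The seed and the names a1, a2 and a1_sorted are arbitrary choices made for illustration. Note that FUN picks the first element of each cell, so the values only agree here because every group/year cell of x holds a single distinct value.

```r
# Rebuild the example data from the original post.
x <- data.frame(group = rep(LETTERS[1:2], each = 10),
                year  = rep(rep(2001:2005, each = 2), 2),
                value = rep(1:10, each = 2))

set.seed(1)                    # arbitrary seed, only for reproducibility
xx <- x[sample(nrow(x)), ]     # same rows, shuffled

a1 <- aggregate(value ~ group + year, data = x,  FUN = function(z) z[1])
a2 <- aggregate(value ~ group + year, data = xx, FUN = function(z) z[1])

# The row order of the result follows the formula (first grouping
# variable varies fastest), not the row order of the input.
same_result <- isTRUE(all.equal(a1, a2))
print(same_result)

# If "group, then year" is the order you want, sort the result afterwards.
a1_sorted <- a1[order(a1$group, a1$year), ]
rownames(a1_sorted) <- NULL
print(a1_sorted)
```

Sorting after the fact keeps the column order given in the formula, so the column shuffle [,c(2,1,3)] from the original post is not needed.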