I'm no expeRt, but suppose that we change the setup slightly:

  xx <- x[sample(nrow(x)), ]

Now what would you like

  aggregate(value ~ group + year, data=xx, FUN=function(z) z[1])

to return? Personally, I prefer to have R return the same thing regardless of how the input dataframe is sorted,
Personally, I prefer that R change my input as little as possible... but I totally agree with you that there are other instances where it's preferable that the output does not depend on how the input is ordered.
i.e. the result should depend only on the formula. You just have to know that the order is to have the first factor vary most rapidly,
... which I still find very confusing/unnatural, but okay.
then the next, etc. I think that's documented somewhere, but I don't know where.
It's also the default behavior of expand.grid(), for example.

Cheers,
Marius
Peter Ehlers
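
For what it's worth, the ordering discussed above (first factor varying most rapidly, as in expand.grid()) can be checked with a small sketch. The data frame x below is made up for illustration, since the original x from the thread is not shown here, and sum is used in place of function(z) z[1] so that the result cannot depend on row order.

```r
# Hypothetical data; not the x from the original post.
x <- data.frame(
  group = rep(c("a", "b"), times = 3),
  year  = rep(2001:2003, each = 2),
  value = 1:6
)

# expand.grid() varies its first argument fastest.
g <- expand.grid(group = c("a", "b"), year = 2001:2003)

# aggregate() lays out its rows the same way: group cycles within year.
res <- aggregate(value ~ group + year, data = x, FUN = sum)

# Shuffle the rows, as in xx <- x[sample(nrow(x)), ] above.
set.seed(1)
xx <- x[sample(nrow(x)), ]
res_shuffled <- aggregate(value ~ group + year, data = xx, FUN = sum)

# Compare the row layout of the result with that of expand.grid().
print(cbind(g, res))
```

With a FUN like function(z) z[1] and several rows per group/year cell, the values themselves could still change after shuffling, even though the row layout of the result stays the same.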