
Simple performance enhancement for ave

1 message · Hadley Wickham

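# Simulate 100,000 observations with two grouping variables drawn from 1:750
n <- 100000
grp1 <- sample(1:750, n, replace = TRUE)
grp2 <- sample(1:750, n, replace = TRUE)
d <- data.frame(x = rnorm(n), y = rnorm(n), grp1 = grp1, grp2 = grp2)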

system.time(ave(d$x, d$grp1, d$grp2, FUN = mean))
#   user  system elapsed
# 19.840   0.125  19.967
system.time(ave(d$x, d$grp1, d$grp2, drop = TRUE, FUN = mean))
#  user  system elapsed
# 2.898   0.058   2.956

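This is a pathological example (100,000 observations falling into around
90,000 non-empty groups out of 750 x 750 = 562,500 possible combinations),
so without drop = TRUE most of the groups that get split and passed to FUN
are empty. I don't see any reason why drop = TRUE shouldn't be the default
inside ave.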
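
To illustrate where the time goes, here is a minimal sketch, assuming ave() builds its groups with interaction(...) and then splits on the result. The helper group_mean() is hypothetical and only mirrors that split/lapply pattern; it is not the actual ave source. The data are scaled down so the example runs quickly.

```r
set.seed(1)
n <- 1000
grp1 <- sample(1:50, n, replace = TRUE)
grp2 <- sample(1:50, n, replace = TRUE)
x <- rnorm(n)

# Without drop, interaction() keeps one level per combination of the
# grp1 and grp2 values, whether or not that combination occurs in the data.
g_all <- interaction(grp1, grp2)

# With drop = TRUE, only the combinations that actually occur are kept.
g_used <- interaction(grp1, grp2, drop = TRUE)

nlevels(g_all)   # size of the full cross product
nlevels(g_used)  # number of distinct (grp1, grp2) pairs present

# Hypothetical helper (an assumption, not ave itself): replace each value
# by its group mean using the same split/lapply pattern.
group_mean <- function(x, g) {
  split(x, g) <- lapply(split(x, g), mean)
  x
}

# Empty groups contribute nothing to the result, so both groupings agree;
# the dropped version just calls mean() on far fewer pieces.
all.equal(group_mean(x, g_all), group_mean(x, g_used))
```

Since empty groups never write anything back into x, dropping them changes only the amount of work done, not the values returned.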

Hadley