The same technique can also be applied to compute medians by group
quickly:
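gm <- function(x, group){ # medians by group:
   # same result as sapply(split(x,group),median)
   o <- order(group, x)          # sort by group, then by x within group
   group <- group[o]
   x <- x[o]
   # TRUE wherever the group id differs from the previous element
   changes <- group[-1] != group[-length(group)]
   first <- which(c(TRUE, changes))   # index of first element of each group
   last <- which(c(changes, TRUE))    # index of last element of each group
   # middle element(s) of each sorted run; identical when the run length is odd
   lowerMedian <- x[floor((first+last)/2)]
   upperMedian <- x[ceiling((first+last)/2)]
   median <- (lowerMedian+upperMedian)/2
   names(median) <- group[first]
   median
}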
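
To see what the intermediate vectors look like, here is a small trace of
the same steps on a made-up data set of three groups. The toy values and
the short names g, v, mid and med are mine, chosen only for illustration;
this is a sketch of the idea, not a replacement for gm().

```r
# Toy data (hypothetical values): three groups, deliberately unsorted.
group <- c(2, 1, 2, 3, 1, 2)
x     <- c(5, 1, 9, 4, 3, 7)

o <- order(group, x)   # sort by group, then by x within each group
g <- group[o]          # 1 1 2 2 2 3
v <- x[o]              # 1 3 5 7 9 4

changes <- g[-1] != g[-length(g)]   # TRUE where the group id changes
first <- which(c(TRUE, changes))    # 1 3 6: start of each run
last  <- which(c(changes, TRUE))    # 2 5 6: end of each run

mid <- (first + last) / 2           # 1.5 4.0 6.0
# Even-length runs average the two middle values; odd-length runs
# pick the single middle value twice.
med <- (v[floor(mid)] + v[ceiling(mid)]) / 2
names(med) <- g[first]
med
```

Because the data are sorted once and everything after that is plain
vector indexing, no per-group function call is needed.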
For an x of length 10^5 and somewhat fewer than 3*10^4 distinct groups
(in random order), the times are:
> group<-sample(1:30000, size=100000, replace=TRUE)
> x<-rnorm(length(group))*10 + group
> system.time(z0<-sapply(split(x,group), median))
   user  system elapsed
   2.72    0.00    3.20
> system.time(z1<-gm(x,group))
   user  system elapsed
   0.12    0.00    0.16
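
A speed comparison only means something if both approaches give the same
answer, so a check along the following lines seems worthwhile. This is a
sketch under a few assumptions of mine: gm_check is a hypothetical name
for a compact restatement of the function above (repeated so the block
runs on its own), the seed is arbitrary, and the data are smaller than in
the timing run.

```r
# Compact restatement of the sorted-run median function (hypothetical name).
gm_check <- function(x, group) {
  o <- order(group, x)
  group <- group[o]
  x <- x[o]
  changes <- group[-1] != group[-length(group)]
  first <- which(c(TRUE, changes))
  last <- which(c(changes, TRUE))
  mid <- (first + last) / 2
  med <- (x[floor(mid)] + x[ceiling(mid)]) / 2
  names(med) <- group[first]
  med
}

set.seed(1)  # arbitrary seed, only so the check is reproducible
group <- sample(1:3000, size = 10000, replace = TRUE)
x <- rnorm(length(group)) * 10 + group

z0 <- sapply(split(x, group), median)  # reference result
z1 <- gm_check(x, group)               # sorted-run result

# all.equal() compares values (within numerical tolerance) and names.
agree <- isTRUE(all.equal(z0, z1))
agree
```

all.equal() is used rather than identical() because the two methods may
average the middle pair of an even-sized group with slightly different
floating-point rounding.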