-----Original Message-----
From: hadley wickham [mailto:h.wickham at gmail.com]
Sent: Monday, January 05, 2009 9:10 AM
To: William Dunlap
Cc: gallon.li at gmail.com; R help
Subject: Re: [R] the first and last observation for each subject
Another application of that technique can be used to quickly compute
medians by groups:
gm <- function(x, group){ # medians by group: sapply(split(x,group),median)
   o <- order(group, x)            # sort by group, then by x within each group
   group <- group[o]
   x <- x[o]
   changes <- group[-1] != group[-length(group)]  # TRUE where the group changes
   first <- which(c(TRUE, changes))               # index of first obs. in each group
   last <- which(c(changes, TRUE))                # index of last obs. in each group
   lowerMedian <- x[floor((first+last)/2)]
   upperMedian <- x[ceiling((first+last)/2)]
   median <- (lowerMedian+upperMedian)/2
   names(median) <- group[first]
   median
}
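To make the index arithmetic concrete, here is a small worked example of the same sort-and-find-boundaries idea on five observations. It is only an illustrative sketch: the toy data and the names g, xs, mid and med are made up for this example and are not part of the function above.

```r
# Toy data (hypothetical): two groups, in arbitrary order.
group <- c(2, 1, 2, 1, 2)
x     <- c(10, 3, 30, 1, 20)

o  <- order(group, x)   # sort by group, then by x within each group
g  <- group[o]          # sorted group labels
xs <- x[o]              # x values, sorted within each group

# A group boundary is wherever a sorted label differs from its neighbour.
changes <- g[-1] != g[-length(g)]
first <- which(c(TRUE, changes))   # position of the first obs. of each group
last  <- which(c(changes, TRUE))   # position of the last obs. of each group

# The median of each group sits midway between its first and last position;
# for an even-sized group, average the two middle values.
mid <- (first + last) / 2
med <- (xs[floor(mid)] + xs[ceiling(mid)]) / 2
names(med) <- g[first]

# Cross-check against the straightforward split/sapply version.
stopifnot(isTRUE(all.equal(med, sapply(split(x, group), median))))
```

Because everything after the single call to order() is vectorized, the cost no longer grows with one function call per group, which is presumably where the speedup comes from.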
For an x of length 10^5 and somewhat fewer than 3*10^4 distinct groups
(in random order) the times are:
group<-sample(1:30000, size=100000, replace=TRUE)
x<-rnorm(length(group))*10 + group
unix.time(z0<-sapply(split(x,group), median))
user system elapsed
2.72 0.00 3.20
unix.time(z1<-gm(x,group))
user system elapsed
0.12 0.00 0.16
unix.time(z0<-sapply(split(x,group), median))
user system elapsed
2.733 0.017 2.766
unix.time(z1<-gm(x,group))