any other fast method for median calculation
On Tue, 14 Apr 2009, S Ellison wrote:
> Sorting with an appropriate algorithm is n log(n), so it's very hard to get the 'exact' median any faster.
There actually are linear-time algorithms for the median, but n has to be very large before they are worth using, and by then you have to start considering locality of reference and other issues.
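One cheap way to avoid a full sort in R is partial sorting, which only places the requested positions correctly and leaves the rest unordered. The sketch below is an illustration rather than a tuned implementation: partial_median is a made-up helper name, NA handling is left out, and I believe the default median() method already does something very similar internally.

```r
# Median via partial sorting: only the middle element(s) are moved to
# their final sorted position; the rest of the vector stays unordered.
# partial_median is a hypothetical helper; NA handling is omitted.
partial_median <- function(x) {
  n <- length(x)
  if (n == 0L) return(NA_real_)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # odd length: a single middle element
    sort(x, partial = half)[half]
  } else {
    # even length: average the two middle elements
    idx <- half + 0:1
    mean(sort(x, partial = idx)[idx])
  }
}

set.seed(42)
x <- rnorm(1001)   # odd length
y <- rnorm(1000)   # even length
all.equal(partial_median(x), median(x))
all.equal(partial_median(y), median(y))
```

If that assumption about median() holds, a hand-written version like this is unlikely to buy much; it mainly shows that the exact median does not need a complete sort.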
In any case, it looks like you are constrained not by the median algorithm but by the number of calls. You might do a lot better with apply, though:
apply(df, 2, median)
On my system 200k columns were processed in negligible time by apply and I'm still waiting for mapply.
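A rough way to try the comparison on a small scale is sketched below. The data frame is made up (10 rows, 2000 columns), and the mapply call is only a guess at its shape, since the original call is not shown in this message. Timings will vary by machine and R version.

```r
set.seed(1)
# Made-up stand-in for the real data: 10 rows, 2000 columns.
df <- as.data.frame(matrix(rnorm(10 * 2000), nrow = 10))

# Column medians via apply (df is converted to a matrix first).
t_apply <- system.time(m_apply <- apply(df, 2, median))

# Assumed form of the mapply call; the original is not shown here.
t_mapply <- system.time(m_mapply <- mapply(median, df))

# Elapsed seconds for each approach.
c(apply = t_apply[["elapsed"]], mapply = t_mapply[["elapsed"]])

# Both approaches should give the same medians.
all.equal(unname(m_apply), unname(m_mapply))
```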
I'd also note that this is the sort of problem where the profiler is useful: you can see on a smaller subset whether R is spending most of its time in median() or somewhere else.
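For instance, a minimal profiling run with Rprof() and summaryRprof() might look like the following. The data are made up, the sampling interval is an arbitrary choice, and this assumes an R build with profiling enabled.

```r
set.seed(1)
# Made-up smaller subset: 10 rows, 20000 columns.
df <- as.data.frame(matrix(rnorm(10 * 20000), nrow = 10))

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
meds <- apply(df, 2, median)
Rprof(NULL)                         # stop profiling

# summaryRprof() may fail if the run was too short to record any
# samples, so guard the call.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

The by.self table lists the functions in which time was spent directly, which is what shows whether median() itself or the surrounding machinery dominates.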
I wouldn't be surprised if a while() loop was even faster than apply() in this setting, but probably not enough to care about.
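A while() version could be sketched like this, again on made-up data. It preallocates the result and pulls each column out with [[ ]], so no matrix conversion is involved.

```r
set.seed(1)
# Made-up data: 10 rows, 2000 columns.
df <- as.data.frame(matrix(rnorm(10 * 2000), nrow = 10))

n <- ncol(df)
meds <- numeric(n)   # preallocate the result vector
i <- 1L
while (i <= n) {
  meds[i] <- median(df[[i]])   # median of column i
  i <- i + 1L
}
```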
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle