Message-ID: <Pine.LNX.4.43.0904140818130.32417@hymn13.u.washington.edu>
Date: 2009-04-14T15:18:13Z
From: Thomas Lumley
Subject: any other fast method for median calculation
In-Reply-To: <s9e462e1.025@tedmail.lgc.co.uk>

On Tue, 14 Apr 2009, S Ellison wrote:

> Sorting with an appropriate algorithm is nlog(n), so it's very hard to
> get the 'exact' median any faster.

There actually are linear-time algorithms for the median, but n has to be very large before they are worth using, and by then you have to start considering locality of reference and other issues.
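As a rough illustration of getting the median without a full sort, here is a minimal sketch using partial sorting, which only places the middle element(s) in their sorted positions. The function name partial_median is hypothetical, and the sketch assumes a non-empty numeric vector with no NAs. R's own median() already works along these lines, so this shows the idea rather than offering a speed-up.

```r
# Median via partial sorting: only the middle element(s) are guaranteed
# to end up in their sorted positions; the rest is left unordered.
# partial_median is a hypothetical name used for illustration only.
partial_median <- function(x) {
  n <- length(x)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # Odd length: the single middle element is the median.
    sort(x, partial = half)[half]
  } else {
    # Even length: average the two middle elements.
    mean(sort(x, partial = half + 0:1)[half + 0:1])
  }
}

set.seed(1)
x <- rnorm(1e5)
# Agrees with the built-in median on the same data.
stopifnot(isTRUE(all.equal(partial_median(x), median(x))))
```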

> In any case, it looks like you are not constrained by the median
> algorithm, but by the number of calls. You might do a lot better with
> apply, though
>> apply(df,2,median)
>
> On my system 200k columns were processed in negligible time by apply
> and I'm still waiting for mapply.

I'd also note that this is the sort of problem where the profiler is useful: you can see on a smaller subset whether R is spending most of its time in median() or somewhere else.
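A minimal sketch of that kind of profiling run, assuming a made-up subset (df_small, 100 rows by 2000 columns of random numbers) in place of the real data, and assuming R was built with profiling support:

```r
# Hypothetical small subset standing in for the real data.
set.seed(1)
df_small <- as.data.frame(matrix(rnorm(100 * 2000), nrow = 100))

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
# Code under investigation, repeated so the profiler collects enough samples.
for (i in 1:20) res <- mapply(median, df_small)
Rprof(NULL)                         # stop profiling

# by.total lists time spent in each function including its callees,
# which shows whether median() or the calling overhead dominates.
head(summaryRprof(prof_file)$by.total, 10)
```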

I wouldn't be surprised if a while() loop were even faster than apply() in this setting, but probably not by enough to care about.
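One way to check this on a given machine is to time both versions on the same data. The sketch below uses a made-up matrix m with 20,000 columns; the timings depend on the system and the data, so none are shown here.

```r
set.seed(1)
m <- matrix(rnorm(50 * 20000), nrow = 50)   # hypothetical test data

# Column medians with apply().
t_apply <- system.time(med_apply <- apply(m, 2, median))

# The same computation with a while() loop over a preallocated result.
t_loop <- system.time({
  med_loop <- numeric(ncol(m))
  j <- 1L
  while (j <= ncol(m)) {
    med_loop[j] <- median(m[, j])
    j <- j + 1L
  }
})

# Both approaches must give the same answer before the timings mean anything.
stopifnot(isTRUE(all.equal(med_apply, med_loop)))
c(apply = t_apply[["elapsed"]], loop = t_loop[["elapsed"]])
```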

       -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle