Suggestions to speed up median() and has.na()

Duncan Murdoch · 2006-04-10T23:48:12Z

On 4/10/2006 7:22 PM, Thomas Lumley wrote:
> On Mon, 10 Apr 2006, Henrik Bengtsson wrote:
>
>> Hi,
>>
>> I've got two suggestions how to speed up median() about 50%. For all
>> iterative methods calling median() in the loops this has a major
>> impact. The second suggestion will apply to other methods too.
>
> I'm surprised this has a major impact -- in your example it takes much
> longer to generate the ten million numbers than to find the median.
>
>> Suggestion 1:
>> Replace the sort()

I think it would help even in that case if the vector is large, because 
it avoids allocating and disposing of the logical vector of the same 
length as x.
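
To illustrate the allocation point, here is a minimal sketch in R. The helper name has_na_scan is hypothetical and is not part of the proposal in this thread. It only shows the idea of stopping at the first NA instead of building the full logical vector that is.na(x) returns. A loop written in R like this is slow, so any real gain would presumably need the scan to be done in C.

```r
# Hypothetical helper (name assumed for illustration): scan x and stop
# at the first NA, without building the logical vector is.na(x) returns.
has_na_scan <- function(x) {
  for (i in seq_along(x)) {
    if (is.na(x[[i]])) return(TRUE)  # early exit on the first NA
  }
  FALSE
}

x <- c(1.5, NA, 3)
y <- c(1.5, 2, 3)

r_alloc_x <- any(is.na(x))   # builds a logical vector of length(x) first
r_scan_x  <- has_na_scan(x)  # inspects one element at a time
r_scan_y  <- has_na_scan(y)  # no NA present, so the whole vector is scanned
```

Both approaches give the same answer on an atomic vector; they differ only in the temporary storage they need.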

If it's necessary to make it not generic to achieve the speedup, I don't 
think it's worth doing.  If anyNA is written not to be generic I'd guess 
a very common error will be to apply it to a dataframe and get a 
misleading "FALSE" answer.  If we do that, I predict that the total 
amount of r-help time wasted on it will exceed the CPU time saved by 
orders of magnitude.
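
The dataframe concern can be sketched as follows. Everything here is an assumption made for illustration: has_na_plain stands in for a non-generic function that only inspects atomic vectors, and has_na is a hypothetical S3 generic with a data.frame method. Neither is the implementation discussed in the thread.

```r
# Hypothetical non-generic version: only atomic vectors are inspected,
# so any other object (such as a data frame) falls through to FALSE.
has_na_plain <- function(x) {
  if (is.atomic(x)) any(is.na(x)) else FALSE
}

# Hypothetical generic version: dispatches on the class of x.
has_na <- function(x, ...) UseMethod("has_na")
has_na.default <- function(x, ...) any(is.na(x))
has_na.data.frame <- function(x, ...) {
  # Check each column in turn; vapply iterates over the columns.
  any(vapply(x, has_na, logical(1)))
}

df <- data.frame(a = c(1, NA), b = c("u", "v"))

plain_result   <- has_na_plain(df)  # FALSE, although column a has an NA
generic_result <- has_na(df)        # TRUE, via the data.frame method
```

Under these assumptions the non-generic version silently reports no missing values for df, which is the kind of misleading answer described above.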

Duncan Murdoch
