Suggestions to speed up median() and has.na()

Bill Dunlap · 2006-04-10T23:42:55Z

On Mon, 10 Apr 2006, Thomas Lumley wrote:
> On Mon, 10 Apr 2006, Henrik Bengtsson wrote:
>
> > Hi,
> >
> > I've got two suggestions how to speed up median() about 50%.  For all
> > iterative methods calling median() in the loops this has a major
> > impact.  The second suggestion will apply to other methods too.
> >
> > Suggestion 2:
> > Create a has.na(x) function to replace any(is.na(x)) that returns TRUE
> > as soon as a NA value is detected.  In the best case it returns after
> > the first in

Splus has such a function, but it is called anyMissing().  In the
interests of interoperability it would be nice if R used that name.
(I did not choose the name, but that is what it is.)

The following experiment using Splus seems to indicate the speedup has
less to do with stopping at the first NA than it does with not
making/filling/copying/whatever the big vector of logicals that is.na
returns.

   > # NA near start of list of 10 million integers
   > { z<-replace(1:1e7,2,NA); unix.time(anyMissing(z)) }
   [1] 0 0 0 0 0
   > { z<-replace(1:1e7,2,NA); unix.time(any(is.na(z)))}
   [1] 0.62 0.13 0.75 0.00 0.00

   > # NA at end of list
   > { z<-replace(1:1e7,1e7,NA); unix.time(anyMissing(z)) }
   [1] 0.07 0.00 0.07 0.00 0.00
   > { z<-replace(1:1e7,1e7,NA); unix.time(any(is.na(z)))}
   [1] 0.64 0.11 0.75 0.00 0.00
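
A minimal R sketch of the point about the intermediate logical vector.
This is not the Splus implementation of anyMissing(), whose source is not
shown here. The helper name has_na_chunked() and the chunk size are
assumptions made purely for illustration. It scans the input in blocks, so
it never allocates a full-length logical vector and it can return as soon
as a block containing an NA is seen.

```r
# Hypothetical helper (illustration only, not the Splus anyMissing()).
# Scans x in blocks of `chunk` elements so that is.na() only ever
# produces a short logical vector, and stops at the first block with an NA.
has_na_chunked <- function(x, chunk = 100000L) {
  n <- length(x)
  start <- 1L
  while (start <= n) {
    end <- min(start + chunk - 1L, n)
    if (any(is.na(x[start:end]))) {
      return(TRUE)   # early exit: remaining blocks are never examined
    }
    start <- end + 1L
  }
  FALSE
}

# Same shape of experiment as above, using system.time() in R.
z <- replace(1:1e7, 2, NA)
system.time(has_na_chunked(z))
system.time(any(is.na(z)))
```

Timings will vary by machine and R version, so none are quoted here. A
compiled implementation would avoid even the per-block copies made by
x[start:end].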

The Splus anyMissing is an S3 generic (i.e., it calls UseMethod()).
The Splus is.na is an S4 generic and its default method may invoke
an S3 generic.
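
For readers unfamiliar with the term, an S3 generic of this kind could be
sketched in R as follows. The name any_missing() and both methods are
hypothetical and are not taken from the Splus sources. The default method
here simply falls back on any(is.na(x)), so it shows only the dispatch
mechanism, not the speedup.

```r
# Hypothetical S3 generic (illustration only): dispatch happens in UseMethod().
any_missing <- function(x) UseMethod("any_missing")

# Fallback for atomic vectors and anything without a more specific method.
any_missing.default <- function(x) any(is.na(x))

# Lists: check each component in turn, recursing through the generic.
any_missing.list <- function(x) {
  any(vapply(x, any_missing, logical(1)))
}

any_missing(c(1, NA, 3))
any_missing(list(1:3, c("a", NA)))
```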

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."
