
Suggestions to speed up median() and has.na()

On Mon, 10 Apr 2006, Thomas Lumley wrote:

            
Splus has such a function, but it is called anyMissing().  In the
interests of interoperability it would be nice if R used that name.
(I did not choose the name, but that is what it is.)

The following experiment using Splus seems to indicate that the speedup
has less to do with stopping at the first NA than with not allocating,
filling, and copying the big vector of logicals that is.na returns.

   > # NA near start of list of 10 million integers
   > { z <- replace(1:1e7, 2, NA); unix.time(anyMissing(z)) }
   [1] 0 0 0 0 0
   > { z <- replace(1:1e7, 2, NA); unix.time(any(is.na(z))) }
   [1] 0.62 0.13 0.75 0.00 0.00

   > # NA at end of list
   > { z <- replace(1:1e7, 1e7, NA); unix.time(anyMissing(z)) }
   [1] 0.07 0.00 0.07 0.00 0.00
   > { z <- replace(1:1e7, 1e7, NA); unix.time(any(is.na(z))) }
   [1] 0.64 0.11 0.75 0.00 0.00
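
For anyone who wants to try the same comparison in R rather than Splus,
here is a minimal sketch. It assumes R 3.1.0 or later, where base R
provides anyNA(); that function is not the Splus anyMissing() timed above,
only the closest R counterpart. The wrapper names via_isna and via_anyna
are hypothetical, and the vector is shortened to 1 million elements to keep
the run quick. Timings depend on the machine, so none are shown.

```r
# Sketch only: compares two ways of asking "does x contain any NA?".
# Assumes R >= 3.1.0 for anyNA(); via_isna/via_anyna are illustrative names.
n <- 1e6

z_start <- replace(1:n, 2, NA)  # NA near the start
z_end   <- replace(1:n, n, NA)  # NA at the very end
z_none  <- 1:n                  # no NA at all

# Builds the full length-n logical vector from is.na(), then reduces it.
via_isna <- function(x) any(is.na(x))

# Asks the question directly, without an explicit intermediate vector.
via_anyna <- function(x) anyNA(x)

# Both approaches should agree on every input.
res <- c(start = via_anyna(z_start),
         end   = via_anyna(z_end),
         none  = via_anyna(z_none))
print(res)

# Time the worst case for an early-exit scan (NA at the end).
print(system.time(via_isna(z_end)))
print(system.time(via_anyna(z_end)))
```

If the observation above carries over to R, the gap between the two
timings should be about the same whether the NA sits at position 2 or at
the end, since the cost would be dominated by the intermediate logical
vector rather than by how far the scan has to go.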

The Splus anyMissing is an S3 generic (i.e., it calls UseMethod()).
The Splus is.na is an S4 generic, and its default method may invoke
an S3 generic.
----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."