
cache most-recent dispatch

4 messages · Hervé Pagès, John Chambers, Valerie Obenchain

#
Hi,

S4 method dispatch can be very slow. Would it be reasonable to cache the
most recent dispatch, anticipating that the next invocation will be on
the same type? This would be very helpful in loops.

   fun0 <- function(x)
       sapply(x, paste, collapse="+")
   fun1 <- function(x) {
       paste <- selectMethod(paste, class(x[[1]]))
       sapply(x, paste, collapse="+")
   }
   lst <- split(rep(LETTERS, 100), rep(1:1300, 2))

   library(microbenchmark)
   microbenchmark(fun0(lst), times=10)
   ## Unit: milliseconds
   ##       expr      min       lq   median      uq      max neval
   ##  fun0(lst) 4.153287 4.180659 4.513539 5.19261 5.280481    10

   setGeneric("paste")
   microbenchmark(fun0(lst), fun1(lst), times=10)
   ## Unit: milliseconds
   ##       expr       min       lq    median        uq       max neval
   ##  fun0(lst) 21.093180 21.27616 21.453174 21.833686 24.758791    10
   ##  fun1(lst)  4.517808  4.53067  4.582641  4.682235  5.121856    10
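The fun1() pattern above hoists the method lookup out of the loop by hand. As a rough sketch of the proposed "cache the most recent dispatch" behaviour in user code, the hypothetical helper makeCachedDispatcher() below (not part of the methods package) remembers the class of the last argument and the method selectMethod() returned for it, and only looks again when the class changes. It assumes dispatch on a single argument and that no methods are added or removed while the wrapper is in use; the generic describe() and its methods are invented for the example.

```r
library(methods)

## Hypothetical generic and methods, defined only for this example.
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "numeric", function(x) "a number")
setMethod("describe", "character", function(x) "a string")

## Hypothetical helper: reuse the most recently selected method while
## the class of 'x' stays the same. Assumes single-argument dispatch
## and a method table that does not change after the wrapper is made.
makeCachedDispatcher <- function(generic) {
    lastClass <- NULL
    lastMethod <- NULL
    function(x, ...) {
        cls <- class(x)[1L]
        if (!identical(cls, lastClass)) {
            ## Cache miss: do the full lookup once and remember it.
            lastMethod <<- selectMethod(generic, cls)
            lastClass <<- cls
        }
        lastMethod(x, ...)
    }
}

describeCached <- makeCachedDispatcher("describe")
vapply(list(1, 2, "a", 3), describeCached, character(1))
```

For a homogeneous list like lst, only the first element would trigger selectMethod(); whether this actually beats the dispatch table the methods package already keeps would have to be measured.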

Dispatch seems to be especially slow when packages are involved, e.g.,
with the Bioconductor IRanges package
(http://bioconductor.org/packages/release/bioc/html/IRanges.html):

   removeGeneric("paste")
   library(IRanges)
   showMethods(paste)
   ## Function: paste (package BiocGenerics)
   ## ...="ANY"
   ## ...="Rle"
   selectMethod(paste, "ANY")
   ## Method Definition (Class "derivedDefaultMethod"):
   ##
   ## function (..., sep = " ", collapse = NULL)
   ## .Internal(paste(list(...), sep, collapse))
   ## <environment: namespace:base>
   ##
   ## Signatures:
   ##         ...
   ## target  "ANY"
   ## defined "ANY"

   microbenchmark(fun0(lst), fun1(lst), times=10)
   ## Unit: milliseconds
   ##       expr        min         lq     median         uq        max neval
   ##  fun0(lst) 233.539585 234.592491 236.311209 237.268506 243.181123    10
   ##  fun1(lst)   4.564914   4.592996   4.642898   4.729009   5.492706    10

   sessionInfo()
   ## R version 3.0.0 Patched (2013-04-04 r62492)
   ## Platform: x86_64-unknown-linux-gnu (64-bit)
   ##
   ## locale:
   ##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   ##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   ##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   ##  [7] LC_PAPER=C                 LC_NAME=C
   ##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
   ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
   ##
   ## attached base packages:
   ## [1] parallel  stats     graphics  grDevices utils     datasets  methods
   ## [8] base
   ##
   ## other attached packages:
   ## [1] IRanges_1.19.15      BiocGenerics_0.7.2   microbenchmark_1.3-0
   ##
   ## loaded via a namespace (and not attached):
   ## [1] stats4_3.0.0


Thanks,
Valerie
#
Hi Val,

[off list... I don't want to compromise your chances to start a
constructive discussion ;-)]

Thanks for reporting this. I just wanted to mention that I think the
situation is worse when you use the paste() generic defined in
BiocGenerics than when you make paste() a generic with
setGeneric("paste") because of the signature of the generic.
With the latter, dispatch is on the 'sep' and 'collapse' args only
(which is surprising, but that's another story), while
with the former it's on ...:

   > setGeneric("paste")
   [1] "paste"

   > paste
   standardGeneric for "paste" defined from package "base"

   function (..., sep = " ", collapse = NULL)
   standardGeneric("paste")
   <environment: 0x157a028>
   Methods may be defined for arguments: sep, collapse
   Use  showMethods("paste")  for currently available ones.

   ## Note that showMethods() is broken (it contradicts the above
   ## that indicates dispatch is on 'sep' and 'collapse').
   > showMethods("paste")
   Function: paste (package base)
   ...="ANY"

   > microbenchmark(fun0(lst), fun1(lst), times=10)
   Unit: milliseconds
         expr       min        lq    median        uq       max neval
    fun0(lst) 27.374228 27.508580 28.144858 28.895889 33.528221    10
    fun1(lst)  5.474173  5.739289  5.803471  6.050482  6.825982    10

   > removeGeneric("paste")
   [1] TRUE

   > setGeneric("paste", signature="...")  # this is how it's defined in BiocGenerics
   Creating a new generic function for ‘paste’ in the global environment
   [1] "paste"

   > microbenchmark(fun0(lst), fun1(lst), times=10)
   Unit: milliseconds
         expr        min         lq     median         uq        max neval
    fun0(lst) 149.828201 153.192866 155.845508 157.916067 176.313906    10
    fun1(lst)   4.924387   5.088094   5.114532   5.200432   5.332386    10

Dispatch on ... seems to have a ridiculously high cost!
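Rather than relying on the printed summary (which, as the transcript shows, disagrees with showMethods()), the arguments a generic dispatches on can be read from its signature slot. The sketch below assumes a plain R session without BiocGenerics attached and compares the default setGeneric("paste") with the signature = "..." form; the variable names are illustrative only.

```r
library(methods)

## Default setGeneric(): every formal argument except '...' is
## eligible for dispatch.
setGeneric("paste")
sigDefault <- paste@signature
removeGeneric("paste")

## BiocGenerics-style generic: dispatch on '...' only.
setGeneric("paste", signature = "...")
sigDots <- paste@signature
removeGeneric("paste")

sigDefault  # exact names depend on the formals of base::paste in this R version
sigDots
```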

H.
On 07/01/2013 10:04 PM, Valerie Obenchain wrote:

  
    
#
It's hard to see how repeated dispatch on the same classes can be that 
slow, _if_ the function being called each time is itself doing some 
substantial work.

The first call (in a session) with a particular signature searches for 
inherited methods and stores the method found in a table.  The following 
calls with that signature should do a single lookup in a hash table. 
Caching the last signature is unlikely to be dramatically faster, but we 
can experiment and see.

What is substantially different is calling a generic function vs calling 
a primitive or internal.  If the local paste you constructed is the 
default, base::paste, that is a .Internal.
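The distinction between a primitive, an R closure wrapping .Internal(), and a generic can be checked directly. A quick sketch, assuming a current R in which base::paste is still defined as a closure whose body calls .Internal(paste(...)):

```r
## base::paste is an ordinary closure, not a primitive...
is.primitive(base::paste)
## ...whose body hands the work to .Internal().
".Internal" %in% all.names(body(base::paste))
## sum, by contrast, is a primitive.
is.primitive(base::sum)
```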

Not going through the R generic function several thousand times would 
make a difference.
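One rough way to see the per-call cost is to time a generic whose method does almost nothing against calling the selected method directly and against a vectorised expression with no per-element calls at all. twice() is a hypothetical generic for illustration; timings vary by machine and R version, so no numbers are claimed here.

```r
library(methods)

## Hypothetical generic whose method does almost nothing, so per-call
## overhead dominates the timing.
setGeneric("twice", function(x) standardGeneric("twice"))
setMethod("twice", "numeric", function(x) x * 2)

xs <- as.list(seq_len(10000) / 7)

## Each element goes through the R-level generic and S4 dispatch.
viaGeneric <- function(xs) vapply(xs, twice, numeric(1))

## Select the method once, then call it directly (the fun1() pattern).
viaMethod <- function(xs) {
    m <- selectMethod("twice", class(xs[[1L]]))
    vapply(xs, m, numeric(1))
}

## A single vectorised call avoids per-element function calls entirely.
vectorised <- function(xs) unlist(xs) * 2

stopifnot(identical(viaGeneric(xs), viaMethod(xs)),
          identical(viaGeneric(xs), vectorised(xs)))
system.time(for (i in 1:20) viaGeneric(xs))
system.time(for (i in 1:20) viaMethod(xs))
system.time(for (i in 1:20) vectorised(xs))
```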

It's a fundamental point about R that function calls do enough work that 
they add significant time to a "trivial" computation, such as a 
primitive call.  There are various efforts going on these days to 
provide more efficient alternatives.  They're all helpful; my personal 
favorite when the game is worth it is to consider doing key computations 
in a seriously faster language, like C++ via Rcpp.

John
On 7/1/13 10:04 PM, Valerie Obenchain wrote:
#
Thanks for the background and suggestions.

Valerie
On 07/02/2013 08:41 AM, John Chambers wrote: