[Bioc-devel] parallel package generics

Hi Malcolm,
On 10/25/2012 11:53 AM, Cook, Malcolm wrote:
Let's distinguish between 2 kinds of generics: (1) functions
in base R (i.e. packages base, stats, graphics, parallel, etc...)
that we want to turn into S4 generics, and (2) other generic
functions introduced in Bioconductor.

The main motivation for the BiocGenerics package was to have a
central place for generics of the 1st kind. The need for a central
place has to do with the need of a clear ownership of the *generic*,
and the notion of ownership is blurred by the fact that base functions
can be implicitly turned into generics by a simple attempt to attach
a method to them (with setMethod).

Now with more details. Any base function foo that is not already
a generic can implicitly be turned into a generic by any package
that contains a setMethod("foo", ...) statement. Even if those
packages are good citizens (by not trying to explicitly turn foo
into a generic with setGeneric(), which would have caused even
more problems), this approach is not satisfying. We still have
the need to explicitly turn foo into a generic in 1 place (with
a single setGeneric statement in the entire Bioconductor world),
for at least 2 reasons:

   (1) The implicit generic always dispatches on all its arguments.
       For many functions, this is not desirable (e.g. what's the
       point of having the Reduce generic dispatch on its 'right'
       argument?)

   (2) When a package implicitly turns foo into a generic, it needs
       to export that generic. It also needs to add an alias for the
       *generic* (i.e. \alias{foo}) somewhere in its man pages, in
       addition to the alias for the foo *method*. Otherwise it gets
       an 'R CMD check' warning (kind of fair, the foo generic being
       a new thing that needs to be documented somewhere). So with a
       model where foo is never turned explicitly into a generic by
       any package, we are in a situation where each package that
       contains a setMethod("foo", ...) statement needs to assume that
       this statement will trigger the creation of the implicit generic,
       and therefore needs to assume ownership of that generic (by
       exporting and documenting it). But what will really happen is
       that only one package will effectively get the ownership, and
       it will be the first package to be loaded! Not good.
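To make point (1) concrete, here is a minimal sketch of the implicit route. It uses base::Position instead of Reduce only because Position has a short argument list that also includes 'right'. The Stack class is a made-up stand-in, not something from Bioconductor, and the code is meant for a fresh session at top level:

```r
library(methods)

# Hypothetical toy class, for illustration only.
setClass("Stack", representation(items = "list"))

# No setGeneric() call anywhere: setMethod() alone turns base::Position
# into a generic (here in the global environment).
setMethod("Position", signature(x = "Stack"),
    function(f, x, right = FALSE, nomatch = NA_integer_)
        base::Position(f, x@items, right = right, nomatch = nomatch))

# The implicit generic dispatches on every formal argument, 'right' included.
implicit_signature <- getGeneric("Position")@signature
print(implicit_signature)

# The new method is used for Stack objects.
s <- new("Stack", items = list(1, 5, 10))
first_big <- Position(function(v) v > 3, s)
print(first_big)
```

Nothing in this code says who owns the Position generic. If two packages each contained such a setMethod() statement, each would have to behave as if it were the owner.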

For those 2 reasons we decided to use a central place (BiocGenerics)
to explicitly turn foo (and other base functions) into a generic.
Now the ownership of the foo generic is known in advance and any
developer that needs to define a method for that generic knows where
to look for that generic (i.e. s/he knows where to import the generic
from and where to find the man page for the generic). In addition,
now we can specify the arguments that are involved in the dispatch.
Just to clarify, BiocGenerics::Reduce does not provide an
implementation at all:

   > BiocGenerics::Reduce
   standardGeneric for "Reduce" defined from package "BiocGenerics"

   function (f, x, init, right = FALSE, accumulate = FALSE)
   standardGeneric("Reduce")
   <environment: 0x20ee0a0>
   Methods may be defined for arguments: x
   Use  showMethods("Reduce")  for currently available ones.

It's just a generic function (i.e. the only thing it does is
dispatch). With only the BiocGenerics package loaded in your
session, there is only 1 method defined for that generic:

   > showMethods("Reduce")
   Function: Reduce (package BiocGenerics)
   f="ANY"

We call this method the "default method", because it's the one that
will be used if no other more specific method is available for the
object passed to it. And this default method is just base::Reduce:

   > selectMethod("Reduce", "ANY")
   Method Definition (Class "derivedDefaultMethod"):

   function (f, x, init, right = FALSE, accumulate = FALSE)
   {
   <SNIP>
   }
   <environment: namespace:base>

   Signatures:
           f
   target  "ANY"
   defined "ANY"

See the environment in which this method is defined? It's defined in
base, which is the proof that this method is really base::Reduce.

Note that, to add to the confusion, there is a bug in how showMethods()
displays the name of the argument used for dispatch: it shows 'f', but
the argument really used for dispatch is 'x'.

The only code BiocGenerics contains with respect to Reduce is:

   setGeneric("Reduce", signature="x")

Pretty light isn't it? setGeneric() automatically sets base::Reduce
as the default method.
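For anyone who wants to check this without loading BiocGenerics, the same statement can be run in a fresh session. This is only a sketch of the behaviour described above, evaluated in the global environment, not in a package namespace:

```r
library(methods)

# Same statement as in BiocGenerics, but evaluated in the global environment.
setGeneric("Reduce", signature = "x")

# Dispatch is restricted to 'x'.
dispatch_args <- getGeneric("Reduce")@signature
print(dispatch_args)

# The default method is derived from base::Reduce, so its environment is
# the base namespace and ordinary calls behave as before.
default_method <- selectMethod("Reduce", "ANY")
print(environment(default_method))

total <- Reduce(`+`, 1:4)
print(total)
```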

Yes, because base::Reduce calls base::as.list internally and
base::as.list doesn't work on a GRangesList object. Note that if
you look at the implementation of base::Reduce, you won't see a
call to base::as.list, only a call to as.list. But base::as.list
is really what is being called, because that's the only as.list
function that exists from within the environment where base::Reduce
is defined, namely the namespace:base environment.
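The scoping rule is easy to check directly. In this sketch a deliberately broken as.list is defined in the global environment. base::Reduce never sees it, because the lookup starts in namespace:base. A data frame is used only because it is an object, which makes Reduce coerce it with as.list (assuming Reduce behaves as described above):

```r
# A global as.list that would fail loudly if it were ever picked up.
as.list <- function(x, ...) stop("global as.list was called")

# base::Reduce is defined in namespace:base ...
reduce_env <- environmentName(environment(base::Reduce))
print(reduce_env)

# ... so its call to as.list resolves to base::as.list, not the one above.
df <- data.frame(a = 1:2, b = 3:4)
column_sum <- base::Reduce(`+`, df)
print(column_sum)

# Remove the mask again.
rm(as.list)
```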

I agree there is definitely room for improving some of the functions
defined in base that conceptually need only basic things like length(),
[, [[ to work on an object x. As long as those things are themselves
generics defined in base. Then I can implement methods for those basic
things, and have suddenly a lot of other things in base that work
out-of-the-box.

For Reduce and as.list though, it seems that *we* could do a better job.
This is because as.list is itself an S3 generic. My understanding
is that if we had an as.list.List method (in addition to the "as.list"
*S4* method for List objects), then base::as.list would work on any
List object (e.g. GRangesList object), and so would any base function
that uses as.list internally (like lapply or Reduce). I definitely
want to take the time to explore that approach, because my feeling is
that we could simplify things significantly by not turning lapply,
Reduce, and a bunch of other things, into S4 generics, and by
dropping a lot of methods (currently defined in IRanges) that we
shouldn't need to have.
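A rough sketch of that idea, using a made-up S4 class (Bag) in place of List. Everything here is hypothetical except the base functions. The code is meant to be run at top level; in a package the S3 method would also need to be registered with S3method() in the NAMESPACE file.

```r
library(methods)

# Hypothetical list-like S4 class standing in for List.
setClass("Bag", representation(items = "list"))
setMethod("length", "Bag", function(x) length(x@items))

# S3 method: picked up by base::as.list through ordinary S3 dispatch.
as.list.Bag <- function(x, ...) x@items

bag <- new("Bag", items = list(1, 2, 3))

# No Bag-specific methods for Reduce or lapply are defined here;
# both reach the object through as.list.
bag_sum <- Reduce(`+`, bag)
bag_doubled <- lapply(bag, function(v) v * 2)
print(bag_sum)
print(bag_doubled)
```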

Conceptually, we should not need to do that, I agree. At least for
things in base that are defined in terms of sequence access primitives,
as long as those sequence access primitives are S3 generics. Otherwise
we need to make those things generic. Another reason for making
something a generic (even if it works out-of-the-box on any object)
is performance. People might want to implement a specific method for
their objects that improves on the out-of-the-box performance.

I'd say there is no need to fix the funprog functions (if that's what
you mean by Functional). I hope we can make Reduce and family just work
on any List object by adding as.list.List (S3 method) to IRanges.
However, you are right that it feels like we shouldn't even have to put
anything in IRanges for having as.list work on any S4 object for which
length() and [[ are defined. Seems like maybe this could be achieved by
modifying as.list.default (defined in base).

It would be something like:

   as.list.default <- function (x, ...)
   {
     if (typeof(x) == "list")
         return(x)
     if (!is.object(x))
         return(.Internal(as.vector(x, "list")))
     lapply(seq_len(length(x)), function(i) x[[i]])
   }

Instead of:

   as.list.default <- function (x, ...)
   {
     if (typeof(x) == "list") x else .Internal(as.vector(x, "list"))
   }

As for parallel, yes, some of the functions in the "snow family" need
to be made generics. If not S4, at least S3 generics.

H.

  
    
