[Bioc-devel] avoiding clashes of different S4 methods with the same generic

On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun <alun at wehi.edu.au> wrote:
Dear List,

When an S4 method for the same class is defined in two separate packages
(i.e., under the same generic), and both packages are loaded into an R
session, it seems that the method from the package loaded later clobbers the
method from the package loaded first. Is it possible to specifically call
the method in the first package when both packages are loaded? If not, how
should we protect against this?

To give some context: the csaw package currently defines a normalize()
method for SummarizedExperiment objects, using the generic from
BiocGenerics. However, if some other hypothetical package (I'll call it
"swings", for argument's sake) were to define a normalize() method with a SE
signature, and if the swings package were to be loaded after csaw, then it
seems that all calls to normalize() would use the method defined by swings,
rather than that defined by csaw.
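
The replacement behaviour described above can be reproduced in a single
session. This is only a sketch: ToySE and normalizeToy are hypothetical
stand-ins for SummarizedExperiment and BiocGenerics::normalize, and the two
setMethod() calls stand in for the two packages being loaded in turn.

```r
library(methods)

# Hypothetical stand-ins for the real class and generic.
setClass("ToySE", representation(counts = "numeric"))
setGeneric("normalizeToy",
           function(object, ...) standardGeneric("normalizeToy"))

se <- new("ToySE", counts = c(1, 2, 5))

# "csaw-like" method, registered first.
setMethod("normalizeToy", "ToySE", function(object, ...) "csaw-like")
first_call <- normalizeToy(se)

# "swings-like" method with the same signature, registered later.
setMethod("normalizeToy", "ToySE", function(object, ...) "swings-like")
second_call <- normalizeToy(se)
```

Within one session the second registration replaces the first for that
signature, so second_call no longer reaches the "csaw-like" code.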

Now, for usual functions, disambiguation would be easy with "::", but I
don't know whether this can be done in the S4 system, given that the details
of dispatch are generally hidden away. The only solution I can see is for
csaw (and/or swings) to define a SE subclass; define the normalize() method
using the subclass as the signature, such that S4 dispatch will now go to
the correct method; and hope that no other package redefines normalize() for
the subclass.

Is this what I should be doing routinely, i.e., define subclasses and
methods for those subclasses in all my packages? Or am I missing something
obvious? I would have expected such clashes to be more of a problem, given
how many new packages are being added to BioC at every release.

I would recommend against defining subclasses of basic data structures
that differ only in their behavior. The purpose of
SummarizedExperiment is to store data. One might use inheritance to
modify how the data are stored, or to store new types of data,
although the latter may be best addressed through composition.

To extend behavior, define methods. The generic represents the verb
and thus the semantics of the operation. In general, method conflicts
indicate that the design is broken. In this case, the normalize()
generic has a very general name. There is no one way to "normalize" a
SummarizedExperiment. It would be difficult for the reader to
understand such ambiguous code. To indicate a specific normalization
algorithm, we either need a more specific generic or we need to
parameterize it further.

One way to make more specific generics would be to give them the same
name, "normalize", but define them in different namespaces and require
:: qualification. That would mean abandoning the BiocGenerics generic
and it would only work if each package provides only one way to
normalize. Or, one could give them different names, but it would be
difficult to select a natural name, and it's not clear whether the
abstract notion of normalization should be always coupled with the
method.

A more flexible/modular approach would be to augment the signature of
BiocGenerics::normalize to indicate a normalization method and rely on
dual-dispatch:

normalize(se, WithSwings())
normalize(se, WithCSaw())
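
A minimal sketch of how such dispatch on two arguments might be set up,
assuming toy names throughout: ToySE, NormalizationMethod, normalizeBy and
the arithmetic inside each method are hypothetical, and only the
WithCSaw()/WithSwings() call style comes from the lines above.

```r
library(methods)

# Hypothetical stand-in for SummarizedExperiment.
setClass("ToySE", representation(counts = "numeric"))

# Marker classes naming the algorithm. One is slot-free (lightweight),
# the other carries a parameter.
setClass("NormalizationMethod", representation("VIRTUAL"))
setClass("WithCSaw", contains = "NormalizationMethod")
setClass("WithSwings", contains = "NormalizationMethod",
         slots = c(scale = "numeric"), prototype = list(scale = 1))
WithCSaw <- function() new("WithCSaw")
WithSwings <- function(scale = 1) new("WithSwings", scale = scale)

# The generic dispatches on both the data and the method object.
setGeneric("normalizeBy",
           function(object, method, ...) standardGeneric("normalizeBy"))

# Each package would own the method for its own marker class.
setMethod("normalizeBy", c("ToySE", "WithCSaw"),
          function(object, method, ...) {
              object@counts / sum(object@counts)
          })
setMethod("normalizeBy", c("ToySE", "WithSwings"),
          function(object, method, ...) {
              method@scale * object@counts / max(object@counts)
          })

se <- new("ToySE", counts = c(1, 2, 5))
csaw_result <- normalizeBy(se, WithCSaw())
swings_result <- normalizeBy(se, WithSwings(scale = 10))
```

Since the two methods have different signatures, neither replaces the
other, whatever order they are registered in.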

Roughly, one example of this approach is
VariantAnnotation::locateVariants() and its variant type argument.

I like the dual dispatch method quite a bit (but wonder why we get
several swings but only one csaw? Maybe a csaw implies two participants 
[though I think I once in a while csaw-ed alone], so a singular csaw and 
a pair of swings balance out?), partly because it's very easy to extend 
(write another method) and the second argument can be either lightweight 
or parameterized.

From a user perspective normalizeCsaw / normalizeSwings makes the
available options only a tab key away; maybe that's why Michael 
suggested With*?

Martin

The affy package (or something around it) auto-qualifies the generic
via a method argument; something like S3 around S4. For example
normalize(se, "swings") would call normalize.swings(se), where
normalize.swings itself could be generic. Another way to effect
cascading dispatch is through composition, where the method object
either is a function or can provide one to implement the normalization
(emulating message passing OOP), which would allow normalize() to
be implemented simply as:

normalize <- function(x, method, ...) normalizer(method)(x, ...)

One issue is that the syntax is a bit unconventional and users might
end up preferring the affy approach, with a normalize_csaw() and
normalize_swings(). But I like the modular, dynamic approach outlined
above.
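
For concreteness, the composition idea could be sketched as below. Only
the one-line normalize() body and the normalizer() name come from the
message above; the Normalizer constructor, the normalizeVia name, the S3
classes and the arithmetic are assumptions made for illustration.

```r
# Hypothetical method object: a list wrapping the function that does the work.
Normalizer <- function(fun, name) {
    structure(list(fun = fun, name = name), class = "Normalizer")
}

# normalizer() extracts the implementing function from a method object.
normalizer <- function(method) UseMethod("normalizer")
normalizer.Normalizer <- function(method) method$fun
# A bare function "is" its own normalizer.
normalizer.default <- function(method) {
    stopifnot(is.function(method))
    method
}

# Same shape as the one-liner above, under a non-clashing name.
normalizeVia <- function(x, method, ...) normalizer(method)(x, ...)

# Hypothetical method objects that two packages might export.
WithCSaw <- function() Normalizer(function(x, ...) x / sum(x), "csaw")
WithSwings <- function(scale = 1) {
    Normalizer(function(x, ...) scale * x / max(x), "swings")
}

x <- c(1, 2, 5)
via_csaw <- normalizeVia(x, WithCSaw())
via_swings <- normalizeVia(x, WithSwings(scale = 10))
via_function <- normalizeVia(x, function(x, ...) x - mean(x))
```

In this sketch normalizeVia() itself never needs new methods; a package
adds a normalization by exporting another method object.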

Thoughts?

Michael

Cheers,

Aaron

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
