[Bioc-devel] avoiding clashes of different S4 methods with the same generic
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun <alun at wehi.edu.au> wrote:
Dear List,

When an S4 method for the same class is defined in two separate packages (i.e., under the same generic), and both packages are loaded into an R session, it seems that the method from the package loaded later clobbers the method from the package loaded first. Is it possible to specifically call the method in the first package when both packages are loaded? If not, how should we protect against this?

To give some context: the csaw package currently defines a normalize() method for SummarizedExperiment objects, using the generic from BiocGenerics. However, if some other hypothetical package (I'll call it "swings", for argument's sake) were to define a normalize() method with a SE signature, and if the swings package were to be loaded after csaw, then it seems that all calls to normalize() would use the method defined by swings, rather than that defined by csaw.

Now, for usual functions, disambiguation would be easy with "::", but I don't know whether this can be done in the S4 system, given that the details of dispatch are generally hidden away. The only solution I can see is for csaw (and/or swings) to define a SE subclass; define the normalize() method using the subclass as the signature, such that S4 dispatch will now go to the correct method; and hope that no other package redefines normalize() for the subclass.

Is this what I should be doing routinely, i.e., define subclasses and methods for those subclasses in all my packages? Or am I missing something obvious? I would have expected such clashes to be more of a problem, given how many new packages are being added to BioC at every release.
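The clash described above can be imitated within a single session. This is only a rough analogue, assuming a toy class `Experiment` and a locally defined `normalize` generic (both hypothetical stand-ins for SummarizedExperiment and the BiocGenerics generic); it does not load csaw or any real second package.

```r
library(methods)

# Toy stand-ins (assumptions): 'Experiment' plays the role of
# SummarizedExperiment, 'normalize' the role of the shared generic.
setClass("Experiment", representation(counts = "numeric"))
setGeneric("normalize", function(object, ...) standardGeneric("normalize"))

# What the first package ("csaw") would register.
setMethod("normalize", "Experiment", function(object, ...) "csaw")

# Keep a handle on the method definition before anything replaces it.
csaw_method <- getMethod("normalize", "Experiment")

# What a later package ("swings") would register for the same signature.
setMethod("normalize", "Experiment", function(object, ...) "swings")

x <- new("Experiment", counts = c(1, 2, 3))

normalize(x)     # dispatch now reaches the later definition
csaw_method(x)   # the saved definition can still be called as a function
```

Only one method can be registered per generic and signature, so the second `setMethod()` replaces the first in the dispatch table. Holding on to the method object is shown purely to illustrate that, not as a recommended workaround.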
I would recommend against defining subclasses of basic data structures that differ only in their behavior. The purpose of SummarizedExperiment is to store data. One might use inheritance to modify how the data are stored, or to store new types of data, although the latter may be best addressed through composition. To extend behavior, define methods. The generic represents the verb and thus the semantics of the operation.

In general, method conflicts indicate that the design is broken. In this case, the normalize() generic has a very general name. There is no one way to "normalize" a SummarizedExperiment. It would be difficult for the reader to understand such ambiguous code. To indicate a specific normalization algorithm, we either need a more specific generic or we need to parameterize it further.

One way to make more specific generics would be to give them the same name, "normalize", but define them in different namespaces and require :: qualification. That would mean abandoning the BiocGenerics generic and it would only work if each package provides only one way to normalize. Or, one could give them different names, but it would be difficult to select a natural name, and it's not clear whether the abstract notion of normalization should be always coupled with the method.

A more flexible/modular approach would be to augment the signature of BiocGenerics::normalize to indicate a normalization method and rely on dual-dispatch:

    normalize(se, WithSwings())
    normalize(se, WithCSaw())

Roughly, one example of this approach is VariantAnnotation::locateVariants() and its variant type argument.
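A minimal sketch of the dual-dispatch idea, under stated assumptions: `Experiment` is a toy stand-in for SummarizedExperiment, the two-argument `normalize` generic is defined locally rather than taken from BiocGenerics, and the slots `scale` and `shift` and the arithmetic are invented for illustration. Only the names WithSwings and WithCSaw come from the message above.

```r
library(methods)

# Toy data class (assumption), standing in for SummarizedExperiment.
Experiment <- setClass("Experiment", representation(counts = "numeric"))

# Method-selector classes. Each carries one slot so that it is a concrete,
# instantiable class and can hold algorithm parameters.
WithCSaw   <- setClass("WithCSaw",
                       representation(scale = "numeric"),
                       prototype(scale = 1))
WithSwings <- setClass("WithSwings",
                       representation(shift = "numeric"),
                       prototype(shift = 0))

# A generic that dispatches on both the data and the algorithm selector.
setGeneric("normalize",
           function(object, method, ...) standardGeneric("normalize"))

# Each package would own the method for its own selector class.
setMethod("normalize", signature("Experiment", "WithCSaw"),
          function(object, method, ...) {
            object@counts <- object@counts / sum(object@counts) * method@scale
            object
          })

setMethod("normalize", signature("Experiment", "WithSwings"),
          function(object, method, ...) {
            object@counts <- object@counts - mean(object@counts) + method@shift
            object
          })

se <- Experiment(counts = c(1, 2, 3))
by_csaw   <- normalize(se, WithCSaw())
by_swings <- normalize(se, WithSwings(shift = 10))
```

Because the two methods differ in their second signature element, neither replaces the other, and a third package could add its own selector class without touching the existing ones.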
I like the dual dispatch method quite a bit (but wonder why we get several swings but only one csaw? Maybe a csaw implies two participants [though I think I once in a while csaw-ed alone], so a singular csaw and a pair of swings balance out?), partly because it's very easy to extend (write another method) and the second argument can be either lightweight or parameterized. From a user perspective normalizeCsaw / normalizeSwings makes the available options only a tab key away; maybe that's why Michael suggested With*? Martin
The affy package (or something around it) auto-qualifies the generic via a method argument; something like S3 around S4. For example normalize(se, "swings") would call normalize.swings(se), where normalize.swings itself could be generic.

Another way to effect cascading dispatch is through composition, where the method object either is a function or can provide one to implement the normalization (emulating message passing OOP), which would allow normalize() to be implemented simply as:

    normalize <- function(x, method, ...) normalizer(method)(x, ...)

One issue is that the syntax is a bit unconventional and users might end up preferring the affy approach, with a normalize_csaw() and normalize_swings(). But I like the modular, dynamic approach outlined above.

Thoughts?

Michael
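Both ideas in the message above can be sketched on plain numeric vectors. Everything here is an assumption made for illustration: the helper `normalize_by_name`, the class `CenterMethod`, and the arithmetic are hypothetical, and the name-based lookup is a guess at the general idea rather than affy's actual code. Only `normalize`, `normalizer` and `normalize.swings` are names taken from the message.

```r
library(methods)

## Name-qualified dispatch: "swings" is resolved to normalize.swings().
normalize.swings <- function(x, ...) x - mean(x)
normalize.csaw   <- function(x, ...) x / sum(x)

normalize_by_name <- function(x, method, ...) {
  # Look up a function called normalize.<method>; errors if none exists.
  f <- get(paste0("normalize.", method), mode = "function")
  f(x, ...)
}

## Composition: the method object supplies the function that does the work.
setGeneric("normalizer", function(method) standardGeneric("normalizer"))

# A parameterized method object (hypothetical).
setClass("CenterMethod",
         representation(shift = "numeric"),
         prototype(shift = 0))

setMethod("normalizer", "CenterMethod", function(method) {
  function(x, ...) x - mean(x) + method@shift
})

# A plain function acts as its own normalizer.
setMethod("normalizer", "function", function(method) method)

# The one-liner from the message above.
normalize <- function(x, method, ...) normalizer(method)(x, ...)

v <- c(1, 2, 3)
normalize_by_name(v, "swings")
normalize(v, new("CenterMethod", shift = 10))
normalize(v, function(x, ...) x / sum(x))
```

In the composition variant, normalize() itself never needs to change: supporting a new algorithm means adding a normalizer() method for a new class, or simply passing a function.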
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel