Dear List, When a S4 method for the same class is defined in two separate packages (i.e., under the same generic), and both packages are loaded into a R session, it seems that the method from the package loaded later clobbers the method from the package loaded first. Is it possible to specifically call the method in the first package when both packages are loaded? If not, how should we protect against this? To give some context; the csaw package currently defines a normalize() method for SummarizedExperiment objects, using the generic from BiocGenerics. However, if some other hypothetical package (I'll call it "swings", for argument's sake) were to define a normalize() method with a SE signature, and if the swings package were to be loaded after csaw, then it seems that all calls to normalize() would use the method defined by swings, rather than that defined by csaw. Now, for usual functions, disambiguation would be easy with "::", but I don't know whether this can be done in the S4 system, given that the details of dispatch are generally hidden away. The only solution I can see is for csaw (and/or swings) to define a SE subclass; define the normalize() method using the subclass as the signature, such that S4 dispatch will now go to the correct method; and hope that no other package redefines normalize() for the subclass. Is this what I should be doing routinely, i.e., define subclasses and methods for those subclasses in all my packages? Or am I missing something obvious? I would have expected such clashes to be more of a problem, given how many new packages are being added to BioC at every release. Cheers, Aaron
[Bioc-devel] avoiding clashes of different S4 methods with the same generic
14 messages · Aaron Lun, Michael Lawrence, Martin Morgan +2 more
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun <alun at wehi.edu.au> wrote:
Dear List, When a S4 method for the same class is defined in two separate packages (i.e., under the same generic), and both packages are loaded into a R session, it seems that the method from the package loaded later clobbers the method from the package loaded first. Is it possible to specifically call the method in the first package when both packages are loaded? If not, how should we protect against this? To give some context; the csaw package currently defines a normalize() method for SummarizedExperiment objects, using the generic from BiocGenerics. However, if some other hypothetical package (I'll call it "swings", for argument's sake) were to define a normalize() method with a SE signature, and if the swings package were to be loaded after csaw, then it seems that all calls to normalize() would use the method defined by swings, rather than that defined by csaw. Now, for usual functions, disambiguation would be easy with "::", but I don't know whether this can be done in the S4 system, given that the details of dispatch are generally hidden away. The only solution I can see is for csaw (and/or swings) to define a SE subclass; define the normalize() method using the subclass as the signature, such that S4 dispatch will now go to the correct method; and hope that no other package redefines normalize() for the subclass. Is this what I should be doing routinely, i.e., define subclasses and methods for those subclasses in all my packages? Or am I missing something obvious? I would have expected such clashes to be more of a problem, given how many new packages are being added to BioC at every release.
I would recommend against defining subclasses of basic data structures that differ only in their behavior. The purpose of SummarizedExperiment is to store data. One might use inheritance to modify how the data are stored, or to store new types of data, although the latter may be best addressed through composition. To extend behavior, define methods. The generic represents the verb and thus the semantics of the operation. In general, method conflicts indicate that the design is broken. In this case, the normalize() generic has a very general name. There is no one way to "normalize" a SummarizedExperiment. It would be difficult for the reader to understand such ambiguous code. To indicate a specific normalization algorithm, we either need a more specific generic or we need to parameterize it further. One way to make more specific generics would be to give them the same name, "normalize", but define them in different namespaces and require :: qualification. That would mean abandoning the BiocGenerics generic and it would only work if each package provides only one way to normalize. Or, one could give them different names, but it would be difficult to select a natural name, and it's not clear whether the abstract notion of normalization should be always coupled with the method. A more flexible/modular approach would be to augment the signature of BiocGenerics::normalize to indicate a normalization method and rely on dual-dispatch: normalize(se, WithSwings()) normalize(se, WithCSaw()) Roughly, one example of this approach is VariantAnnotation::locateVariants() and its variant type argument. The affy package (or something around it) auto-qualifies the generic via a method argument; something like S3 around S4. For example normalize(se, "swings") would call normalize.swings(se), where normalize.swings itself could be generic. Another way to effect cascading dispatch is through composition, where the method object either is a function or can provide one to implement the normalization (emulating message passing OOP), which would allow normalize() to implemented simply as: normalize <- function(x, method, ...) normalizer(method)(x, ...) One issue is that the syntax is a bit unconventional and users might end up preferring the affy approach, with a normalize_csaw() and normalize_swings(). But I like the modular, dynamic approach outlined above. Thoughts? Michael
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun <alun at wehi.edu.au> wrote:
Dear List, When a S4 method for the same class is defined in two separate packages (i.e., under the same generic), and both packages are loaded into a R session, it seems that the method from the package loaded later clobbers the method from the package loaded first. Is it possible to specifically call the method in the first package when both packages are loaded? If not, how should we protect against this? To give some context; the csaw package currently defines a normalize() method for SummarizedExperiment objects, using the generic from BiocGenerics. However, if some other hypothetical package (I'll call it "swings", for argument's sake) were to define a normalize() method with a SE signature, and if the swings package were to be loaded after csaw, then it seems that all calls to normalize() would use the method defined by swings, rather than that defined by csaw. Now, for usual functions, disambiguation would be easy with "::", but I don't know whether this can be done in the S4 system, given that the details of dispatch are generally hidden away. The only solution I can see is for csaw (and/or swings) to define a SE subclass; define the normalize() method using the subclass as the signature, such that S4 dispatch will now go to the correct method; and hope that no other package redefines normalize() for the subclass. Is this what I should be doing routinely, i.e., define subclasses and methods for those subclasses in all my packages? Or am I missing something obvious? I would have expected such clashes to be more of a problem, given how many new packages are being added to BioC at every release.
I would recommend against defining subclasses of basic data structures that differ only in their behavior. The purpose of SummarizedExperiment is to store data. One might use inheritance to modify how the data are stored, or to store new types of data, although the latter may be best addressed through composition. To extend behavior, define methods. The generic represents the verb and thus the semantics of the operation. In general, method conflicts indicate that the design is broken. In this case, the normalize() generic has a very general name. There is no one way to "normalize" a SummarizedExperiment. It would be difficult for the reader to understand such ambiguous code. To indicate a specific normalization algorithm, we either need a more specific generic or we need to parameterize it further. One way to make more specific generics would be to give them the same name, "normalize", but define them in different namespaces and require :: qualification. That would mean abandoning the BiocGenerics generic and it would only work if each package provides only one way to normalize. Or, one could give them different names, but it would be difficult to select a natural name, and it's not clear whether the abstract notion of normalization should be always coupled with the method. A more flexible/modular approach would be to augment the signature of BiocGenerics::normalize to indicate a normalization method and rely on dual-dispatch: normalize(se, WithSwings()) normalize(se, WithCSaw()) Roughly, one example of this approach is VariantAnnotation::locateVariants() and its variant type argument.
I like the dual dispatch method quite a bit (but wonder why we get several swings but only one csaw? Maybe a csaw implies two participants [though I think I once in a while csaw-ed alone], so a singular csaw and a pair of swings balance out?), partly because it's very easy to extend (write another method) and the second argument can be either lightweight or parameterized. From a user perspective normalizeCsaw / normalizeSwings makes the available options only a tab key away; maybe that's why Michael suggested With*? Martin
The affy package (or something around it) auto-qualifies the generic via a method argument; something like S3 around S4. For example normalize(se, "swings") would call normalize.swings(se), where normalize.swings itself could be generic. Another way to effect cascading dispatch is through composition, where the method object either is a function or can provide one to implement the normalization (emulating message passing OOP), which would allow normalize() to implemented simply as: normalize <- function(x, method, ...) normalizer(method)(x, ...) One issue is that the syntax is a bit unconventional and users might end up preferring the affy approach, with a normalize_csaw() and normalize_swings(). But I like the modular, dynamic approach outlined above. Thoughts? Michael
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan
<martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun <alun at wehi.edu.au> wrote:
Dear List, When a S4 method for the same class is defined in two separate packages (i.e., under the same generic), and both packages are loaded into a R session, it seems that the method from the package loaded later clobbers the method from the package loaded first. Is it possible to specifically call the method in the first package when both packages are loaded? If not, how should we protect against this? To give some context; the csaw package currently defines a normalize() method for SummarizedExperiment objects, using the generic from BiocGenerics. However, if some other hypothetical package (I'll call it "swings", for argument's sake) were to define a normalize() method with a SE signature, and if the swings package were to be loaded after csaw, then it seems that all calls to normalize() would use the method defined by swings, rather than that defined by csaw. Now, for usual functions, disambiguation would be easy with "::", but I don't know whether this can be done in the S4 system, given that the details of dispatch are generally hidden away. The only solution I can see is for csaw (and/or swings) to define a SE subclass; define the normalize() method using the subclass as the signature, such that S4 dispatch will now go to the correct method; and hope that no other package redefines normalize() for the subclass. Is this what I should be doing routinely, i.e., define subclasses and methods for those subclasses in all my packages? Or am I missing something obvious? I would have expected such clashes to be more of a problem, given how many new packages are being added to BioC at every release.
I would recommend against defining subclasses of basic data structures that differ only in their behavior. The purpose of SummarizedExperiment is to store data. One might use inheritance to modify how the data are stored, or to store new types of data, although the latter may be best addressed through composition. To extend behavior, define methods. The generic represents the verb and thus the semantics of the operation. In general, method conflicts indicate that the design is broken. In this case, the normalize() generic has a very general name. There is no one way to "normalize" a SummarizedExperiment. It would be difficult for the reader to understand such ambiguous code. To indicate a specific normalization algorithm, we either need a more specific generic or we need to parameterize it further. One way to make more specific generics would be to give them the same name, "normalize", but define them in different namespaces and require :: qualification. That would mean abandoning the BiocGenerics generic and it would only work if each package provides only one way to normalize. Or, one could give them different names, but it would be difficult to select a natural name, and it's not clear whether the abstract notion of normalization should be always coupled with the method. A more flexible/modular approach would be to augment the signature of BiocGenerics::normalize to indicate a normalization method and rely on dual-dispatch: normalize(se, WithSwings()) normalize(se, WithCSaw()) Roughly, one example of this approach is VariantAnnotation::locateVariants() and its variant type argument.
I like the dual dispatch method quite a bit (but wonder why we get several swings but only one csaw? Maybe a csaw implies two participants [though I think I once in a while csaw-ed alone], so a singular csaw and a pair of swings balance out?), partly because it's very easy to extend (write another method) and the second argument can be either lightweight or parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
From a user perspective normalizeCsaw / normalizeSwings makes the available options only a tab key away; maybe that's why Michael suggested With*?
That's a good point.
Martin
The affy package (or something around it) auto-qualifies the generic via a method argument; something like S3 around S4. For example normalize(se, "swings") would call normalize.swings(se), where normalize.swings itself could be generic. Another way to effect cascading dispatch is through composition, where the method object either is a function or can provide one to implement the normalization (emulating message passing OOP), which would allow normalize() to implemented simply as: normalize <- function(x, method, ...) normalizer(method)(x, ...) One issue is that the syntax is a bit unconventional and users might end up preferring the affy approach, with a normalize_csaw() and normalize_swings(). But I like the modular, dynamic approach outlined above. Thoughts? Michael
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
> On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan
> <martin.morgan at roswellpark.org> wrote:
>> >
>> >On 04/26/2016 04:47 PM, Michael Lawrence wrote:
>>> >>
>>> >>On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au> wrote:
... >>>> >>>BiocGenerics. However, if some other hypothetical package (I'll call it >>>> >>>"swings", for argument's sake) were to define a normalize() method with a ... >> >I like the dual dispatch method quite a bit (but wonder why we get several
swings but only one csaw? Maybe a csaw implies two participants [though I think I once in a while csaw-ed alone], so a singular csaw and a pair of swings balance out?), partly because it's very easy to extend (write another method) and the second argument can be either lightweight or parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf where it is apparent that swings are much more dangerous than see-saws (e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were involved in about 19 ... percent of injuries ... See-saws accounted for about three percent"; "Homemade rope, tire, or tree swings were also involved in a number of hanging deaths" [no mention of death by see-saw]). I think for the sake of our users, especially our younger users, we do not want to consider swings, or even methods on swings, further. Martin This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
Yes, but "monkeyBars" doesn't have quite the same pithiness for a package name. Anyway, the dual dispatch mechanism sounds most interesting. I assume that means we'd have to define some sort of base "normalizeParam" class, and then derive "csawNormParam" and "swingsNormParam" subclasses, so that specific methods can be defined for each signature. - Aaron
Martin Morgan wrote:
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au>
wrote: ...
BiocGenerics. However, if some other hypothetical package
(I'll call it
"swings", for argument's sake) were to define a normalize()
method with a ...
I like the dual dispatch method quite a bit (but wonder why we get
several
swings but only one csaw? Maybe a csaw implies two participants
[though I
think I once in a while csaw-ed alone], so a singular csaw and a
pair of
swings balance out?), partly because it's very easy to extend
(write another
method) and the second argument can be either lightweight or
parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf where it is apparent that swings are much more dangerous than see-saws (e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were involved in about 19 ... percent of injuries ... See-saws accounted for about three percent"; "Homemade rope, tire, or tree swings were also involved in a number of hanging deaths" [no mention of death by see-saw]). I think for the sake of our users, especially our younger users, we do not want to consider swings, or even methods on swings, further. Martin This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
Hi, I would not discard defining a SummarizedExperiment subclass so quickly. SummarizedExperiment is very generic and can contain any kind of data. IIUC the csaw package uses SummarizedExperiment to store a particular kind of data (ChIP-seq data) and I believe specialization is a legitimate situation for defining a subclass, even if the subclass is a "straight" subclass i.e. a subclass that doesn't add new slots or doesn't touch the existing slots. OTOH introducing a "straight" subclass only to define one specialized method on it (the "normalize" method in this case) might not be worth it since there is a cost for such class, even if that cost is minimal: a cost for the user (one new container/constructor to deal with) and a cost for the developer (e.g. multiplication of coerce methods). Changing the signature of the normalize() generic in BiocGenerics and introducing dual dispath is of course doable but that means the maintainers of the packages that define methods on this generic are ok with the dual dispatch game and are willing to make the required modifications to their packages. It's an important change and I don't see an easy way to make it happen smoothly (i.e. thru a deprecated/defunct cycle). Here is the list of packages that currently define methods for BiocGenerics::normalize(): affyPLM Cardinal codelink CopyNumber450k csaw diffHic EBImage epigenomix MSnbase oligo qpcrNorm scran [Interestingly the scran package defines a default "normalize" method (i.e. a normalize,ANY method)]. Whether we make the second argument lightweight or parameterized (which is something that would need to be decided at the level of the generic) these packages will break as soon as we change the signature of the generic. So we'll need to wait after the release before this happens. Personally I find the lightweight second argument not particularly intuitive, elegant, or user-friendly. I'd rather type normalizeSwing(se, ...) or normalize(se, SwingParam(...)) than normalize(se, WithSwing(), ...). Last thing: In case of a parameterized second argument, do we really need a virtual normalizeParam class as parent of all the concrete normalizeParam* classes? If so then I guess we would need to have it defined in BiocGenerics but I think we should try hard to not start defining classes in this package (that could take us too far...) H.
On 04/26/2016 03:03 PM, Aaron Lun wrote:
Yes, but "monkeyBars" doesn't have quite the same pithiness for a package name. Anyway, the dual dispatch mechanism sounds most interesting. I assume that means we'd have to define some sort of base "normalizeParam" class, and then derive "csawNormParam" and "swingsNormParam" subclasses, so that specific methods can be defined for each signature. - Aaron Martin Morgan wrote:
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au>
wrote: ...
BiocGenerics. However, if some other hypothetical package
(I'll call it
"swings", for argument's sake) were to define a normalize()
method with a ...
I like the dual dispatch method quite a bit (but wonder why we get
several
swings but only one csaw? Maybe a csaw implies two participants
[though I
think I once in a while csaw-ed alone], so a singular csaw and a
pair of
swings balance out?), partly because it's very easy to extend
(write another
method) and the second argument can be either lightweight or
parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf where it is apparent that swings are much more dangerous than see-saws (e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were involved in about 19 ... percent of injuries ... See-saws accounted for about three percent"; "Homemade rope, tire, or tree swings were also involved in a number of hanging deaths" [no mention of death by see-saw]). I think for the sake of our users, especially our younger users, we do not want to consider swings, or even methods on swings, further. Martin This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
OTOH introducing a "straight" subclass only to define one specialized method on it (the "normalize" method in this case) might not be worth it since there is a cost for such class, even if that cost is minimal: a cost for the user (one new container/constructor to deal with) and a cost for the developer (e.g. multiplication of coerce methods).
On the plus side, a straight subclass would make it easier to solve the same problem for other generics. For example, let's say that csaw and swings both defined the estimateDispersions() method for the SE class. If we decided to use dual dispatch, then the estimateDispersions() generic would also have to be changed to have an additional argument. This would not be necessary if the two packages set up their own straight subclasses and defined S4 methods for each subclass. It is slightly annoying, though, to have to define something like a csawDataSet class and its constructor, even if the constructor is just a trivial wrapper around the SE constructor. Ah well.
[Interestingly the scran package defines a default "normalize" method (i.e. a normalize,ANY method)].
Yes - uh - I got too excited with S4 methods, and things got out of hand. I've been pruning out some of the sillier definitions. - Aaron
On 04/26/2016 03:03 PM, Aaron Lun wrote:
Yes, but "monkeyBars" doesn't have quite the same pithiness for a package name. Anyway, the dual dispatch mechanism sounds most interesting. I assume that means we'd have to define some sort of base "normalizeParam" class, and then derive "csawNormParam" and "swingsNormParam" subclasses, so that specific methods can be defined for each signature. - Aaron Martin Morgan wrote:
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au>
wrote: ...
BiocGenerics. However, if some other hypothetical package
(I'll call it
"swings", for argument's sake) were to define a normalize()
method with a ...
I like the dual dispatch method quite a bit (but wonder why we get
several
swings but only one csaw? Maybe a csaw implies two participants
[though I
think I once in a while csaw-ed alone], so a singular csaw and a
pair of
swings balance out?), partly because it's very easy to extend
(write another
method) and the second argument can be either lightweight or
parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf where it is apparent that swings are much more dangerous than see-saws (e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were involved in about 19 ... percent of injuries ... See-saws accounted for about three percent"; "Homemade rope, tire, or tree swings were also involved in a number of hanging deaths" [no mention of death by see-saw]). I think for the sake of our users, especially our younger users, we do not want to consider swings, or even methods on swings, further. Martin This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 04/27/2016 02:12 AM, Herv? Pag?s wrote:
Hi, I would not discard defining a SummarizedExperiment subclass so quickly. SummarizedExperiment is very generic and can contain any kind of data. IIUC the csaw package uses SummarizedExperiment to store a particular kind of data (ChIP-seq data) and I believe specialization is a legitimate situation for defining a subclass, even if the subclass is a "straight" subclass i.e. a subclass that doesn't add new slots or doesn't touch the existing slots. OTOH introducing a "straight" subclass only to define one specialized method on it (the "normalize" method in this case) might not be worth it since there is a cost for such class, even if that cost is minimal: a cost for the user (one new container/constructor to deal with) and a cost for the developer (e.g. multiplication of coerce methods). Changing the signature of the normalize() generic in BiocGenerics and introducing dual dispath is of course doable but that means the maintainers of the packages that define methods on this generic are ok with the dual dispatch game and are willing to make the required modifications to their packages. It's an important change and I don't see an easy way to make it happen smoothly (i.e. thru a deprecated/defunct cycle). Here is the list of packages that currently define methods for BiocGenerics::normalize(): affyPLM Cardinal codelink CopyNumber450k csaw diffHic EBImage epigenomix MSnbase oligo qpcrNorm scran [Interestingly the scran package defines a default "normalize" method (i.e. a normalize,ANY method)]. Whether we make the second argument lightweight or parameterized (which is something that would need to be decided at the level of the generic)
By 'parameterized' I meant that the second argument could have slots or not; I'm not sure whether the use was clear. I don't think the generic needs to know about parameterization (also, I agree below that there is no need for a base class).
these packages will break as soon as we change the signature of the generic. So we'll need to wait after the release before this happens. Personally I find the lightweight second argument not particularly intuitive, elegant, or user-friendly. I'd rather type normalizeSwing(se, ...) or normalize(se, SwingParam(...)) than normalize(se, WithSwing(), ...). Last thing: In case of a parameterized second argument, do we really need a virtual normalizeParam class as parent of all the concrete normalizeParam* classes? If so then I guess we would need to have it defined in BiocGenerics but I think we should try hard to not start defining classes in this package (that could take us too far...)
I agree, I don't think a base class or decisions about parameterization
would be necessary -- create a MerryGoRoundParam() (parameterized with
horses or not; much safer than swings!) and the
normalize,SummarizedExperiment,MerryGoRoundParam-method and you're done.
Also wouldn't dispatch with second arg 'missing' or ANY recover current
behavior? Maybe it would need to be named and at the end to be
implemented without others buying in.
setGeneric("foo", function(x, y, ...) standardGeneric("foo"))
## setGeneric("foo", function(x, ..., y) standardGeneric("foo"))
.A <- setClass("A", representation(x="numeric"))
setMethod("foo", "A", function(x, y, ...) "ImanA")
> foo(.A())
[1] "ImanA"
For the normalizeSlides etc approach... I agree that it is probably
easier for users, though maybe many users are following a vignette so
the benefit is relatively minimal; there is definitely value in using a
common prefix or suffix (normalize* or *Param()).
Here's what limma does
> normalize<tab>
normalizeBetweenArrays normalizeQuantiles
normalizeCyclicLoess normalizeRobustSpline
normalizeForPrintorder normalizeVSN
normalizeForPrintorder.rg normalizeVSN.default
normalizeMedianAbsValues normalizeVSN.EListRaw
normalizeMedianValues normalizeVSN.RGList
normalizePath normalizeWithinArrays
and note the complexity in normalizeVSN, where we're back at desiring a
generic! If one wanted to implement
normalizeQuantiles,SummarizedExperiment-method then it would be back to
negotiating with individual package authors to introduce a new BiocGenerics.
Martin
H. On 04/26/2016 03:03 PM, Aaron Lun wrote:
Yes, but "monkeyBars" doesn't have quite the same pithiness for a package name. Anyway, the dual dispatch mechanism sounds most interesting. I assume that means we'd have to define some sort of base "normalizeParam" class, and then derive "csawNormParam" and "swingsNormParam" subclasses, so that specific methods can be defined for each signature. - Aaron Martin Morgan wrote:
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au>
wrote: ...
BiocGenerics. However, if some other hypothetical package
(I'll call it
"swings", for argument's sake) were to define a normalize()
method with a ...
I like the dual dispatch method quite a bit (but wonder why we get
several
swings but only one csaw? Maybe a csaw implies two participants
[though I
think I once in a while csaw-ed alone], so a singular csaw and a
pair of
swings balance out?), partly because it's very easy to extend
(write another
method) and the second argument can be either lightweight or
parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf where it is apparent that swings are much more dangerous than see-saws (e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were involved in about 19 ... percent of injuries ... See-saws accounted for about three percent"; "Homemade rope, tire, or tree swings were also involved in a number of hanging deaths" [no mention of death by see-saw]). I think for the sake of our users, especially our younger users, we do not want to consider swings, or even methods on swings, further. Martin This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
On Tue, Apr 26, 2016 at 11:12 PM, Herv? Pag?s <hpages at fredhutch.org> wrote:
Hi, I would not discard defining a SummarizedExperiment subclass so quickly. SummarizedExperiment is very generic and can contain any kind of data. IIUC the csaw package uses SummarizedExperiment to store a particular kind of data (ChIP-seq data) and I believe specialization is a legitimate situation for defining a subclass, even if the subclass is a "straight" subclass i.e. a subclass that doesn't add new slots or doesn't touch the existing slots. OTOH introducing a "straight" subclass only to define one specialized method on it (the "normalize" method in this case) might not be worth it since there is a cost for such class, even if that cost is minimal: a cost for the user (one new container/constructor to deal with) and a cost for the developer (e.g. multiplication of coerce methods).
If the data are more specialized, specialize the data structure, but the fact that the specialization solves the normalize() ambiguity is a mere coincidence. There are two different concerns.
Changing the signature of the normalize() generic in BiocGenerics and introducing dual dispath is of course doable but that means the maintainers of the packages that define methods on this generic are ok with the dual dispatch game and are willing to make the required modifications to their packages. It's an important change and I don't see an easy way to make it happen smoothly (i.e. thru a deprecated/defunct cycle).
In conjunction with what Martin said, you could define a "ANY","missing" method that emits a deprecation warning, and then recall the generic using NULL or something for the second argument so that it falls through. Packages would only need to fix the formals of their method definition.
Here is the list of packages that currently define methods for BiocGenerics::normalize(): affyPLM Cardinal codelink CopyNumber450k csaw diffHic EBImage epigenomix MSnbase oligo qpcrNorm scran [Interestingly the scran package defines a default "normalize" method (i.e. a normalize,ANY method)]. Whether we make the second argument lightweight or parameterized (which is something that would need to be decided at the level of the generic) these packages will break as soon as we change the signature of the generic. So we'll need to wait after the release before this happens. Personally I find the lightweight second argument not particularly intuitive, elegant, or user-friendly. I'd rather type normalizeSwing(se, ...) or normalize(se, SwingParam(...)) than normalize(se, WithSwing(), ...).
Sure, WithSwing() could hold arguments as well, but I agree that the Param suffix is more consistent. The Param naming is not great for autocompletion. Though I guess the interface could provide hints based on the defined methods.
Last thing: In case of a parameterized second argument, do we really need a virtual normalizeParam class as parent of all the concrete normalizeParam* classes? If so then I guess we would need to have it defined in BiocGenerics but I think we should try hard to not start defining classes in this package (that could take us too far...)
I would say no, no real need for a base class.
H. On 04/26/2016 03:03 PM, Aaron Lun wrote:
Yes, but "monkeyBars" doesn't have quite the same pithiness for a package name. Anyway, the dual dispatch mechanism sounds most interesting. I assume that means we'd have to define some sort of base "normalizeParam" class, and then derive "csawNormParam" and "swingsNormParam" subclasses, so that specific methods can be defined for each signature. - Aaron Martin Morgan wrote:
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au>
wrote: ...
BiocGenerics. However, if some other hypothetical package
(I'll call it
"swings", for argument's sake) were to define a normalize()
method with a ...
I like the dual dispatch method quite a bit (but wonder why we get
several
swings but only one csaw? Maybe a csaw implies two participants
[though I
think I once in a while csaw-ed alone], so a singular csaw and a
pair of
swings balance out?), partly because it's very easy to extend
(write another
method) and the second argument can be either lightweight or
parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf where it is apparent that swings are much more dangerous than see-saws (e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were involved in about 19 ... percent of injuries ... See-saws accounted for about three percent"; "Homemade rope, tire, or tree swings were also involved in a number of hanging deaths" [no mention of death by see-saw]). I think for the sake of our users, especially our younger users, we do not want to consider swings, or even methods on swings, further. Martin This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
I agree with Herve that making a subclass (is that the technical term?) might be a good idea. This is what I would do. And this is what has been done historically for all the extensions of ExpressionSet and eSet from Biobase. The issue with a general class like SummarizedExperiment is that methods defined on that class really ought to make sense for any instance of the class, and since the class can contain basically any form of data, I don't see how one can define any kind of informed data processing. For example, SummarizedExperiment can contain any number of different assays and what happens with all of those when you call normalize(). By making a subclass you're can say that this is an object which contains a very specific form of data (which can cue checked by extending the validity check), and you can now start to think about data processing, while still benefitting from the general infrastructure. But I would also say that normalize() is a terrible choice for a method since it is basically maximally uninformative about what happens. Best, Kasper On Wed, Apr 27, 2016 at 7:24 AM, Michael Lawrence <lawrence.michael at gene.com
wrote:
On Tue, Apr 26, 2016 at 11:12 PM, Herv? Pag?s <hpages at fredhutch.org> wrote:
Hi, I would not discard defining a SummarizedExperiment subclass so quickly. SummarizedExperiment is very generic and can contain any kind of data. IIUC the csaw package uses SummarizedExperiment to store a particular kind of data (ChIP-seq data) and I believe specialization is a legitimate situation for defining a subclass, even if the subclass is a "straight" subclass i.e. a subclass that doesn't add new slots or doesn't touch the existing slots. OTOH introducing a "straight" subclass only to define one specialized method on it (the "normalize" method in this case) might not be worth it since there is a cost for such class, even if that cost is minimal: a cost for the user (one new container/constructor to deal with) and a cost for the developer (e.g. multiplication of coerce methods).
If the data are more specialized, specialize the data structure, but the fact that the specialization solves the normalize() ambiguity is a mere coincidence. There are two different concerns.
Changing the signature of the normalize() generic in BiocGenerics and introducing dual dispath is of course doable but that means the maintainers of the packages that define methods on this generic are ok with the dual dispatch game and are willing to make the required modifications to their packages. It's an important change and I don't see an easy way to make it happen smoothly (i.e. thru a deprecated/defunct cycle).
In conjunction with what Martin said, you could define a "ANY","missing" method that emits a deprecation warning, and then recall the generic using NULL or something for the second argument so that it falls through. Packages would only need to fix the formals of their method definition.
Here is the list of packages that currently define methods for BiocGenerics::normalize(): affyPLM Cardinal codelink CopyNumber450k csaw diffHic EBImage epigenomix MSnbase oligo qpcrNorm scran [Interestingly the scran package defines a default "normalize" method (i.e. a normalize,ANY method)]. Whether we make the second argument lightweight or parameterized (which is something that would need to be decided at the level of the generic) these packages will break as soon as we change the signature of the generic. So we'll need to wait after the release before this happens. Personally I find the lightweight second argument not particularly intuitive, elegant, or user-friendly. I'd rather type normalizeSwing(se, ...) or normalize(se, SwingParam(...)) than normalize(se, WithSwing(), ...).
Sure, WithSwing() could hold arguments as well, but I agree that the Param suffix is more consistent. The Param naming is not great for autocompletion. Though I guess the interface could provide hints based on the defined methods.
Last thing: In case of a parameterized second argument, do we really need a virtual normalizeParam class as parent of all the concrete normalizeParam* classes? If so then I guess we would need to have it defined in BiocGenerics but I think we should try hard to not start defining classes in this package (that could take us too far...)
I would say no, no real need for a base class.
H. On 04/26/2016 03:03 PM, Aaron Lun wrote:
Yes, but "monkeyBars" doesn't have quite the same pithiness for a package name. Anyway, the dual dispatch mechanism sounds most interesting. I assume that means we'd have to define some sort of base "normalizeParam" class, and then derive "csawNormParam" and "swingsNormParam" subclasses, so that specific methods can be defined for each signature. - Aaron Martin Morgan wrote:
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au>
wrote: ...
BiocGenerics. However, if some other hypothetical package
(I'll call it
"swings", for argument's sake) were to define a normalize()
method with a ...
I like the dual dispatch method quite a bit (but wonder why we get
several
swings but only one csaw? Maybe a csaw implies two participants
[though I
think I once in a while csaw-ed alone], so a singular csaw and a
pair of
swings balance out?), partly because it's very easy to extend
(write another
method) and the second argument can be either lightweight or
parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf where it is apparent that swings are much more dangerous than see-saws (e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were involved in about 19 ... percent of injuries ... See-saws accounted for about three percent"; "Homemade rope, tire, or tree swings were also involved in a number of hanging deaths" [no mention of death by see-saw]). I think for the sake of our users, especially our younger users, we do not want to consider swings, or even methods on swings, further. Martin This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 04/27/2016 03:18 AM, Martin Morgan wrote:
On 04/27/2016 02:12 AM, Herv? Pag?s wrote:
Hi, I would not discard defining a SummarizedExperiment subclass so quickly. SummarizedExperiment is very generic and can contain any kind of data. IIUC the csaw package uses SummarizedExperiment to store a particular kind of data (ChIP-seq data) and I believe specialization is a legitimate situation for defining a subclass, even if the subclass is a "straight" subclass i.e. a subclass that doesn't add new slots or doesn't touch the existing slots. OTOH introducing a "straight" subclass only to define one specialized method on it (the "normalize" method in this case) might not be worth it since there is a cost for such class, even if that cost is minimal: a cost for the user (one new container/constructor to deal with) and a cost for the developer (e.g. multiplication of coerce methods). Changing the signature of the normalize() generic in BiocGenerics and introducing dual dispath is of course doable but that means the maintainers of the packages that define methods on this generic are ok with the dual dispatch game and are willing to make the required modifications to their packages. It's an important change and I don't see an easy way to make it happen smoothly (i.e. thru a deprecated/defunct cycle). Here is the list of packages that currently define methods for BiocGenerics::normalize(): affyPLM Cardinal codelink CopyNumber450k csaw diffHic EBImage epigenomix MSnbase oligo qpcrNorm scran [Interestingly the scran package defines a default "normalize" method (i.e. a normalize,ANY method)]. Whether we make the second argument lightweight or parameterized (which is something that would need to be decided at the level of the generic)
By 'parameterized' I meant that the second argument could have slots or not; I'm not sure whether the use was clear. I don't think the generic needs to know about parameterization (also, I agree below that there is no need for a base class).
The name of the second argument would depend on whether it's lightweight or parameterized. Could be something like 'with' or 'method' for the former and 'param' for the latter. Also the man page for the generic would need to be able to briefly describe that argument and provide some guidelines to developers. The more consistent the normalize() interface is across the dozens of methods, the better for the user. H.
these packages will break as soon as we change the signature of the generic. So we'll need to wait after the release before this happens. Personally I find the lightweight second argument not particularly intuitive, elegant, or user-friendly. I'd rather type normalizeSwing(se, ...) or normalize(se, SwingParam(...)) than normalize(se, WithSwing(), ...). Last thing: In case of a parameterized second argument, do we really need a virtual normalizeParam class as parent of all the concrete normalizeParam* classes? If so then I guess we would need to have it defined in BiocGenerics but I think we should try hard to not start defining classes in this package (that could take us too far...)
I agree, I don't think a base class or decisions about parameterization
would be necessary -- create a MerryGoRoundParam() (parameterized with
horses or not; much safer than swings!) and the
normalize,SummarizedExperiment,MerryGoRoundParam-method and you're done.
Also wouldn't dispatch with second arg 'missing' or ANY recover current
behavior? Maybe it would need to be named and at the end to be
implemented without others buying in.
setGeneric("foo", function(x, y, ...) standardGeneric("foo"))
## setGeneric("foo", function(x, ..., y) standardGeneric("foo"))
.A <- setClass("A", representation(x="numeric"))
setMethod("foo", "A", function(x, y, ...) "ImanA")
> foo(.A())
[1] "ImanA" For the normalizeSlides etc approach... I agree that it is probably easier for users, though maybe many users are following a vignette so the benefit is relatively minimal; there is definitely value in using a common prefix or suffix (normalize* or *Param()). Here's what limma does
> normalize<tab>
normalizeBetweenArrays normalizeQuantiles normalizeCyclicLoess normalizeRobustSpline normalizeForPrintorder normalizeVSN normalizeForPrintorder.rg normalizeVSN.default normalizeMedianAbsValues normalizeVSN.EListRaw normalizeMedianValues normalizeVSN.RGList normalizePath normalizeWithinArrays and note the complexity in normalizeVSN, where we're back at desiring a generic! If one wanted to implement normalizeQuantiles,SummarizedExperiment-method then it would be back to negotiating with individual package authors to introduce a new BiocGenerics. Martin
H. On 04/26/2016 03:03 PM, Aaron Lun wrote:
Yes, but "monkeyBars" doesn't have quite the same pithiness for a package name. Anyway, the dual dispatch mechanism sounds most interesting. I assume that means we'd have to define some sort of base "normalizeParam" class, and then derive "csawNormParam" and "swingsNormParam" subclasses, so that specific methods can be defined for each signature. - Aaron Martin Morgan wrote:
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au>
wrote: ...
BiocGenerics. However, if some other hypothetical package
(I'll call it
"swings", for argument's sake) were to define a normalize()
method with a ...
I like the dual dispatch method quite a bit (but wonder why we get
several
swings but only one csaw? Maybe a csaw implies two participants
[though I
think I once in a while csaw-ed alone], so a singular csaw and a
pair of
swings balance out?), partly because it's very easy to extend
(write another
method) and the second argument can be either lightweight or
parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf where it is apparent that swings are much more dangerous than see-saws (e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were involved in about 19 ... percent of injuries ... See-saws accounted for about three percent"; "Homemade rope, tire, or tree swings were also involved in a number of hanging deaths" [no mention of death by see-saw]). I think for the sake of our users, especially our younger users, we do not want to consider swings, or even methods on swings, further. Martin This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
On 04/27/2016 04:24 AM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:12 PM, Herv? Pag?s <hpages at fredhutch.org> wrote:
Hi, I would not discard defining a SummarizedExperiment subclass so quickly. SummarizedExperiment is very generic and can contain any kind of data. IIUC the csaw package uses SummarizedExperiment to store a particular kind of data (ChIP-seq data) and I believe specialization is a legitimate situation for defining a subclass, even if the subclass is a "straight" subclass i.e. a subclass that doesn't add new slots or doesn't touch the existing slots. OTOH introducing a "straight" subclass only to define one specialized method on it (the "normalize" method in this case) might not be worth it since there is a cost for such class, even if that cost is minimal: a cost for the user (one new container/constructor to deal with) and a cost for the developer (e.g. multiplication of coerce methods).
If the data are more specialized, specialize the data structure,
Isn't it what I'm doing when I define a "straight" subclass? The fact that I don't need to alter the internal representation is an implementation detail (and it could change at some point) but what's important is that from a user point of view my container is now tagged/specialized. I might only have one specialized method for it at the moment but I might have more in the future, and/or other package developers might build on top of my specialized container and add specialized methods in the future (and I cross my fingers that since I own the specialized container and already implemented a "normalize" method for it, nobody will redefine that method in their package). H.
but the fact that the specialization solves the normalize() ambiguity is a mere coincidence. There are two different concerns.
Changing the signature of the normalize() generic in BiocGenerics and introducing dual dispath is of course doable but that means the maintainers of the packages that define methods on this generic are ok with the dual dispatch game and are willing to make the required modifications to their packages. It's an important change and I don't see an easy way to make it happen smoothly (i.e. thru a deprecated/defunct cycle).
In conjunction with what Martin said, you could define a "ANY","missing" method that emits a deprecation warning, and then recall the generic using NULL or something for the second argument so that it falls through. Packages would only need to fix the formals of their method definition.
Here is the list of packages that currently define methods for BiocGenerics::normalize(): affyPLM Cardinal codelink CopyNumber450k csaw diffHic EBImage epigenomix MSnbase oligo qpcrNorm scran [Interestingly the scran package defines a default "normalize" method (i.e. a normalize,ANY method)]. Whether we make the second argument lightweight or parameterized (which is something that would need to be decided at the level of the generic) these packages will break as soon as we change the signature of the generic. So we'll need to wait after the release before this happens. Personally I find the lightweight second argument not particularly intuitive, elegant, or user-friendly. I'd rather type normalizeSwing(se, ...) or normalize(se, SwingParam(...)) than normalize(se, WithSwing(), ...).
Sure, WithSwing() could hold arguments as well, but I agree that the Param suffix is more consistent. The Param naming is not great for autocompletion. Though I guess the interface could provide hints based on the defined methods.
Last thing: In case of a parameterized second argument, do we really need a virtual normalizeParam class as parent of all the concrete normalizeParam* classes? If so then I guess we would need to have it defined in BiocGenerics but I think we should try hard to not start defining classes in this package (that could take us too far...)
I would say no, no real need for a base class.
H. On 04/26/2016 03:03 PM, Aaron Lun wrote:
Yes, but "monkeyBars" doesn't have quite the same pithiness for a package name. Anyway, the dual dispatch mechanism sounds most interesting. I assume that means we'd have to define some sort of base "normalizeParam" class, and then derive "csawNormParam" and "swingsNormParam" subclasses, so that specific methods can be defined for each signature. - Aaron Martin Morgan wrote:
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au>
wrote: ...
BiocGenerics. However, if some other hypothetical package
(I'll call it
"swings", for argument's sake) were to define a normalize()
method with a ...
I like the dual dispatch method quite a bit (but wonder why we get
several
swings but only one csaw? Maybe a csaw implies two participants
[though I
think I once in a while csaw-ed alone], so a singular csaw and a
pair of
swings balance out?), partly because it's very easy to extend
(write another
method) and the second argument can be either lightweight or
parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to
https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf
where it is apparent that swings are much more dangerous than see-saws
(e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were
involved in about 19 ... percent of injuries ... See-saws accounted
for about three percent"; "Homemade rope, tire, or tree swings were
also involved in a number of hanging deaths" [no mention of death by
see-saw]).
I think for the sake of our users, especially our younger users, we do
not want to consider swings, or even methods on swings, further.
Martin
This email message may contain legally privileged and/or confidential
information. If you are not the intended recipient(s), or the
employee or agent responsible for the delivery of this message to the
intended recipient(s), you are hereby notified that any disclosure,
copying, distribution, or use of this email message is prohibited. If
you have received this message in error, please notify the sender
immediately by e-mail and delete this email message from your
computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
On Wed, Apr 27, 2016 at 10:52 AM, Herv? Pag?s <hpages at fredhutch.org> wrote:
On 04/27/2016 04:24 AM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:12 PM, Herv? Pag?s <hpages at fredhutch.org> wrote:
Hi, I would not discard defining a SummarizedExperiment subclass so quickly. SummarizedExperiment is very generic and can contain any kind of data. IIUC the csaw package uses SummarizedExperiment to store a particular kind of data (ChIP-seq data) and I believe specialization is a legitimate situation for defining a subclass, even if the subclass is a "straight" subclass i.e. a subclass that doesn't add new slots or doesn't touch the existing slots. OTOH introducing a "straight" subclass only to define one specialized method on it (the "normalize" method in this case) might not be worth it since there is a cost for such class, even if that cost is minimal: a cost for the user (one new container/constructor to deal with) and a cost for the developer (e.g. multiplication of coerce methods).
If the data are more specialized, specialize the data structure,
Isn't it what I'm doing when I define a "straight" subclass? The fact that I don't need to alter the internal representation is an implementation detail (and it could change at some point) but what's important is that from a user point of view my container is now tagged/specialized. I might only have one specialized method for it at the moment but I might have more in the future, and/or other package developers might build on top of my specialized container and add specialized methods in the future (and I cross my fingers that since I own the specialized container and already implemented a "normalize" method for it, nobody will redefine that method in their package).
In general, yes, there are behaviors that clearly pertain to specific types of data. But the normalize() generic is just too generic. There are multiple ways to normalize ChIP-seq data, and we can't count on one method to implement all of them. We want to encourage innovation in statistical methods, so we should make it easy to extend Bioconductor in that way.
H.
but the fact that the specialization solves the normalize() ambiguity is a mere coincidence. There are two different concerns.
Changing the signature of the normalize() generic in BiocGenerics and introducing dual dispath is of course doable but that means the maintainers of the packages that define methods on this generic are ok with the dual dispatch game and are willing to make the required modifications to their packages. It's an important change and I don't see an easy way to make it happen smoothly (i.e. thru a deprecated/defunct cycle).
In conjunction with what Martin said, you could define a "ANY","missing" method that emits a deprecation warning, and then recall the generic using NULL or something for the second argument so that it falls through. Packages would only need to fix the formals of their method definition.
Here is the list of packages that currently define methods for BiocGenerics::normalize(): affyPLM Cardinal codelink CopyNumber450k csaw diffHic EBImage epigenomix MSnbase oligo qpcrNorm scran [Interestingly the scran package defines a default "normalize" method (i.e. a normalize,ANY method)]. Whether we make the second argument lightweight or parameterized (which is something that would need to be decided at the level of the generic) these packages will break as soon as we change the signature of the generic. So we'll need to wait after the release before this happens. Personally I find the lightweight second argument not particularly intuitive, elegant, or user-friendly. I'd rather type normalizeSwing(se, ...) or normalize(se, SwingParam(...)) than normalize(se, WithSwing(), ...).
Sure, WithSwing() could hold arguments as well, but I agree that the Param suffix is more consistent. The Param naming is not great for autocompletion. Though I guess the interface could provide hints based on the defined methods.
Last thing: In case of a parameterized second argument, do we really need a virtual normalizeParam class as parent of all the concrete normalizeParam* classes? If so then I guess we would need to have it defined in BiocGenerics but I think we should try hard to not start defining classes in this package (that could take us too far...)
I would say no, no real need for a base class.
H. On 04/26/2016 03:03 PM, Aaron Lun wrote:
Yes, but "monkeyBars" doesn't have quite the same pithiness for a package name. Anyway, the dual dispatch mechanism sounds most interesting. I assume that means we'd have to define some sort of base "normalizeParam" class, and then derive "csawNormParam" and "swingsNormParam" subclasses, so that specific methods can be defined for each signature. - Aaron Martin Morgan wrote:
On 04/26/2016 05:28 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/26/2016 04:47 PM, Michael Lawrence wrote:
On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun<alun at wehi.edu.au>
wrote: ...
BiocGenerics. However, if some other hypothetical package
(I'll call it
"swings", for argument's sake) were to define a normalize()
method with a ...
I like the dual dispatch method quite a bit (but wonder why we get
several
swings but only one csaw? Maybe a csaw implies two participants
[though I
think I once in a while csaw-ed alone], so a singular csaw and a
pair of
swings balance out?), partly because it's very easy to extend
(write another
method) and the second argument can be either lightweight or
parameterized.
I could go along with the dual dispatch. "Swings" is short for "Set of swings". Usually, there are several swings in a row, but only one see-saw.
Googling for "how many swings per see-saw" took me to
https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf
where it is apparent that swings are much more dangerous than see-saws
(e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were
involved in about 19 ... percent of injuries ... See-saws accounted
for about three percent"; "Homemade rope, tire, or tree swings were
also involved in a number of hanging deaths" [no mention of death by
see-saw]).
I think for the sake of our users, especially our younger users, we do
not want to consider swings, or even methods on swings, further.
Martin
This email message may contain legally privileged and/or confidential
information. If you are not the intended recipient(s), or the
employee or agent responsible for the delivery of this message to the
intended recipient(s), you are hereby notified that any disclosure,
copying, distribution, or use of this email message is prohibited. If
you have received this message in error, please notify the sender
immediately by e-mail and delete this email message from your
computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
-- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319