Hello, I am thinking of creating a package for Bioconductor, and I am wondering about the use of R6 classes (from the R6 package). I do indeed intend to use existing Bioconductor classes such as SummarizedExperiment and to interact with and make use of other Bioconductor packages, such as scater and DESeq2. This appears to be in accordance with the guidelines (https://www.bioconductor.org/developers/package-guidelines/#classes), but does the use of R6 classes disqualify the package from being in Bioconductor? Do I need to write my classes as S4 in order to qualify? Thank you for your help. Best regards, Garth
[Bioc-devel] R6 and Bioconductor
12 messages · Martin Morgan, Michael Lawrence, Gabriel Becker +1 more
On 05/12/2017 02:05 AM, Garth Ilsley wrote:
> Hello, I am thinking of creating a package for Bioconductor, and I am wondering about the use of R6 classes (from the R6 package). I do indeed intend to use existing Bioconductor classes such as SummarizedExperiment and to interact with and make use of other Bioconductor packages, such as scater and DESeq2. This appears to be in accordance with the guidelines (https://www.bioconductor.org/developers/package-guidelines/#classes), but does the use of R6 classes disqualify the package from being in Bioconductor? Do I need to write my classes as S4 in order to qualify?
I think there's little value in exposing R6 classes to Bioconductor users, since that introduces yet another syntax and semantics, and I would strongly discourage their use outside the package namespace. Inside the package namespace the maintainer has more liberty to adopt programming practices that are geared toward correct and efficient implementations; if R6 fills this role (I'm not an expert, but I don't think R6 enforces strong type checking, and I don't think it is particularly efficient) then it would be appropriate to use them. Martin
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 05/12/2017 07:11 AM, Martin Morgan wrote:
> I think there's little value in exposing R6 classes to Bioconductor users, since that introduces yet another syntax and semantics, and I would strongly discourage their use outside the package namespace. Inside the package namespace the maintainer has more liberty to adopt programming practices that are geared toward correct and efficient implementations; if R6 fills this role (I'm not an expert, but I don't think R6 enforces strong type checking, and I don't think it is particularly efficient) then it would be appropriate to use them.
Maybe one additional point is that perhaps 'write my classes' implies that you'll be creating new classes; it might often be better to re-use existing classes or, in the worst case, to write simple extensions to existing classes (e.g., an additional slot on SummarizedExperiment). In this way you re-use existing robust software and don't add further to the cognitive burden placed on a user who is already struggling to navigate yet more functionality.
Martin
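A minimal sketch of the "simple extension" idea, using only the methods package. `BaseExperiment`, `AnnotatedExperiment`, `note()` and `nFeatures()` are hypothetical names; `BaseExperiment` merely stands in for an existing class such as SummarizedExperiment so that the snippet runs without Bioconductor installed. In a real package one would presumably import the existing class and name it in `contains` instead.

```r
library(methods)

# Hypothetical stand-in for an existing, well-tested class
# (in practice this might be SummarizedExperiment).
setClass("BaseExperiment", slots = c(counts = "matrix"))

# A method written once for the existing class.
setGeneric("nFeatures", function(x) standardGeneric("nFeatures"))
setMethod("nFeatures", "BaseExperiment", function(x) nrow(x@counts))

# The simple extension: inherit everything and add one slot.
setClass("AnnotatedExperiment",
         contains = "BaseExperiment",
         slots = c(note = "character"),
         prototype = list(note = NA_character_))

# Accessor generic, so users do not need to touch the slot directly.
setGeneric("note", function(x) standardGeneric("note"))
setMethod("note", "AnnotatedExperiment", function(x) x@note)

ae <- new("AnnotatedExperiment",
          counts = matrix(1:4, nrow = 2),
          note = "pilot run")

is(ae, "BaseExperiment")   # the extension is still a BaseExperiment
nFeatures(ae)              # the inherited method applies unchanged
note(ae)
```

The point of the design is that everything already written for the parent class keeps working, so the user only has to learn the one new accessor.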
One place where one might think of using R6 is in the implementation of a mutable data model underlying a GUI like a Shiny app. If mutable semantics are required, consider using S4 reference classes, as they offer more features than R6 and will integrate directly with Bioconductor S4 classes. Michael
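As a rough illustration of the suggestion above, here is a sketch of an S4 reference class (from the methods package) acting as a mutable model whose field is declared to hold an S4 class. The names `Assay`, `AppModel` and `select()` are hypothetical, and `Assay` only stands in for a Bioconductor S4 class.

```r
library(methods)

# Hypothetical S4 value class standing in for a Bioconductor class.
setClass("Assay", slots = c(counts = "matrix"))

# Reference class whose fields are declared with classes,
# one of them being the S4 class defined above.
AppModel <- setRefClass("AppModel",
  fields = list(assay = "Assay", selected = "character"),
  methods = list(
    select = function(id) {
      # Modify the object in place (mutable semantics).
      selected <<- union(selected, id)
      invisible(.self)
    }
  )
)

model <- AppModel$new(assay = new("Assay", counts = matrix(1:4, nrow = 2)))
model$select("gene1")
model$selected

# Assigning a value of the wrong class to a declared field is an error.
res <- try(model$assay <- "not an assay", silent = TRUE)
inherits(res, "try-error")
```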
Thank you.
> One place where one might think of using R6 is in the implementation of a mutable data model underlying a GUI like a Shiny app.
> If mutable semantics are required, consider using S4 reference classes, as they offer more features than R6 and will integrate directly with Bioconductor S4 classes.
If I understand correctly, you are saying that it is fine to use Reference classes (mutable semantics) in Bioconductor. A GUI is one clear place for this. However, what about a large dataset that is subject to progressive analysis with various fields updated as the analysis proceeds? The typical Bioconductor approach (as far as I have seen) is to call a method defined for an S4 functional class that produces a new object of the same class, with the result assigned to the same name as the original object. For a project considered in isolation, it wouldn't be unreasonable to use a Reference class for this instead, but that's not what I'm asking. My question is about the standards and approach that Bioconductor has agreed on - to ensure consistency. Is a Reference Class permissible in this situation? If not, case closed. If they are permitted, I would suggest that R6 semantics are consistent with Reference Class semantics, but with the added benefit of private members and "active bindings" (they look like fields, but call a function). This is nice and simple (for the creator and user of the class), but if not desired (for consistency etc.), then I presume Reference Classes will do fine.
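The two styles being contrasted here can be sketched side by side. This is only an illustration under assumed names (`Analysis`, `AnalysisRef`, `normalize()` and `normalizeInPlace()` are all hypothetical): the first half shows the functional S4 idiom, where a method returns a new object that is assigned back to the same name, and the second half shows a reference class updated in place.

```r
library(methods)

# Functional (value-semantics) S4 class.
setClass("Analysis",
         slots = c(values = "numeric", normalized = "logical"),
         prototype = list(normalized = FALSE))

setGeneric("normalize", function(x) standardGeneric("normalize"))
setMethod("normalize", "Analysis", function(x) {
  x@values <- x@values / sum(x@values)   # changes the local copy only
  x@normalized <- TRUE
  x                                      # return the updated object
})

a <- new("Analysis", values = c(1, 3))
b <- normalize(a)
a@normalized          # 'a' is untouched by the call above
a <- normalize(a)     # the usual idiom: assign the result to the same name
a@normalized

# Reference-class counterpart: the object is modified in place.
AnalysisRef <- setRefClass("AnalysisRef",
  fields = list(values = "numeric", normalized = "logical"),
  methods = list(
    normalizeInPlace = function() {
      values <<- values / sum(values)
      normalized <<- TRUE
      invisible(.self)
    }
  )
)

r <- AnalysisRef$new(values = c(1, 3), normalized = FALSE)
r$normalizeInPlace()  # no reassignment; every reference to 'r' sees the change
r$normalized
```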
On May 12, 2017 4:23 PM, "Garth Ilsley" <garth.ilsley at oist.jp> wrote:
> My question is about the standards and approach that Bioconductor has agreed on - to ensure consistency. Is a Reference Class permissible in this situation?

I don't speak for the project, but I would suggest that reference classes are really best (almost only) useful for encoding state in complex or unusual-for-R package code. Having user-facing objects with these mechanics violates a pretty central idiom of R (copy on write) and thus is, IMO, substantially more damaging than it is worth in general. One of the things that makes R simpler for beginners than other languages is that when they pass an object to a function, that function "can't" change the version they have in their workspace.

> If not, case closed. If they are permitted, I would suggest that R6 semantics are consistent with Reference Class semantics, but with the added benefit of private members and "active bindings" (they look like fields, but call a function).

Reference classes absolutely can have active binding fields. It is pretty standard practice, I think. As for private fields, no, they don't have that, but I've never really been convinced you need them in the vast, vast majority of cases. R is designed such that the user owns their data (i.e., the contents of their objects). I've never really heard a good argument why that shouldn't be the case.

That said, the typical idiom in all of my code is to have paired fields: an active binding, which is a function that does some checking/processing, and the classed field that it corresponds to, which has the same name prepended with a ".".

Also, R6 classes aren't really compatible with reference class/S4 mechanics because the fields are not classed. This may sound like a small thing, but IMO it's actually quite important.

Best,
~G

> This is nice and simple (for the creator and user of the class), but if not desired (for consistency etc.), then I presume Reference Classes will do fine.
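The paired-field idiom mentioned in this message might look roughly like the following. This is a sketch under assumptions, not the poster's actual code: `SampleInfo`, `depth` and `.depth` are hypothetical names, and the validation rule is invented for illustration. It relies on `setRefClass()` accepting a function in the `fields` list, which is then installed as an active binding on each object.

```r
library(methods)

SampleInfo <- setRefClass("SampleInfo",
  fields = list(
    .depth = "numeric",              # classed storage field
    depth = function(value) {        # active binding paired with .depth
      if (missing(value)) {
        .depth                       # read access returns the stored value
      } else {
        # Hypothetical check performed before storing.
        stopifnot(is.numeric(value), length(value) == 1L, value >= 0)
        .depth <<- value
      }
    }
  )
)

s <- SampleInfo$new(depth = 10)
s$depth
s$depth <- 25                        # goes through the checking function
s$depth

bad <- try(s$depth <- -1, silent = TRUE)   # rejected by the check
inherits(bad, "try-error")

s$.depth   # the storage field is still reachable; the dot is a convention only
```

Note that nothing prevents a user from assigning to `.depth` directly, which matches the point made in the message that reference classes have no private fields.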
Just to add to what Gabe already said, defer your performance concerns until you've actually got something that works and is well written. If you hit up against a performance barrier, come back and we can help.
A really helpful answer, thank you.

> I don't speak for the project, but I would suggest that reference classes are really best (almost only) useful for encoding state in complex or unusual-for-R package code. Having user-facing objects with these mechanics violates a pretty central idiom of R (copy on write) and thus is, IMO, substantially more damaging than it is worth in general. One of the things that makes R simpler for beginners than other languages is that when they pass an object to a function, that function "can't" change the version they have in their workspace.

As you suggest, specifics matter, but thanks for explaining the context.

> Reference classes absolutely can have active binding fields. It is pretty standard practice, I think.

Thanks, I hadn't realised that.

> As for private fields, no, they don't have that, but I've never really been convinced you need them in the vast, vast majority of cases. R is designed such that the user owns their data (i.e., the contents of their objects). I've never really heard a good argument why that shouldn't be the case.

What I like about private fields is that they reduce clutter in the class interface and, more importantly, allow you to make it clear what part of the interface the user can expect to remain stable in future versions. They are the implementation details that might change.

> That said, the typical idiom in all of my code is to have paired fields: an active binding, which is a function that does some checking/processing, and the classed field that it corresponds to, which has the same name prepended with a ".".

Thanks for the pointer. Does the initial "." suggest that the user shouldn't make use of these fields directly, i.e. does this fulfil the role of a private field?

> Also, R6 classes aren't really compatible with reference class/S4 mechanics because the fields are not classed. This may sound like a small thing, but IMO it's actually quite important.

A good point.
> Just to add to what Gabe already said, defer your performance concerns until you've actually got something that works and is well written. If you hit up against a performance barrier, come back and we can help.
To clarify, my question wasn't motivated by concerns about performance, but I appreciate the offer. My interest was style and idiom since there is more than one way to do it. However, I believe my question has been answered: S4 functional classes are the preferred and expected idiom for user-facing classes in Bioconductor. I'm fine with following this approach in order to ensure consistency, and so on.
As far as separating implementation from interface goes, a fairly simple separation is that fields are implementation and methods are interface. While active bindings allow encapsulated fields (often called "properties" in other languages), they put extra cognitive load on the user, who has to remember which state components are fields and which are returned by methods. Some frameworks use properties to enable observation of mutable state (see the objectProperties package on CRAN for doing this with reference classes). But with reactivity in vogue, it's not clear whether the explicit observer pattern is still relevant. Michael
1 day later
Thanks for the pointers and advice. I'll have a look at these.
> As far as separating implementation from interface goes, a fairly simple separation is that fields are implementation and methods are interface. While active bindings allow encapsulated fields (often called "properties" in other languages), they put extra cognitive load on the user, who has to remember which state components are fields and which are returned by methods. Some frameworks use properties to enable observation of mutable state (see the objectProperties package on CRAN for doing this with reference classes). But with reactivity in vogue, it's not clear whether the explicit observer pattern is still relevant.
Thanks for explaining the context. It helps to understand what Bioconductor developers expect.