
[Bioc-devel] R6 and Bioconductor

12 messages · Martin Morgan, Michael Lawrence, Gabriel Becker +1 more

#
Hello,

I am thinking of creating a package for Bioconductor, and I am wondering about the use of R6 classes (from the R6 package). I do intend to use existing Bioconductor classes such as SummarizedExperiment, and to interact with and make use of other Bioconductor packages, such as scater and DESeq2. This appears to be in accordance with the guidelines (https://www.bioconductor.org/developers/package-guidelines/#classes), but does the use of R6 classes disqualify the package from being in Bioconductor? Do I need to write my classes as S4 in order to qualify?

Thank you for your help.

Best regards,
Garth
#
On 05/12/2017 02:05 AM, Garth Ilsley wrote:
I think there's little value in exposing R6 classes to Bioconductor 
users, introducing yet another syntax and semantics, and would strongly 
discourage their use outside the package name space.

Inside the package name space the maintainer has more liberty to adopt 
programming practices that are geared toward correct and efficient 
implementations; if R6 fills this role (I'm not an expert, but I don't 
think R6 enforces strong type checking, and I don't think it is 
particularly efficient) then it would be appropriate to use them.

Martin
#
On 05/12/2017 07:11 AM, Martin Morgan wrote:
Maybe one additional point is that perhaps 'write my classes' implies 
that you'll be creating new classes; it might often be better to re-use 
existing classes, or worst-case write simple extensions (e.g., an 
additional slot to SummarizedExperiment) to existing classes. In this 
way you re-use existing robust software and don't add further to the 
cognitive burden placed on the user struggling to navigate yet more 
functionality.

Martin
#
One place where one might think of using R6 is in the implementation
of a mutable data model underlying a GUI like a Shiny app. If mutable
semantics are required, consider using S4 reference classes, as they
offer more features than R6 and will integrate directly with
Bioconductor S4 classes.

Michael
On Thu, May 11, 2017 at 11:05 PM, Garth Ilsley <garth.ilsley at oist.jp> wrote:
#
Thank you.
If I understand correctly, you are saying that it is fine to use Reference classes (mutable semantics) in Bioconductor. A GUI is one clear place for this. However, what about a large dataset that is subject to progressive analysis with various fields updated as the analysis proceeds?

The typical Bioconductor approach (as far as I have seen) is to call a method defined for an S4 functional class that produces a new object of the same class, with the result assigned to the same name as the original object. For a project considered in isolation, it wouldn't be unreasonable to use a Reference class for this instead, but that's not what I'm asking. My question is about the standards and approach that Bioconductor has agreed on - to ensure consistency. Is a Reference Class permissible in this situation?

If not, case closed. If they are permitted, I would suggest that R6 semantics are consistent with Reference Class semantics, but with the added benefit of private members and "active bindings" (they look like fields, but call a function). This is nice and simple (for the creator and user of the class), but if not desired (for consistency etc.), then I presume Reference Classes will do fine.
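
For concreteness, here is a minimal sketch of the two R6 features mentioned above, a private member and an active binding. It assumes the R6 package from CRAN is installed; the class `Analysis` and its members `.counts`, `counts` and `normalize` are hypothetical names made up for illustration, not part of any real package.

```r
library(R6)  # third-party package from CRAN (assumed to be installed)

# Hypothetical class, for illustration only.
Analysis <- R6Class("Analysis",
  public = list(
    initialize = function(counts) {
      private$.counts <- counts
    },
    normalize = function() {
      # Mutates the object in place; returns it invisibly to allow chaining.
      private$.counts <- private$.counts / sum(private$.counts)
      invisible(self)
    }
  ),
  private = list(
    # Not reachable as a$.counts from outside the class.
    .counts = NULL
  ),
  active = list(
    # Looks like a field to the caller, but runs this function on access.
    counts = function(value) {
      if (missing(value)) {
        private$.counts
      } else {
        stop("`counts` is read-only", call. = FALSE)
      }
    }
  )
)

a <- Analysis$new(c(2, 3, 5))
a$normalize()   # no reassignment needed: `a` itself is updated
a$counts        # reads through the active binding
```

Note that nothing here declares a class for `.counts`; any value can be stored in it.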
#
On May 12, 2017 4:23 PM, "Garth Ilsley" <garth.ilsley at oist.jp> wrote:
> Thank you.
>
>> mutable data model underlying a GUI like a Shiny app. If mutable
>> semantics are required, consider using S4 reference classes, as they offer
>> more features than R6 and will integrate
>
> If I understand correctly, you are saying that it is fine to use Reference
> classes (mutable semantics) in Bioconductor. A GUI is one clear place for
> this. However, what about a large dataset that is subject to progressive
> analysis with various fields updated as the analysis proceeds? The typical
> Bioconductor approach (as far as I have seen) is to call a method defined
> for an S4 functional class that produces a new object of the same class,
> with the result assigned to the same name as the original object.  For a
> project considered in isolation, it wouldn't be unreasonable to use a
> Reference class for this instead, but that's not what I'm asking. My
> question is about the standards and approach that Bioconductor has agreed
> on - to ensure consistency. Is a Reference Class permissible in this
> situation?


I don't speak for the project, but I would suggest that reference classes
are really best/(almost) only useful for encoding state in
complex/unusual-for-R package code. Having user-facing objects with these
mechanics violates a pretty central idiom of R (copy on write) and thus is
imo substantially more damaging than it is worth in general.

One of the things that makes R simpler for beginners than other languages
is that when they pass an object to a function that function "can't" change
the version they have in their workspace.
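
A minimal sketch of that difference, using only the methods package that ships with R. The classes `CounterS4` and `CounterRef` and the functions `bump_s4` and `bump_ref` are hypothetical names made up for illustration.

```r
library(methods)

# Hypothetical classes, for illustration only.
setClass("CounterS4", slots = c(n = "numeric"))
CounterRef <- setRefClass("CounterRef", fields = list(n = "numeric"))

# Functional style: the function changes its own copy and returns it.
bump_s4 <- function(x) {
  x@n <- x@n + 1
  x
}

# Reference style: the function changes the caller's object in place.
bump_ref <- function(x) {
  x$n <- x$n + 1
  invisible(x)
}

a <- new("CounterS4", n = 0)
invisible(bump_s4(a))     # result discarded
n_after_discard <- a@n    # `a` in the workspace is untouched
a <- bump_s4(a)           # the usual idiom: reassign to the same name

b <- CounterRef$new(n = 0)
bump_ref(b)               # no reassignment, yet `b` has changed
```

With the S4 object the caller has to opt in to the change by reassigning; with the reference object the change happens whether or not the caller expected it.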

> If not, case closed. If they are permitted, I would suggest that R6
> semantics are consistent with Reference Class semantics, but with the added
> benefit of private members and "active bindings" (they look like fields,
> but call a function).


Reference classes absolutely can have active binding fields. It is pretty
standard practice I think.

As for private fields, no they don't have that, but I've never really been
convinced you need them in the vast vast majority of cases. R is designed
such that the user owns their data (ie the contents of their objects). I've
never really heard a good argument why that shouldn't be the case.

That said, the typical idiom in all of my code is to have paired fields: an
active binding, which is a function that does some checking/processing, and
a classed field with the same name prefixed with a "." that it corresponds
to.
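
A minimal sketch of that paired-field idiom with a reference class, relying on `setRefClass()` accepting a function of one argument as a field definition. The class `Sample` and its fields `depth` and `.depth` are hypothetical names made up for illustration, and the particular checks are just one possible choice.

```r
library(methods)

# Hypothetical class, for illustration only.
Sample <- setRefClass("Sample",
  fields = list(
    # Classed storage field; the leading "." marks it as an implementation detail.
    .depth = "numeric",
    # Active binding: called with no argument to read, with `value` to set.
    depth = function(value) {
      if (missing(value)) {
        .depth
      } else {
        # The checking/processing step happens before the value is stored.
        stopifnot(is.numeric(value), length(value) == 1, value >= 0)
        .depth <<- value
      }
    }
  )
)

s <- Sample$new()
s$depth <- 25        # goes through the checks, then lands in `.depth`
s$depth              # reads `.depth` back through the binding
ok <- try(s$depth <- "deep", silent = TRUE)   # rejected by the checks
```

Nothing stops a user from assigning to `s$.depth` directly; the dot is a convention, not an access restriction.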

Also R6 classes aren't really compatible with reference class/S4 mechanics
because the fields are not classed. This may sound like a small thing but
imo it's actually quite important.

Best,
~G


> This is nice and simple (for the creator and user of the class), but if not
> desired (for consistency etc.), then I presume Reference Classes will do
> fine.


_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
#
Just to add to what Gabe already said, defer your performance concerns
until you've actually got something that works and is well written. If
you hit up against a performance barrier, come back and we can help.
On Fri, May 12, 2017 at 4:22 PM, Garth Ilsley <garth.ilsley at oist.jp> wrote:
#
A really helpful answer, thank you.

> I don't speak for the project, but I would suggest that reference classes are really best/(almost) only useful for encoding state in complex/unusual-for-R package code. Having user-facing objects with these mechanics violates a pretty central idiom of R (copy on write) and thus is imo substantially more damaging than it is worth in general.
>
> One of the things that makes R simpler for beginners than other languages is that when they pass an object to a function that function "can't" change the version they have in their workspace.


As you suggest, specifics matter, but thanks for explaining the context.

> Reference classes absolutely can have active binding fields. It is pretty standard practice I think.

Thanks, I hadn't realised that.

> As for private fields, no they don't have that, but I've never really been convinced you need them in the vast vast majority of cases. R is designed such that the user owns their data (ie the contents of their objects). I've never really heard a good argument why that shouldn't be the case.

What I like is that they reduce clutter in the class interface, and more importantly, allow you to make it clear what part of the interface the user can expect to remain stable in future versions. They are the implementation details that might change.

> That said, the typical idiom in all of my code is to have paired fields: an active binding, which is a function that does some checking/processing, and a classed field with the same name prefixed with a "." that it corresponds to.

Thanks for the pointer. Does the initial "." suggest that the user shouldn't make use of these fields directly, i.e. does this fulfil the role of a private field?

> Also R6 classes aren't really compatible with reference class/S4 mechanics because the fields are not classed. This may sound like a small thing but imo it's actually quite important.


A good point.
#
To clarify, my question wasn't motivated by concerns about performance, but I appreciate the offer. My interest was style and idiom since there is more than one way to do it. However, I believe my question has been answered: S4 functional classes are the preferred and expected idiom for user-facing classes in Bioconductor. I'm fine with following this approach in order to ensure consistency, and so on.
#
As far as separating implementation from interface, a fairly simple
separation is that fields are implementation and methods are interface.
While active bindings allow encapsulated fields (often called "properties"
in other languages), they put extra cognitive load on the user to
remember which state components are fields and which are returned by
methods. Some frameworks use properties to enable observation of
mutable state (see the objectProperties package on CRAN for doing this
with reference classes). But with reactivity in vogue, it's not clear
whether the explicit observer pattern is still relevant.

Michael
On Sat, May 13, 2017 at 2:15 AM, Garth Ilsley <garth.ilsley at oist.jp> wrote:
1 day later
#
Thanks for the pointers and advice. I'll have a look at these.
#
Thanks for explaining the context. It helps to understand what Bioconductor developers expect.