[Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods

Thu, Feb 6, 2020 5:34 PM

Thanks Michael,

Yes, in a sense ttBulk and SummarizedExperiment can be considered two
interfaces. Would it be fair enough to create a function that converts from
one to the other, although the default would be ttBulk?

*> I'm not sure the tidyverse is a great answer to the user interface,
because it lacks domain semantics*

Would it be fair to say that the ttBulk class could be considered a tibble
with specific semantics? In the sense that it holds information about key
column names (.sample, .transcript, .abundance, .normalised_abundance, etc.)
and has a validator (that is triggered at every ttBulk function).
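
To make the idea concrete, here is a minimal sketch of a tibble subclass that
stores its key column names as an attribute and validates them. The names
`new_tt_sketch()`, `validate_tt_sketch()` and the `key_columns` attribute are
hypothetical, chosen for illustration only; this is not the actual ttBulk
implementation.

```r
library(tibble)

# Hypothetical validator: checks that the recorded key columns exist
# and that the abundance column is numeric.
validate_tt_sketch <- function(obj) {
  keys <- attr(obj, "key_columns")
  missing_cols <- setdiff(keys, names(obj))
  if (length(missing_cols) > 0) {
    stop("missing key column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(obj[[keys[["abundance"]]]])) {
    stop("the abundance column must be numeric")
  }
  obj
}

# Hypothetical constructor: an S3 subclass of tbl_df that remembers which
# columns play the sample / transcript / abundance roles.
new_tt_sketch <- function(x,
                          sample = ".sample",
                          transcript = ".transcript",
                          abundance = ".abundance") {
  x <- as_tibble(x)
  obj <- new_tibble(
    x,
    nrow = nrow(x),
    class = "tt_sketch",
    key_columns = c(sample = sample,
                    transcript = transcript,
                    abundance = abundance)
  )
  validate_tt_sketch(obj)
}

counts <- new_tt_sketch(tibble(
  .sample = c("s1", "s2"),
  .transcript = c("g1", "g1"),
  .abundance = c(10, 20)
))
```

Because the object still inherits from `tbl_df`, it can be passed to functions
that expect a tibble, while the attribute carries the domain semantics.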

I think that at the moment, given (i) the S3 problem and (ii) the lack of a
formal foundation on the SummarizedExperiment interface (which might itself
require an S4 class, where a SummarizedExperiment could be a slot?), my
package belongs more on CRAN until those two issues have been resolved.

I imagine there are not many cases where a CRAN package migrated to
Bioconductor after complying with the ecosystem policies.

Thanks a lot.

Best wishes.

*Stefano*



Stefano Mangiola | Postdoctoral fellow

Papenfuss Laboratory

The Walter and Eliza Hall Institute of Medical Research

+61 (0)466452544


On Fri, Feb 7, 2020 at 12:12 PM Michael Lawrence <
lawrence.michael at gene.com> wrote:

There's a difference between implementing software, where one wants
formal data structures, and providing a convenient user interface.
Software needs to interface with other software, so a package could
provide both types of interfaces, one based on rich (S4) data
structures, another on simpler structures with an API more amenable to
analysis. I'm not sure the tidyverse is a great answer to the user
interface, because it lacks domain semantics. This is still an active
area of research (see Stuart Lee's plyranges, for example). I hope you
can find a reasonable compromise that enables you to integrate ttBulk
into Bioconductor, so that it can take advantage of the synergies the
ecosystem provides.

PS: There is no simple fix for your example.

Michael

On Thu, Feb 6, 2020 at 4:12 PM stefano <mangiolastefano at gmail.com> wrote:

Thanks a lot for your comments, Martin and Michael.

Here I reply to Martin's comment. Michael, I will try to implement your
solution!

I think a key point from

https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106

(that I was overlooking) is

*>> "So to sum up: if you submit a package to Bioconductor, there is an
expectation that your package can work seamlessly with other Bioconductor
packages, and your implementation should support that. The safest and
easiest way to do that is to use Bioconductor data structures"*

In this case my package would not be suited, as I do not use pre-existing
Bioconductor data structures; instead I see value in using a simple
tibble, for the reasons explained in part in the README
https://github.com/stemangiola/ttBulk (harnessing the power of tidyverse
and friends for bulk transcriptomic analyses).

*>> "with the minimum standard of being able to accept such objects even
if you do not rely on them internally (though you should)"*

With this I can comply, in the sense that I can build converters to and
from SummarizedExperiment (for example).
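
As a rough sketch of what such converters could look like, assuming a long
tibble with `.sample`, `.transcript` and `.abundance` columns: the function
names `tidy_to_se()` and `se_to_tidy()` and the assay name `"abundance"` are
hypothetical, and sample or transcript annotation is ignored for brevity.

```r
library(SummarizedExperiment)
library(tibble)

# Hypothetical converter: long tibble -> SummarizedExperiment.
# Builds a transcripts x samples matrix and fills it by (row, column) name.
tidy_to_se <- function(x) {
  transcripts <- unique(as.character(x$.transcript))
  samples <- unique(as.character(x$.sample))
  m <- matrix(NA_real_,
              nrow = length(transcripts), ncol = length(samples),
              dimnames = list(transcripts, samples))
  m[cbind(as.character(x$.transcript), as.character(x$.sample))] <-
    x$.abundance
  SummarizedExperiment(assays = list(abundance = m))
}

# Hypothetical converter: SummarizedExperiment -> long tibble.
# as.vector() unrolls the matrix column by column, so samples repeat in
# blocks while transcripts cycle within each block.
se_to_tidy <- function(se) {
  m <- assay(se, "abundance")
  tibble(
    .sample = rep(colnames(m), each = nrow(m)),
    .transcript = rep(rownames(m), times = ncol(m)),
    .abundance = as.vector(m)
  )
}

long <- tibble(
  .sample = c("s1", "s1", "s2", "s2"),
  .transcript = c("g1", "g2", "g1", "g2"),
  .abundance = c(1, 2, 3, 4)
)
se <- tidy_to_se(long)
back <- se_to_tidy(se)
```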

*>> "If you don't want to do that, then that's a shame, but it would
suggest that Bioconductor would not be the right place to host this
package."*

Well said.

In summary, I do not rely on Bioconductor data structures, as I am
proposing another paradigm, but my back end is largely made of
Bioconductor analysis packages that I would like to interface with the
tidyverse. So

1) Should I build converters to Bioc. data structures, and force the use
of S3 objects (needed for tidyverse to work), or
2) Submit to CRAN

I don't have strong feelings for either, although I think Bioconductor
would be a good fit. Community, please give me your honest opinions; I
will take them seriously and proceed.



Best wishes.

*Stefano*



Stefano Mangiola | Postdoctoral fellow

Papenfuss Laboratory

The Walter and Eliza Hall Institute of Medical Research

+61 (0)466452544


On Fri, Feb 7, 2020 at 10:46 AM Martin Morgan <
mtmorgan.bioc at gmail.com> wrote:

The idea isn't to use S4 at any cost, but to 'play well' with the
Bioconductor ecosystem, including writing robust and maintainable code.

This comment

https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106

provides some motivation; there was also an interesting exchange on the
Bioconductor community slack about this (join at
https://bioc-community.herokuapp.com/; discussion starting with
https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800).

The plyranges package http://bioconductor.org/packages/plyranges and the
recently accepted fluentGenomics workflow
https://github.com/Bioconductor/Contributions/issues/1350 provide
illustrations.

In your domain it's really surprising that your package does not use
(Import or Depend on) SummarizedExperiment or GenomicRanges packages.
From a superficial look at your package, it seems like something like
`reduce_dimensions()` could be defined to take & return a
SummarizedExperiment and hence benefit from some of the points in the
github issue comment mentioned above.
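
A minimal sketch of that pattern, under the assumption that the reduction is
a plain PCA on the first assay and that the components are stored as columns
of `colData`: the function name `reduce_dimensions_se()` and the `PC1`, `PC2`
column names are hypothetical and do not reflect the ttBulk API.

```r
library(SummarizedExperiment)

# Hypothetical function: takes a SummarizedExperiment, returns the same
# object with principal components added as sample-level columns.
reduce_dimensions_se <- function(se, n_components = 2) {
  # prcomp() expects observations (samples) in rows, so transpose the assay.
  pca <- stats::prcomp(t(assay(se)), center = TRUE)
  for (i in seq_len(n_components)) {
    colData(se)[[paste0("PC", i)]] <- unname(pca$x[, i])
  }
  se
}

set.seed(1)
m <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("g", 1:5), paste0("s", 1:4)))
se_pca <- reduce_dimensions_se(SummarizedExperiment(assays = list(counts = m)))
```

Since the result is still a SummarizedExperiment, it can be handed to other
Bioconductor packages, or reshaped to a tibble for plotting afterwards.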

Certainly there is a useful transition, both 'on the way in' to a
SummarizedExperiment, and after leaving the more specialized
bioinformatic computations to, e.g., display a pairs plot of the reduced
dimensions, where one might re-shape the data to a tidy format and use
'plain old' tibbles; the fluentGenomics workflow might provide some
guidance.

At the end of the day it would not be surprising for Bioconductor
packages to make use of tidy concepts and data structures, particularly
in the vignette, and it would be a mistake for Bioconductor to exclude
well-motivated 'tidy' representations.

Martin Morgan

On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" <
bioc-devel-bounces at r-project.org on behalf of
mangiolastefano at gmail.com> wrote:

    Hello,

    I have a package (ttBulk) under review. I have been told to replace the
    S3 system with S4. My package is based on the class tbl_df and must be
    fully compatible with tidyverse methods (inheritance). After some tests
    and research I understood that the tidyverse ecosystem is not compatible
    with S4 classes.

    For example, several methods apparently do not handle S4 objects based
    on the S3 tbl_df:

    ```
    library(tidyverse)
    setOldClass("tbl_df")
    setClass("test2", contains = "tbl_df")
    my <- new("test2", tibble(a = 1))
    my %>% mutate(b = 3)

      a b
    1 1 3
    ```

    ```
    my <- new("test2", tibble(a = rnorm(100), b = 1))
    my %>% nest(data = -b)
    Error: `x` must be a vector, not a `test2` object
    Run `rlang::last_error()` to see where the error occurred.
    ```

    Could you please advise whether a tidyverse-based package can be hosted
    on Bioconductor, and whether S4 classes are really mandatory? I need to
    understand if I am forced to submit to CRAN instead (although
    Bioconductor would be a good fit, since I try to interface
    transcriptional analysis tools with the tidy universe).


    Thanks a lot.
    Stefano

    _______________________________________________
    Bioc-devel at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/bioc-devel

