Hello,
I have a package (ttBulk) under review. I have been told to replace the S3
system to S4. My package is based on the class tbl_df and must be fully
compatible with tidyverse methods (inheritance). After some tests and
research I understood that tidyverse ecosystem is not compatible with S4
classes.
For example, several methos do not apparently handle S4 objects based on
S3 tbl_df
```library(tidyverse)setOldClass("tbl_df")
setClass("test2", contains = "tbl_df")
my <- new("test2", tibble(a = 1))
my %>% mutate(b = 3)
a b
1 1 3
```
```my <- new("test2", tibble(a = rnorm(100), b = 1))
my %>% nest(data = -b)
Error: `x` must be a vector, not a `test2` object
Run `rlang::last_error()` to see where the error occurred.
```
Could you please advise whether a tidyverse based package can be hosted on
Bioconductor, and if S4 classes are really mandatory? I need to understand
if I am forced to submit to CRAN instead (although Bioconductor would be a
good fit, sice I try to interface transcriptional analysis tools to tidy
universe)
Thanks a lot.
Stefano
[Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods
14 messages · Michael Lawrence, Martin Morgan, Vincent Carey +1 more
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable code. This comment https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 provides some motivation; there was also an interesting exchange on the Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800). The plyranges package http://bioconductor.org/packages/plyranges and recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges packages. From a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in the github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized bioinformatic computations to, e.g., display a pairs plot of the reduced dimensions, where one might re-shape the data to a tidy format and use 'plain old' tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor packages to make use of tidy concepts and data structures, particularly in the vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" <bioc-devel-bounces at r-project.org on behalf of mangiolastefano at gmail.com> wrote: Hello, I have a package (ttBulk) under review. I have been told to replace the S3 system to S4. My package is based on the class tbl_df and must be fully compatible with tidyverse methods (inheritance). After some tests and research I understood that tidyverse ecosystem is not compatible with S4 classes. For example, several methos do not apparently handle S4 objects based on S3 tbl_df ```library(tidyverse)setOldClass("tbl_df") setClass("test2", contains = "tbl_df") my <- new("test2", tibble(a = 1)) my %>% mutate(b = 3) a b 1 1 3 ``` ```my <- new("test2", tibble(a = rnorm(100), b = 1)) my %>% nest(data = -b) Error: `x` must be a vector, not a `test2` object Run `rlang::last_error()` to see where the error occurred. ``` Could you please advise whether a tidyverse based package can be hosted on Bioconductor, and if S4 classes are really mandatory? I need to understand if I am forced to submit to CRAN instead (although Bioconductor would be a good fit, sice I try to interface transcriptional analysis tools to tidy universe) Thanks a lot. Stefano _______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Martin's comments are great. I'll just give some technical help. The "tbl_df" S4 class is already defined by dplyr (and more properly), so omit the call to setOldClass(). Then, things work a bit better.
my %>% nest(-b) # had to fix this from your example
[some ugly result]
Warning messages:
1: In class(x) <- c(subclass, tibble_class) :
Setting class(x) to multiple strings ("tbl_df", "tbl", ...); result
will no longer be an S4 object
2: In split.default(data[nest_vars], idx) :
data length is not a multiple of split variable
I would argue that this is an issue with the tidyverse. They provide
an S4 class, which should in principle work, because S4 is compatible
enough with S3, but the use of class<-() breaks it. There may be a way
to make class()<- work in such cases. I will think about it.
The right way to do it with S4 would be to just call initialize(x,
...) in new_tibble(). They have to implement the initialize() logic
themselves using update_tibble_attrs(). S4 gives you that for free.
And there are more issues but I think this is a good example of the
difficulties.
Michael
On Thu, Feb 6, 2020 at 2:46 PM stefano <mangiolastefano at gmail.com> wrote:
Hello,
I have a package (ttBulk) under review. I have been told to replace the S3
system to S4. My package is based on the class tbl_df and must be fully
compatible with tidyverse methods (inheritance). After some tests and
research I understood that tidyverse ecosystem is not compatible with S4
classes.
For example, several methos do not apparently handle S4 objects based on
S3 tbl_df
```library(tidyverse)setOldClass("tbl_df")
setClass("test2", contains = "tbl_df")
my <- new("test2", tibble(a = 1))
my %>% mutate(b = 3)
a b
1 1 3
```
```my <- new("test2", tibble(a = rnorm(100), b = 1))
my %>% nest(data = -b)
Error: `x` must be a vector, not a `test2` object
Run `rlang::last_error()` to see where the error occurred.
```
Could you please advise whether a tidyverse based package can be hosted on
Bioconductor, and if S4 classes are really mandatory? I need to understand
if I am forced to submit to CRAN instead (although Bioconductor would be a
good fit, sice I try to interface transcriptional analysis tools to tidy
universe)
Thanks a lot.
Stefano
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 (that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is an expectation that your package can work seamlessly with other Bioconductor packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use pre-existing Bioconductor data structures, but instead i see value in using a simple tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of tidyverse and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects even if you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to and from SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am proposing another paradigm, but my back end is made of largely Bioconductor analysis packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the use of S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think Bioconductor would be a good fit. Please community give me your honest opinions, I will take them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable code. This comment https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 provides some motivation; there was also an interesting exchange on the Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800). The plyranges package http://bioconductor.org/packages/plyranges and recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges packages. From a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in the github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized bioinformatic computations to, e.g., display a pairs plot of the reduced dimensions, where one might re-shape the data to a tidy format and use 'plain old' tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor packages to make use of tidy concepts and data structures, particularly in the vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < bioc-devel-bounces at r-project.org on behalf of mangiolastefano at gmail.com> wrote: Hello, I have a package (ttBulk) under review. I have been told to replace the S3 system to S4. My package is based on the class tbl_df and must be fully compatible with tidyverse methods (inheritance). After some tests and research I understood that tidyverse ecosystem is not compatible with S4 classes. For example, several methos do not apparently handle S4 objects based on S3 tbl_df ```library(tidyverse)setOldClass("tbl_df") setClass("test2", contains = "tbl_df") my <- new("test2", tibble(a = 1)) my %>% mutate(b = 3) a b 1 1 3 ``` ```my <- new("test2", tibble(a = rnorm(100), b = 1)) my %>% nest(data = -b) Error: `x` must be a vector, not a `test2` object Run `rlang::last_error()` to see where the error occurred. ``` Could you please advise whether a tidyverse based package can be hosted on Bioconductor, and if S4 classes are really mandatory? I need to understand if I am forced to submit to CRAN instead (although Bioconductor would be a good fit, sice I try to interface transcriptional analysis tools to tidy universe) Thanks a lot. Stefano [[alternative HTML version deleted]]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael
On Thu, Feb 6, 2020 at 4:12 PM stefano <mangiolastefano at gmail.com> wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 (that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is an expectation that your package can work seamlessly with other Bioconductor packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use pre-existing Bioconductor data structures, but instead i see value in using a simple tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of tidyverse and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects even if you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to and from SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am proposing another paradigm, but my back end is made of largely Bioconductor analysis packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the use of S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think Bioconductor would be a good fit. Please community give me your honest opinions, I will take them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable code. This comment https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 provides some motivation; there was also an interesting exchange on the Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800). The plyranges package http://bioconductor.org/packages/plyranges and recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges packages. From a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in the github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized bioinformatic computations to, e.g., display a pairs plot of the reduced dimensions, where one might re-shape the data to a tidy format and use 'plain old' tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor packages to make use of tidy concepts and data structures, particularly in the vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < bioc-devel-bounces at r-project.org on behalf of mangiolastefano at gmail.com> wrote: Hello, I have a package (ttBulk) under review. I have been told to replace the S3 system to S4. My package is based on the class tbl_df and must be fully compatible with tidyverse methods (inheritance). After some tests and research I understood that tidyverse ecosystem is not compatible with S4 classes. For example, several methos do not apparently handle S4 objects based on S3 tbl_df ```library(tidyverse)setOldClass("tbl_df") setClass("test2", contains = "tbl_df") my <- new("test2", tibble(a = 1)) my %>% mutate(b = 3) a b 1 1 3 ``` ```my <- new("test2", tibble(a = rnorm(100), b = 1)) my %>% nest(data = -b) Error: `x` must be a vector, not a `test2` object Run `rlang::last_error()` to see where the error occurred. ``` Could you please advise whether a tidyverse based package can be hosted on Bioconductor, and if S4 classes are really mandatory? I need to understand if I am forced to submit to CRAN instead (although Bioconductor would be a good fit, sice I try to interface transcriptional analysis tools to tidy universe) Thanks a lot. Stefano [[alternative HTML version deleted]]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
Thanks Michael, yes in a sense, ttBulk and SummariseExperiment can be considere as two interfaces. Would be fair enough to create a function that convert from one to the other, although the default would be ttBulk? *> I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics * Would be fair to say that ttBulk class could be considered a tibble with specific semantics? In the sense that it holds information about key column names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and has a validator (that is triggered at every ttBulk function). I think at the moment, given (i) S3 problem, and (ii) the lack of formal foundation on SummaisedExperiment interface (that maybe would require an S4 technology itself, where SummariseExperiment could be a slot?) my package would belong more to CRAN, until those two issues will have been resolved. I imagine there are not many cases where a CRAN package migrated to Bioconductor after complying with the ecosystem policies. Thanks a lot. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence < lawrence.michael at gene.com> ha scritto:
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael On Thu, Feb 6, 2020 at 4:12 PM stefano <mangiolastefano at gmail.com> wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from
(that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is an expectation that your package can work seamlessly with other Bioconductor packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use pre-existing Bioconductor data structures, but instead i see value in using a simple tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of tidyverse and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects even
if
you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to and
from
SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am
proposing
another paradigm, but my back end is made of largely Bioconductor
analysis
packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the use
of
S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think Bioconductor
would
be a good fit. Please community give me your honest opinions, I will take them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable code. This comment
provides some motivation; there was also an interesting exchange on the Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800
).
The plyranges package http://bioconductor.org/packages/plyranges and recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges packages.
From
a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in the github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized
bioinformatic
computations to, e.g., display a pairs plot of the reduced dimensions, where one might re-shape the data to a tidy format and use 'plain old' tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor
packages
to make use of tidy concepts and data structures, particularly in the vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < bioc-devel-bounces at r-project.org on behalf of
mangiolastefano at gmail.com>
wrote:
Hello,
I have a package (ttBulk) under review. I have been told to replace
the S3
system to S4. My package is based on the class tbl_df and must be
fully
compatible with tidyverse methods (inheritance). After some tests
and
research I understood that tidyverse ecosystem is not compatible
with
S4
classes.
For example, several methos do not apparently handle S4 objects
based
on
S3 tbl_df
```library(tidyverse)setOldClass("tbl_df")
setClass("test2", contains = "tbl_df")
my <- new("test2", tibble(a = 1))
my %>% mutate(b = 3)
a b
1 1 3
```
```my <- new("test2", tibble(a = rnorm(100), b = 1))
my %>% nest(data = -b)
Error: `x` must be a vector, not a `test2` object
Run `rlang::last_error()` to see where the error occurred.
```
Could you please advise whether a tidyverse based package can be
hosted on
Bioconductor, and if S4 classes are really mandatory? I need to
understand
if I am forced to submit to CRAN instead (although Bioconductor
would
be a
good fit, sice I try to interface transcriptional analysis tools to
tidy
universe)
Thanks a lot.
Stefano
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
I would urge you to make the package _directly_ compatible with standard Bioconductor data structures; no explicit conversion. But you can create wrapper methods (even on an S3 generic) that perform the conversion automatically. You'll probably want two separate APIs though (in different styles), for one thing automatic conversion is obviously not possible for return values. Michael
On Thu, Feb 6, 2020 at 5:34 PM stefano <mangiolastefano at gmail.com> wrote:
Thanks Michael, yes in a sense, ttBulk and SummariseExperiment can be considere as two interfaces. Would be fair enough to create a function that convert from one to the other, although the default would be ttBulk?
I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics
Would be fair to say that ttBulk class could be considered a tibble with specific semantics? In the sense that it holds information about key column names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and has a validator (that is triggered at every ttBulk function). I think at the moment, given (i) S3 problem, and (ii) the lack of formal foundation on SummaisedExperiment interface (that maybe would require an S4 technology itself, where SummariseExperiment could be a slot?) my package would belong more to CRAN, until those two issues will have been resolved. I imagine there are not many cases where a CRAN package migrated to Bioconductor after complying with the ecosystem policies. Thanks a lot. Best wishes. Stefano Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence <lawrence.michael at gene.com> ha scritto:
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael On Thu, Feb 6, 2020 at 4:12 PM stefano <mangiolastefano at gmail.com> wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 (that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is an expectation that your package can work seamlessly with other Bioconductor packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use pre-existing Bioconductor data structures, but instead i see value in using a simple tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of tidyverse and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects even if you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to and from SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am proposing another paradigm, but my back end is made of largely Bioconductor analysis packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the use of S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think Bioconductor would be a good fit. Please community give me your honest opinions, I will take them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable code. This comment https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 provides some motivation; there was also an interesting exchange on the Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800). The plyranges package http://bioconductor.org/packages/plyranges and recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges packages. From a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in the github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized bioinformatic computations to, e.g., display a pairs plot of the reduced dimensions, where one might re-shape the data to a tidy format and use 'plain old' tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor packages to make use of tidy concepts and data structures, particularly in the vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < bioc-devel-bounces at r-project.org on behalf of mangiolastefano at gmail.com> wrote: Hello, I have a package (ttBulk) under review. I have been told to replace the S3 system to S4. My package is based on the class tbl_df and must be fully compatible with tidyverse methods (inheritance). After some tests and research I understood that tidyverse ecosystem is not compatible with S4 classes. For example, several methos do not apparently handle S4 objects based on S3 tbl_df ```library(tidyverse)setOldClass("tbl_df") setClass("test2", contains = "tbl_df") my <- new("test2", tibble(a = 1)) my %>% mutate(b = 3) a b 1 1 3 ``` ```my <- new("test2", tibble(a = rnorm(100), b = 1)) my %>% nest(data = -b) Error: `x` must be a vector, not a `test2` object Run `rlang::last_error()` to see where the error occurred. ``` Could you please advise whether a tidyverse based package can be hosted on Bioconductor, and if S4 classes are really mandatory? I need to understand if I am forced to submit to CRAN instead (although Bioconductor would be a good fit, sice I try to interface transcriptional analysis tools to tidy universe) Thanks a lot. Stefano [[alternative HTML version deleted]]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
Would this scenario satisfy " make the package _directly_ compatible with standard Bioconductor data structures" If an input is SummarizedExperiment return SummarizedExperiment, if the input is a tbl_df or ttBulk, return ttBulk (?) Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 16:15 Michael Lawrence < lawrence.michael at gene.com> ha scritto:
I would urge you to make the package _directly_ compatible with standard Bioconductor data structures; no explicit conversion. But you can create wrapper methods (even on an S3 generic) that perform the conversion automatically. You'll probably want two separate APIs though (in different styles), for one thing automatic conversion is obviously not possible for return values. Michael On Thu, Feb 6, 2020 at 5:34 PM stefano <mangiolastefano at gmail.com> wrote:
Thanks Michael, yes in a sense, ttBulk and SummariseExperiment can be considere as two
interfaces. Would be fair enough to create a function that convert from one to the other, although the default would be ttBulk?
I'm not sure the tidyverse is a great answer to the user interface,
because it lacks domain semantics
Would be fair to say that ttBulk class could be considered a tibble with
specific semantics? In the sense that it holds information about key column names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and has a validator (that is triggered at every ttBulk function).
I think at the moment, given (i) S3 problem, and (ii) the lack of formal
foundation on SummaisedExperiment interface (that maybe would require an S4 technology itself, where SummariseExperiment could be a slot?) my package would belong more to CRAN, until those two issues will have been resolved.
I imagine there are not many cases where a CRAN package migrated to
Bioconductor after complying with the ecosystem policies.
Thanks a lot. Best wishes. Stefano Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence <
lawrence.michael at gene.com> ha scritto:
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael On Thu, Feb 6, 2020 at 4:12 PM stefano <mangiolastefano at gmail.com>
wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from
(that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is
an
expectation that your package can work seamlessly with other
Bioconductor
packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use
pre-existing
Bioconductor data structures, but instead i see value in using a
simple
tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of
tidyverse
and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects
even if
you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to
and from
SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am
proposing
another paradigm, but my back end is made of largely Bioconductor
analysis
packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the
use of
S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think Bioconductor
would
be a good fit. Please community give me your honest opinions, I will
take
them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable
code.
This comment
provides some motivation; there was also an interesting exchange on
the
Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with
The plyranges package http://bioconductor.org/packages/plyranges
and
recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges
packages. From
a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in
the
github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized
bioinformatic
computations to, e.g., display a pairs plot of the reduced
dimensions,
where one might re-shape the data to a tidy format and use 'plain
old'
tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor
packages
to make use of tidy concepts and data structures, particularly in
the
vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < bioc-devel-bounces at r-project.org on behalf of
mangiolastefano at gmail.com>
wrote:
Hello,
I have a package (ttBulk) under review. I have been told to
replace
the S3
system to S4. My package is based on the class tbl_df and must
be fully
compatible with tidyverse methods (inheritance). After some
tests and
research I understood that tidyverse ecosystem is not
compatible with
S4
classes.
For example, several methos do not apparently handle S4
objects based
on
S3 tbl_df
```library(tidyverse)setOldClass("tbl_df")
setClass("test2", contains = "tbl_df")
my <- new("test2", tibble(a = 1))
my %>% mutate(b = 3)
a b
1 1 3
```
```my <- new("test2", tibble(a = rnorm(100), b = 1))
my %>% nest(data = -b)
Error: `x` must be a vector, not a `test2` object
Run `rlang::last_error()` to see where the error occurred.
```
Could you please advise whether a tidyverse based package can be
hosted on
Bioconductor, and if S4 classes are really mandatory? I need to
understand
if I am forced to submit to CRAN instead (although Bioconductor
would
be a
good fit, sice I try to interface transcriptional analysis
tools to
tidy
universe)
Thanks a lot.
Stefano
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
yes, absolutely. A common pattern might be to implement a generic
setGeneric("foo", function(x, ...) standardGeneric("foo"))
an ?internal? function that implements the method on base R data types
.foo <- function(x) {
stopifnot("'x' must be a matrix" = is.matrix(x))
t(x)
}
and methods that act as a facade to the implementation
setMethod("foo", "tbl_df", function(x) {
x <- as.matrix(x)
result <- .foo(x)
as_tibble(result)
})
setMethod("foo", "SummarizedExperiment", function(x) {
result <- .foo(assay(x))
assays(x)[["foo"]] <- result
x
})
One would expect the vignette and examples to primarily emphasize the use of the interoperable (SummmarizedExperiment) version.
Martin Morgan
From: stefano <mangiolastefano at gmail.com>
Date: Friday, February 7, 2020 at 12:31 AM
To: Michael Lawrence <lawrence.michael at gene.com>
Cc: Martin Morgan <mtmorgan.bioc at gmail.com>, "bioc-devel at r-project.org" <bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods
Would this scenario satisfy " make the package _directly_ compatible with standard Bioconductor data structures"
If an input is SummarizedExperiment return SummarizedExperiment, if the input is a tbl_df or ttBulk, return ttBulk (?)
Best wishes.
Stefano
?
Stefano Mangiola | Postdoctoral fellow
Papenfuss Laboratory
The Walter Eliza Hall Institute of Medical Research
+61 (0)466452544
Il giorno ven 7 feb 2020 alle ore 16:15 Michael Lawrence <mailto:lawrence.michael at gene.com> ha scritto:
I would urge you to make the package _directly_ compatible with
standard Bioconductor data structures; no explicit conversion. But you
can create wrapper methods (even on an S3 generic) that perform the
conversion automatically. You'll probably want two separate APIs
though (in different styles), for one thing automatic conversion is
obviously not possible for return values.
Michael
On Thu, Feb 6, 2020 at 5:34 PM stefano <mailto:mangiolastefano at gmail.com> wrote:
Thanks Michael, yes in a sense, ttBulk and SummariseExperiment can be considere as two interfaces. Would be fair enough to create a function that convert from one to the other, although the default would be ttBulk?
I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics
Would be fair to say that ttBulk class could be considered a tibble with specific semantics? In the sense that it holds information about key column names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and has a validator (that is triggered at every ttBulk function). I think at the moment, given (i) S3 problem, and (ii) the lack of formal foundation on SummaisedExperiment interface (that maybe would require an S4 technology itself, where SummariseExperiment could be a slot?) my package would belong more to CRAN, until those two issues will have been resolved. I imagine there are not many cases where a CRAN package migrated to Bioconductor after complying with the ecosystem policies. Thanks a lot. Best wishes. Stefano Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence <mailto:lawrence.michael at gene.com> ha scritto:
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael On Thu, Feb 6, 2020 at 4:12 PM stefano <mailto:mangiolastefano at gmail.com> wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 (that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is an expectation that your package can work seamlessly with other Bioconductor packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use pre-existing Bioconductor data structures, but instead i see value in using a simple tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of tidyverse and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects even if you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to and from SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am proposing another paradigm, but my back end is made of largely Bioconductor analysis packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the use of S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think Bioconductor would be a good fit. Please community give me your honest opinions, I will take them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mailto:mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable code. This comment https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 provides some motivation; there was also an interesting exchange on the Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800). The plyranges package http://bioconductor.org/packages/plyranges and recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges packages. From a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in the github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized bioinformatic computations to, e.g., display a pairs plot of the reduced dimensions, where one might re-shape the data to a tidy format and use 'plain old' tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor packages to make use of tidy concepts and data structures, particularly in the vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < mailto:bioc-devel-bounces at r-project.org on behalf of mailto:mangiolastefano at gmail.com> wrote: ? ? ?Hello, ? ? ?I have a package (ttBulk) under review. I have been told to replace the S3 ? ? ?system to S4. My package is based on the class tbl_df and must be fully ? ? ?compatible with tidyverse methods (inheritance). After some tests and ? ? ?research I understood that tidyverse ecosystem is not compatible with S4 ? ? ?classes. ? ? ? For example, several methos do not apparently handle S4 objects based on ? ? ?S3 tbl_df ? ? ?```library(tidyverse)setOldClass("tbl_df") ? ? ?setClass("test2", contains = "tbl_df") ? ? ?my <- new("test2",? tibble(a = 1)) ? ? ?my %>%? mutate(b = 3) ? ? ? ? a b ? ? ?1 1 3 ? ? ?``` ? ? ? ```my <- new("test2",? tibble(a = rnorm(100), b = 1)) ? ? ?my %>% nest(data = -b) ? ? ?Error: `x` must be a vector, not a `test2` object ? ? ?Run `rlang::last_error()` to see where the error occurred. ? ? ?``` ? ? ?Could you please advise whether a tidyverse based package can be hosted on ? ? ?Bioconductor, and if S4 classes are really mandatory? I need to understand ? ? ?if I am forced to submit to CRAN instead (although Bioconductor would be a ? ? ?good fit, sice I try to interface transcriptional analysis tools to tidy ? ? ?universe) ? ? ?Thanks a lot. ? ? ?Stefano ? ? ? ? ?[[alternative HTML version deleted]] ? ? ?_______________________________________________ ? ? ?mailto:Bioc-devel at r-project.org mailing list ? ? ?https://stat.ethz.ch/mailman/listinfo/bioc-devel
? ? ? ? ?[[alternative HTML version deleted]]
_______________________________________________ mailto:Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
This is an interesting discussion and I hope it is ok to continue it a bit. I found the readme for the ttBulk repo extremely enticing and I am sure many people will want to explore this way of working with genomic data. I have only a few moments to explore it and did not read the vignette, but it looks to me as if it is mostly recapitulated in the README, which is an excellent overview. One thing I feel is missing is an approach to the following question: I like the idea of a pipe-oriented operator for programming steps in genomic workflows. How do I make one that works the way ttBulk's operators work? Well, I can have a look at ttBulk:::reduce_dimensions.ttBulk ... It's involved. Are there patterns there that are preserved across different operators? Can they be factored out to improve maintainability? One other point before I run It seems to me the operators "require" that certain fields be defined in their tibble operands.
names(attributes(counts))
[1] "names" "class" "row.names" "parameters"
attributes(counts)$names
[1] "sample" "transcript" "Cell type" [4] "count" "time" "condition" [7] "batch" "factor_of_interest"
validObject(counts)
*Error in .classEnv(classDef) : *
* trying to get slot "package" from an object of a basic class ("NULL")
with no slots*
Enter a frame number, or 0 to exit
1: validObject(counts)
2: .classEnv(classDef)
I think you mentioned validity checking in a previous email. This
is a feature of S4 that is not frequently invoked. Of course
validObject will not work on counts, but do you have something similar?
(Not all working S4 objects from Bioc will pass validObject tests, but
they should....)
On Fri, Feb 7, 2020 at 5:26 AM Martin Morgan <mtmorgan.bioc at gmail.com>
wrote:
yes, absolutely. A common pattern might be to implement a generic
setGeneric("foo", function(x, ...) standardGeneric("foo"))
an ?internal? function that implements the method on base R data types
.foo <- function(x) {
stopifnot("'x' must be a matrix" = is.matrix(x))
t(x)
}
and methods that act as a facade to the implementation
setMethod("foo", "tbl_df", function(x) {
x <- as.matrix(x)
result <- .foo(x)
as_tibble(result)
})
setMethod("foo", "SummarizedExperiment", function(x) {
result <- .foo(assay(x))
assays(x)[["foo"]] <- result
x
})
One would expect the vignette and examples to primarily emphasize the use
of the interoperable (SummmarizedExperiment) version.
Martin Morgan
From: stefano <mangiolastefano at gmail.com>
Date: Friday, February 7, 2020 at 12:31 AM
To: Michael Lawrence <lawrence.michael at gene.com>
Cc: Martin Morgan <mtmorgan.bioc at gmail.com>, "bioc-devel at r-project.org" <
bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3
classes/methods
Would this scenario satisfy " make the package _directly_ compatible with
standard Bioconductor data structures"
If an input is SummarizedExperiment return SummarizedExperiment, if the
input is a tbl_df or ttBulk, return ttBulk (?)
Best wishes.
Stefano
Stefano Mangiola | Postdoctoral fellow
Papenfuss Laboratory
The Walter Eliza Hall Institute of Medical Research
+61 (0)466452544
Il giorno ven 7 feb 2020 alle ore 16:15 Michael Lawrence <mailto:
lawrence.michael at gene.com> ha scritto:
I would urge you to make the package _directly_ compatible with
standard Bioconductor data structures; no explicit conversion. But you
can create wrapper methods (even on an S3 generic) that perform the
conversion automatically. You'll probably want two separate APIs
though (in different styles), for one thing automatic conversion is
obviously not possible for return values.
Michael
On Thu, Feb 6, 2020 at 5:34 PM stefano <mailto:mangiolastefano at gmail.com>
wrote:
Thanks Michael, yes in a sense, ttBulk and SummariseExperiment can be considere as two
interfaces. Would be fair enough to create a function that convert from one to the other, although the default would be ttBulk?
I'm not sure the tidyverse is a great answer to the user interface,
because it lacks domain semantics
Would be fair to say that ttBulk class could be considered a tibble with
specific semantics? In the sense that it holds information about key column names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and has a validator (that is triggered at every ttBulk function).
I think at the moment, given (i) S3 problem, and (ii) the lack of formal
foundation on SummaisedExperiment interface (that maybe would require an S4 technology itself, where SummariseExperiment could be a slot?) my package would belong more to CRAN, until those two issues will have been resolved.
I imagine there are not many cases where a CRAN package migrated to
Bioconductor after complying with the ecosystem policies.
Thanks a lot. Best wishes. Stefano Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence <mailto:
lawrence.michael at gene.com> ha scritto:
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael On Thu, Feb 6, 2020 at 4:12 PM stefano <mailto:
mangiolastefano at gmail.com> wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from
(that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is
an
expectation that your package can work seamlessly with other
Bioconductor
packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use
pre-existing
Bioconductor data structures, but instead i see value in using a
simple
tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of
tidyverse
and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects
even if
you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to
and from
SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am
proposing
another paradigm, but my back end is made of largely Bioconductor
analysis
packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the
use of
S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think Bioconductor
would
be a good fit. Please community give me your honest opinions, I will
take
them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mailto:mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable
code.
This comment
provides some motivation; there was also an interesting exchange on
the
Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with
The plyranges package http://bioconductor.org/packages/plyranges
and
recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges
packages. From
a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in
the
github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized
bioinformatic
computations to, e.g., display a pairs plot of the reduced
dimensions,
where one might re-shape the data to a tidy format and use 'plain
old'
tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor
packages
to make use of tidy concepts and data structures, particularly in
the
vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < mailto:bioc-devel-bounces at r-project.org on behalf of mailto:
mangiolastefano at gmail.com>
wrote:
Hello,
I have a package (ttBulk) under review. I have been told to
replace
the S3
system to S4. My package is based on the class tbl_df and must
be fully
compatible with tidyverse methods (inheritance). After some
tests and
research I understood that tidyverse ecosystem is not
compatible with
S4
classes.
For example, several methos do not apparently handle S4
objects based
on
S3 tbl_df
```library(tidyverse)setOldClass("tbl_df")
setClass("test2", contains = "tbl_df")
my <- new("test2", tibble(a = 1))
my %>% mutate(b = 3)
a b
1 1 3
```
```my <- new("test2", tibble(a = rnorm(100), b = 1))
my %>% nest(data = -b)
Error: `x` must be a vector, not a `test2` object
Run `rlang::last_error()` to see where the error occurred.
```
Could you please advise whether a tidyverse based package can be
hosted on
Bioconductor, and if S4 classes are really mandatory? I need to
understand
if I am forced to submit to CRAN instead (although Bioconductor
would
be a
good fit, sice I try to interface transcriptional analysis
tools to
tidy
universe)
Thanks a lot.
Stefano
[[alternative HTML version deleted]]
_______________________________________________
mailto:Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ mailto:Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
The information in this e-mail is intended only for the ...{{dropped:18}}
Thanks Guys for the discussion (I am learning a lot), *To Martin:* Thanks for the tips. I will start to implement those S4 style methods https://github.com/stemangiola/ttBulk/issues/7 I would *really *like to be part of Bioconductor community with this package, if just this
" One would expect the vignette and examples to primarily emphasize the
use of the interoperable (SummmarizedExperiment) version. " Could become this
One would expect the vignette and examples to emphasize the use of the
interoperable (SummmarizedExperiment) version. I agree with the integration priority of Bioconductor, but this repository (and this philosophy) is more than its data structures. There should be space for more than one approach to do things, given that the principle are respected. If this is true, I could really spend energies to use methods as you suggested and implement the SummarisedExperiment stream. And with the tips of the community the link will become stronger and stronger with time and versions. *To Vincent* Thanks a lot for the interest. *> One thing I feel is missing is an approach to the following question: [..] How do I make one that works the way ttBulk's operators work?* I'm afraid I don't really understand the question. Are you wondering about extension of the framework? Or creating a similar framework for other applications? Could you please reformulate, maybe giving a concrete example? *> Are there patterns there that are preserved across different operators? * A commonality is the use of code for integrating the new calculated information (dplyr), validation functions, .. *> Can they be factored out to improve maintainability?* Almost surely yes, this is the first version, I hope to see enough interest, improve the API upon feedback, and hope for (intellectual and practical) contributions from more experts in software engineering. *> validObject * Seems a good method, and as far as I tested works for S3 objects as well. I will try to implement it. In fact I already added it as issue into Github https://github.com/stemangiola/ttBulk/issues/6 At the moment I have a custom validation function Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno sab 8 feb 2020 alle ore 01:54 Vincent Carey < stvjc at channing.harvard.edu> ha scritto:
This is an interesting discussion and I hope it is ok to continue it a bit. I found the readme for the ttBulk repo extremely enticing and I am sure many people will want to explore this way of working with genomic data. I have only a few moments to explore it and did not read the vignette, but it looks to me as if it is mostly recapitulated in the README, which is an excellent overview. One thing I feel is missing is an approach to the following question: I like the idea of a pipe-oriented operator for programming steps in genomic workflows. How do I make one that works the way ttBulk's operators work? Well, I can have a look at ttBulk:::reduce_dimensions.ttBulk ... It's involved. Are there patterns there that are preserved across different operators? Can they be factored out to improve maintainability? One other point before I run It seems to me the operators "require" that certain fields be defined in their tibble operands.
names(attributes(counts))
[1] "names" "class" "row.names" "parameters"
attributes(counts)$names
[1] "sample" "transcript" "Cell type" [4] "count" "time" "condition" [7] "batch" "factor_of_interest"
validObject(counts)
*Error in .classEnv(classDef) : *
* trying to get slot "package" from an object of a basic class ("NULL")
with no slots*
Enter a frame number, or 0 to exit
1: validObject(counts)
2: .classEnv(classDef)
I think you mentioned validity checking in a previous email. This
is a feature of S4 that is not frequently invoked. Of course
validObject will not work on counts, but do you have something similar?
(Not all working S4 objects from Bioc will pass validObject tests, but
they should....)
On Fri, Feb 7, 2020 at 5:26 AM Martin Morgan <mtmorgan.bioc at gmail.com>
wrote:
yes, absolutely. A common pattern might be to implement a generic
setGeneric("foo", function(x, ...) standardGeneric("foo"))
an ?internal? function that implements the method on base R data types
.foo <- function(x) {
stopifnot("'x' must be a matrix" = is.matrix(x))
t(x)
}
and methods that act as a facade to the implementation
setMethod("foo", "tbl_df", function(x) {
x <- as.matrix(x)
result <- .foo(x)
as_tibble(result)
})
setMethod("foo", "SummarizedExperiment", function(x) {
result <- .foo(assay(x))
assays(x)[["foo"]] <- result
x
})
One would expect the vignette and examples to primarily emphasize the use
of the interoperable (SummmarizedExperiment) version.
Martin Morgan
From: stefano <mangiolastefano at gmail.com>
Date: Friday, February 7, 2020 at 12:31 AM
To: Michael Lawrence <lawrence.michael at gene.com>
Cc: Martin Morgan <mtmorgan.bioc at gmail.com>, "bioc-devel at r-project.org" <
bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3
classes/methods
Would this scenario satisfy " make the package _directly_ compatible with
standard Bioconductor data structures"
If an input is SummarizedExperiment return SummarizedExperiment, if the
input is a tbl_df or ttBulk, return ttBulk (?)
Best wishes.
Stefano
Stefano Mangiola | Postdoctoral fellow
Papenfuss Laboratory
The Walter Eliza Hall Institute of Medical Research
+61 (0)466452544
Il giorno ven 7 feb 2020 alle ore 16:15 Michael Lawrence <mailto:
lawrence.michael at gene.com> ha scritto:
I would urge you to make the package _directly_ compatible with
standard Bioconductor data structures; no explicit conversion. But you
can create wrapper methods (even on an S3 generic) that perform the
conversion automatically. You'll probably want two separate APIs
though (in different styles), for one thing automatic conversion is
obviously not possible for return values.
Michael
On Thu, Feb 6, 2020 at 5:34 PM stefano <mailto:mangiolastefano at gmail.com>
wrote:
Thanks Michael, yes in a sense, ttBulk and SummariseExperiment can be considere as two
interfaces. Would be fair enough to create a function that convert from one to the other, although the default would be ttBulk?
I'm not sure the tidyverse is a great answer to the user interface,
because it lacks domain semantics
Would be fair to say that ttBulk class could be considered a tibble
with specific semantics? In the sense that it holds information about key column names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and has a validator (that is triggered at every ttBulk function).
I think at the moment, given (i) S3 problem, and (ii) the lack of
formal foundation on SummaisedExperiment interface (that maybe would require an S4 technology itself, where SummariseExperiment could be a slot?) my package would belong more to CRAN, until those two issues will have been resolved.
I imagine there are not many cases where a CRAN package migrated to
Bioconductor after complying with the ecosystem policies.
Thanks a lot. Best wishes. Stefano Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence <mailto:
lawrence.michael at gene.com> ha scritto:
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael On Thu, Feb 6, 2020 at 4:12 PM stefano <mailto:
mangiolastefano at gmail.com> wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from
(that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is
an
expectation that your package can work seamlessly with other
Bioconductor
packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use
pre-existing
Bioconductor data structures, but instead i see value in using a
simple
tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of
tidyverse
and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects
even if
you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to
and from
SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am
proposing
another paradigm, but my back end is made of largely Bioconductor
analysis
packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the
use of
S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think
Bioconductor would
be a good fit. Please community give me your honest opinions, I will
take
them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mailto:mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable
code.
This comment
provides some motivation; there was also an interesting exchange
on the
Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with
The plyranges package http://bioconductor.org/packages/plyranges
and
recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not
use
(Import or Depend on) SummarizedExperiment or GenomicRanges
packages. From
a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in
the
github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized
bioinformatic
computations to, e.g., display a pairs plot of the reduced
dimensions,
where one might re-shape the data to a tidy format and use 'plain
old'
tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor
packages
to make use of tidy concepts and data structures, particularly in
the
vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < mailto:bioc-devel-bounces at r-project.org on behalf of mailto:
mangiolastefano at gmail.com>
wrote:
Hello,
I have a package (ttBulk) under review. I have been told to
replace
the S3
system to S4. My package is based on the class tbl_df and must
be fully
compatible with tidyverse methods (inheritance). After some
tests and
research I understood that tidyverse ecosystem is not
compatible with
S4
classes.
For example, several methos do not apparently handle S4
objects based
on
S3 tbl_df
```library(tidyverse)setOldClass("tbl_df")
setClass("test2", contains = "tbl_df")
my <- new("test2", tibble(a = 1))
my %>% mutate(b = 3)
a b
1 1 3
```
```my <- new("test2", tibble(a = rnorm(100), b = 1))
my %>% nest(data = -b)
Error: `x` must be a vector, not a `test2` object
Run `rlang::last_error()` to see where the error occurred.
```
Could you please advise whether a tidyverse based package can
be
hosted on
Bioconductor, and if S4 classes are really mandatory? I need to
understand
if I am forced to submit to CRAN instead (although
Bioconductor would
be a
good fit, sice I try to interface transcriptional analysis
tools to
tidy
universe)
Thanks a lot.
Stefano
[[alternative HTML version deleted]]
_______________________________________________
mailto:Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ mailto:Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
The information in this e-mail is intended only for th...{{dropped:16}}
On Fri, Feb 7, 2020 at 6:39 PM stefano <mangiolastefano at gmail.com> wrote:
Thanks Guys for the discussion (I am learning a lot), *To Martin:* Thanks for the tips. I will start to implement those S4 style methods https://github.com/stemangiola/ttBulk/issues/7 I would *really *like to be part of Bioconductor community with this package, if just this
" One would expect the vignette and examples to primarily emphasize the
use of the interoperable (SummmarizedExperiment) version. " Could become this
One would expect the vignette and examples to emphasize the use of the
interoperable (SummmarizedExperiment) version. I agree with the integration priority of Bioconductor, but this repository (and this philosophy) is more than its data structures. There should be space for more than one approach to do things, given that the principle are respected. If this is true, I could really spend energies to use methods as you suggested and implement the SummarisedExperiment stream. And with the tips of the community the link will become stronger and stronger with time and versions. *To Vincent* Thanks a lot for the interest. *> One thing I feel is missing is an approach to the following question: [..] How do I make one that works the way ttBulk's operators work?* I'm afraid I don't really understand the question. Are you wondering about extension of the framework? Or creating a similar framework for other applications? Could you please reformulate, maybe giving a concrete example?
We can take further discussion to the issues on the github repo but I will briefly respond here. Consider reduce_dimensions. You give a small number of method options here -- PCA, MDS, tSNE. The MDS option makes its way to stats::cmdscale via limma::plotMDS; the PCA option uses prcomp. For any number of reasons, users may want to select alternate dimension reduction procedures or tune them in ways not passed up through your interface. This might involve modifications to your code to introduce changes, or one could imagine a protocol for "dropping in" a new operator for ttBulk pipelines. My question is to understand how this level of flexibility might be achieved. An example of an R package that pursues this is mlr3, see https://github.com/mlr-org/mlr3learners.template ... a link there is broken but the full details of contributing new pipeline elements are at https://mlr3book.mlr-org.com/pipelines.html
*> Are there patterns there that are preserved across different operators? * A commonality is the use of code for integrating the new calculated information (dplyr), validation functions, .. *> Can they be factored out to improve maintainability?* Almost surely yes, this is the first version, I hope to see enough interest, improve the API upon feedback, and hope for (intellectual and practical) contributions from more experts in software engineering. *> validObject * Seems a good method, and as far as I tested works for S3 objects as well. I will try to implement it. In fact I already added it as issue into Github https://github.com/stemangiola/ttBulk/issues/6 At the moment I have a custom validation function Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno sab 8 feb 2020 alle ore 01:54 Vincent Carey < stvjc at channing.harvard.edu> ha scritto:
This is an interesting discussion and I hope it is ok to continue it a bit. I found the readme for the ttBulk repo extremely enticing and I am sure many people will want to explore this way of working with genomic data. I have only a few moments to explore it and did not read the vignette, but it looks to me as if it is mostly recapitulated in the README, which is an excellent overview. One thing I feel is missing is an approach to the following question: I like the idea of a pipe-oriented operator for programming steps in genomic workflows. How do I make one that works the way ttBulk's operators work? Well, I can have a look at ttBulk:::reduce_dimensions.ttBulk ... It's involved. Are there patterns there that are preserved across different operators? Can they be factored out to improve maintainability? One other point before I run It seems to me the operators "require" that certain fields be defined in their tibble operands.
names(attributes(counts))
[1] "names" "class" "row.names" "parameters"
attributes(counts)$names
[1] "sample" "transcript" "Cell type" [4] "count" "time" "condition" [7] "batch" "factor_of_interest"
validObject(counts)
*Error in .classEnv(classDef) : *
* trying to get slot "package" from an object of a basic class ("NULL")
with no slots*
Enter a frame number, or 0 to exit
1: validObject(counts)
2: .classEnv(classDef)
I think you mentioned validity checking in a previous email. This
is a feature of S4 that is not frequently invoked. Of course
validObject will not work on counts, but do you have something similar?
(Not all working S4 objects from Bioc will pass validObject tests, but
they should....)
On Fri, Feb 7, 2020 at 5:26 AM Martin Morgan <mtmorgan.bioc at gmail.com>
wrote:
yes, absolutely. A common pattern might be to implement a generic
setGeneric("foo", function(x, ...) standardGeneric("foo"))
an ?internal? function that implements the method on base R data types
.foo <- function(x) {
stopifnot("'x' must be a matrix" = is.matrix(x))
t(x)
}
and methods that act as a facade to the implementation
setMethod("foo", "tbl_df", function(x) {
x <- as.matrix(x)
result <- .foo(x)
as_tibble(result)
})
setMethod("foo", "SummarizedExperiment", function(x) {
result <- .foo(assay(x))
assays(x)[["foo"]] <- result
x
})
One would expect the vignette and examples to primarily emphasize the
use of the interoperable (SummmarizedExperiment) version.
Martin Morgan
From: stefano <mangiolastefano at gmail.com>
Date: Friday, February 7, 2020 at 12:31 AM
To: Michael Lawrence <lawrence.michael at gene.com>
Cc: Martin Morgan <mtmorgan.bioc at gmail.com>, "bioc-devel at r-project.org"
<bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse
S3 classes/methods
Would this scenario satisfy " make the package _directly_ compatible
with standard Bioconductor data structures"
If an input is SummarizedExperiment return SummarizedExperiment, if the
input is a tbl_df or ttBulk, return ttBulk (?)
Best wishes.
Stefano
Stefano Mangiola | Postdoctoral fellow
Papenfuss Laboratory
The Walter Eliza Hall Institute of Medical Research
+61 (0)466452544
Il giorno ven 7 feb 2020 alle ore 16:15 Michael Lawrence <mailto:
lawrence.michael at gene.com> ha scritto:
I would urge you to make the package _directly_ compatible with
standard Bioconductor data structures; no explicit conversion. But you
can create wrapper methods (even on an S3 generic) that perform the
conversion automatically. You'll probably want two separate APIs
though (in different styles), for one thing automatic conversion is
obviously not possible for return values.
Michael
On Thu, Feb 6, 2020 at 5:34 PM stefano <mailto:mangiolastefano at gmail.com>
wrote:
Thanks Michael, yes in a sense, ttBulk and SummariseExperiment can be considere as two
interfaces. Would be fair enough to create a function that convert from one to the other, although the default would be ttBulk?
I'm not sure the tidyverse is a great answer to the user interface,
because it lacks domain semantics
Would be fair to say that ttBulk class could be considered a tibble
with specific semantics? In the sense that it holds information about key column names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and has a validator (that is triggered at every ttBulk function).
I think at the moment, given (i) S3 problem, and (ii) the lack of
formal foundation on SummaisedExperiment interface (that maybe would require an S4 technology itself, where SummariseExperiment could be a slot?) my package would belong more to CRAN, until those two issues will have been resolved.
I imagine there are not many cases where a CRAN package migrated to
Bioconductor after complying with the ecosystem policies.
Thanks a lot. Best wishes. Stefano Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence <mailto:
lawrence.michael at gene.com> ha scritto:
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael On Thu, Feb 6, 2020 at 4:12 PM stefano <mailto:
mangiolastefano at gmail.com> wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement
your
solution! I think a key point from
(that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there
is an
expectation that your package can work seamlessly with other
Bioconductor
packages, and your implementation should support that. The safest
and
easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use
pre-existing
Bioconductor data structures, but instead i see value in using a
simple
tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of
tidyverse
and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects
even if
you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to
and from
SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it
would
suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am
proposing
another paradigm, but my back end is made of largely Bioconductor
analysis
packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force
the use of
S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think
Bioconductor would
be a good fit. Please community give me your honest opinions, I
will take
them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mailto:mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable
code.
This comment
provides some motivation; there was also an interesting exchange
on the
Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with
The plyranges package http://bioconductor.org/packages/plyranges
and
recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not
use
(Import or Depend on) SummarizedExperiment or GenomicRanges
packages. From
a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in
the
github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized
bioinformatic
computations to, e.g., display a pairs plot of the reduced
dimensions,
where one might re-shape the data to a tidy format and use 'plain
old'
tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor
packages
to make use of tidy concepts and data structures, particularly in
the
vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < mailto:bioc-devel-bounces at r-project.org on behalf of mailto:
mangiolastefano at gmail.com>
wrote:
Hello,
I have a package (ttBulk) under review. I have been told to
replace
the S3
system to S4. My package is based on the class tbl_df and
must be fully
compatible with tidyverse methods (inheritance). After some
tests and
research I understood that tidyverse ecosystem is not
compatible with
S4
classes.
For example, several methos do not apparently handle S4
objects based
on
S3 tbl_df
```library(tidyverse)setOldClass("tbl_df")
setClass("test2", contains = "tbl_df")
my <- new("test2", tibble(a = 1))
my %>% mutate(b = 3)
a b
1 1 3
```
```my <- new("test2", tibble(a = rnorm(100), b = 1))
my %>% nest(data = -b)
Error: `x` must be a vector, not a `test2` object
Run `rlang::last_error()` to see where the error occurred.
```
Could you please advise whether a tidyverse based package can
be
hosted on
Bioconductor, and if S4 classes are really mandatory? I need
to
understand
if I am forced to submit to CRAN instead (although
Bioconductor would
be a
good fit, sice I try to interface transcriptional analysis
tools to
tidy
universe)
Thanks a lot.
Stefano
[[alternative HTML version deleted]]
_______________________________________________
mailto:Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ mailto:Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.
The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners?Compliance? HelpLine at http://www.partners.org/complianceline <http://www.partners.org/complianceline>?. If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail. [[alternative HTML version deleted]]
The first thing is that most contributed packages end up being accepted, so the discussion here should be considered as (strong) advice, rather than requirement. The advice is partly offered to maximize the success of contributed packages in the Bioconductor ecosystem, but at the end of the day the success of your package depends on the value it adds to the users who find it. Vince offered some pretty high enthusiasm, which is a good sign! I used ?primarily? mostly to encourage a more careful implementation of support for SE ? it?s easy to say ?yes, my package interoperates with SE?, but much more challenging to demonstrate through evaluated code that it actually does! Cynically but with empirical experience and not a reflection of your own commitment, I?ve learned that the promise of ?future? integration is seldom realized ? package submission is often the last time that the community can directly influence package implementation and development. It would be interesting to develop review processes that continuously assessed package quality and utility. Martin From: stefano <mangiolastefano at gmail.com> Date: Friday, February 7, 2020 at 6:39 PM To: Vincent Carey <stvjc at channing.harvard.edu> Cc: Martin Morgan <mtmorgan.bioc at gmail.com>, Michael Lawrence <lawrence.michael at gene.com>, "bioc-devel at r-project.org" <bioc-devel at r-project.org> Subject: Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods Thanks Guys for the discussion (I am learning a lot), To Martin: Thanks for the tips. I will start to implement those S4 style methods https://github.com/stemangiola/ttBulk/issues/7 I would really like to be part of Bioconductor community with this package, if just this
" One would expect the vignette and examples to primarily emphasize the use of the interoperable (SummmarizedExperiment) version. "
Could become this
One would expect the vignette and examples to emphasize the use of the interoperable (SummmarizedExperiment) version.
I agree with the integration priority of Bioconductor, but this repository (and this philosophy) is more than its data structures. There should be space for more than one approach to do things, given that the principle are respected. If this is true, I could really spend energies to use methods as you suggested and implement the SummarisedExperiment stream. And with the tips of the community the link will become stronger and stronger with time and versions. To Vincent Thanks a lot for the interest.
One thing I feel is missing is an approach to the following question: [..] How do I make one that works the way ttBulk's operators work?
I'm afraid I don't really understand the question. Are you wondering about extension of the framework? Or creating a similar framework for other applications? Could you please reformulate, maybe giving a concrete example?
Are there patterns there that are preserved across different operators?
A commonality is the use of code for integrating the new calculated information (dplyr), validation functions, ..
Can they be factored out to improve maintainability?
Almost surely yes, this is the first version, I hope to see enough interest, improve the API upon feedback, and hope for (intellectual and practical) contributions from more experts in software engineering.
validObject
Seems a good method, and as far as I tested works for S3 objects as well. I will try to implement it. In fact I already added it as issue into Github https://github.com/stemangiola/ttBulk/issues/6 At the moment I have a custom validation function Best wishes. Stefano Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno sab 8 feb 2020 alle ore 01:54 Vincent Carey <stvjc at channing.harvard.edu<mailto:stvjc at channing.harvard.edu>> ha scritto: This is an interesting discussion and I hope it is ok to continue it a bit. I found the readme for the ttBulk repo extremely enticing and I am sure many people will want to explore this way of working with genomic data. I have only a few moments to explore it and did not read the vignette, but it looks to me as if it is mostly recapitulated in the README, which is an excellent overview. One thing I feel is missing is an approach to the following question: I like the idea of a pipe-oriented operator for programming steps in genomic workflows. How do I make one that works the way ttBulk's operators work? Well, I can have a look at ttBulk:::reduce_dimensions.ttBulk ... It's involved. Are there patterns there that are preserved across different operators? Can they be factored out to improve maintainability? One other point before I run It seems to me the operators "require" that certain fields be defined in their tibble operands.
names(attributes(counts))
[1] "names" "class" "row.names" "parameters"
attributes(counts)$names
[1] "sample" "transcript" "Cell type" [4] "count" "time" "condition" [7] "batch" "factor_of_interest"
validObject(counts)
Error in .classEnv(classDef) :
trying to get slot "package" from an object of a basic class ("NULL") with no slots
Enter a frame number, or 0 to exit
1: validObject(counts)
2: .classEnv(classDef)
I think you mentioned validity checking in a previous email. This
is a feature of S4 that is not frequently invoked. Of course
validObject will not work on counts, but do you have something similar?
(Not all working S4 objects from Bioc will pass validObject tests, but
they should....)
On Fri, Feb 7, 2020 at 5:26 AM Martin Morgan <mtmorgan.bioc at gmail.com<mailto:mtmorgan.bioc at gmail.com>> wrote:
yes, absolutely. A common pattern might be to implement a generic
setGeneric("foo", function(x, ...) standardGeneric("foo"))
an ?internal? function that implements the method on base R data types
.foo <- function(x) {
stopifnot("'x' must be a matrix" = is.matrix(x))
t(x)
}
and methods that act as a facade to the implementation
setMethod("foo", "tbl_df", function(x) {
x <- as.matrix(x)
result <- .foo(x)
as_tibble(result)
})
setMethod("foo", "SummarizedExperiment", function(x) {
result <- .foo(assay(x))
assays(x)[["foo"]] <- result
x
})
One would expect the vignette and examples to primarily emphasize the use of the interoperable (SummmarizedExperiment) version.
Martin Morgan
From: stefano <mangiolastefano at gmail.com<mailto:mangiolastefano at gmail.com>>
Date: Friday, February 7, 2020 at 12:31 AM
To: Michael Lawrence <lawrence.michael at gene.com<mailto:lawrence.michael at gene.com>>
Cc: Martin Morgan <mtmorgan.bioc at gmail.com<mailto:mtmorgan.bioc at gmail.com>>, "bioc-devel at r-project.org<mailto:bioc-devel at r-project.org>" <bioc-devel at r-project.org<mailto:bioc-devel at r-project.org>>
Subject: Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods
Would this scenario satisfy " make the package _directly_ compatible with standard Bioconductor data structures"
If an input is SummarizedExperiment return SummarizedExperiment, if the input is a tbl_df or ttBulk, return ttBulk (?)
Best wishes.
Stefano
Stefano Mangiola | Postdoctoral fellow
Papenfuss Laboratory
The Walter Eliza Hall Institute of Medical Research
+61 (0)466452544
Il giorno ven 7 feb 2020 alle ore 16:15 Michael Lawrence <mailto:lawrence.michael at gene.com<mailto:lawrence.michael at gene.com>> ha scritto:
I would urge you to make the package _directly_ compatible with
standard Bioconductor data structures; no explicit conversion. But you
can create wrapper methods (even on an S3 generic) that perform the
conversion automatically. You'll probably want two separate APIs
though (in different styles), for one thing automatic conversion is
obviously not possible for return values.
Michael
On Thu, Feb 6, 2020 at 5:34 PM stefano <mailto:mangiolastefano at gmail.com<mailto:mangiolastefano at gmail.com>> wrote:
Thanks Michael, yes in a sense, ttBulk and SummariseExperiment can be considere as two interfaces. Would be fair enough to create a function that convert from one to the other, although the default would be ttBulk?
I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics
Would be fair to say that ttBulk class could be considered a tibble with specific semantics? In the sense that it holds information about key column names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and has a validator (that is triggered at every ttBulk function). I think at the moment, given (i) S3 problem, and (ii) the lack of formal foundation on SummaisedExperiment interface (that maybe would require an S4 technology itself, where SummariseExperiment could be a slot?) my package would belong more to CRAN, until those two issues will have been resolved. I imagine there are not many cases where a CRAN package migrated to Bioconductor after complying with the ecosystem policies. Thanks a lot. Best wishes. Stefano Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence <mailto:lawrence.michael at gene.com<mailto:lawrence.michael at gene.com>> ha scritto:
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael On Thu, Feb 6, 2020 at 4:12 PM stefano <mailto:mangiolastefano at gmail.com<mailto:mangiolastefano at gmail.com>> wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 (that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is an expectation that your package can work seamlessly with other Bioconductor packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use pre-existing Bioconductor data structures, but instead i see value in using a simple tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of tidyverse and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects even if you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to and from SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am proposing another paradigm, but my back end is made of largely Bioconductor analysis packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the use of S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think Bioconductor would be a good fit. Please community give me your honest opinions, I will take them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mailto:mtmorgan.bioc at gmail.com<mailto:mtmorgan.bioc at gmail.com>> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable code. This comment https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106 provides some motivation; there was also an interesting exchange on the Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800). The plyranges package http://bioconductor.org/packages/plyranges and recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges packages. From a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in the github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized bioinformatic computations to, e.g., display a pairs plot of the reduced dimensions, where one might re-shape the data to a tidy format and use 'plain old' tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor packages to make use of tidy concepts and data structures, particularly in the vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < mailto:bioc-devel-bounces at r-project.org<mailto:bioc-devel-bounces at r-project.org> on behalf of mailto:mangiolastefano at gmail.com<mailto:mangiolastefano at gmail.com>> wrote: Hello, I have a package (ttBulk) under review. I have been told to replace the S3 system to S4. My package is based on the class tbl_df and must be fully compatible with tidyverse methods (inheritance). After some tests and research I understood that tidyverse ecosystem is not compatible with S4 classes. For example, several methos do not apparently handle S4 objects based on S3 tbl_df ```library(tidyverse)setOldClass("tbl_df") setClass("test2", contains = "tbl_df") my <- new("test2", tibble(a = 1)) my %>% mutate(b = 3) a b 1 1 3 ``` ```my <- new("test2", tibble(a = rnorm(100), b = 1)) my %>% nest(data = -b) Error: `x` must be a vector, not a `test2` object Run `rlang::last_error()` to see where the error occurred. ``` Could you please advise whether a tidyverse based package can be hosted on Bioconductor, and if S4 classes are really mandatory? I need to understand if I am forced to submit to CRAN instead (although Bioconductor would be a good fit, sice I try to interface transcriptional analysis tools to tidy universe) Thanks a lot. Stefano [[alternative HTML version deleted]]
_______________________________________________
mailto:Bioc-devel at r-project.org<mailto:Bioc-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ mailto:Bioc-devel at r-project.org<mailto:Bioc-devel at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com<mailto:michafla at gene.com> Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com<mailto:michafla at gene.com> Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube _______________________________________________ Bioc-devel at r-project.org<mailto:Bioc-devel at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.
Thanks again. *To Martin.* Got the point and I agree. I will do my best. *To Vincent. * I see. At the moment ttBulk is an API for users, not yet for developers. But I can already imagine an API framework where you can feed a new custom functionality (let's say UMAP dimensionality reduction function), to a wrapper validator and information integrator (to the original input) that ensures endomorphic properties, with the output having the same properties of the input. In order to do this, the definition of a ttBulk tibble and its requirements/validation will have to be a little more established, upon community feedback. The transition then will be pretty easy. When that happens, I would be interested in having some feedback from you! Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno dom 9 feb 2020 alle ore 03:08 Martin Morgan < mtmorgan.bioc at gmail.com> ha scritto:
The first thing is that most contributed packages end up being accepted, so the discussion here should be considered as (strong) advice, rather than requirement. The advice is partly offered to maximize the success of contributed packages in the Bioconductor ecosystem, but at the end of the day the success of your package depends on the value it adds to the users who find it. Vince offered some pretty high enthusiasm, which is a good sign! I used ?primarily? mostly to encourage a more careful implementation of support for SE ? it?s easy to say ?yes, my package interoperates with SE?, but much more challenging to demonstrate through evaluated code that it actually does! Cynically but with empirical experience and not a reflection of your own commitment, I?ve learned that the promise of ?future? integration is seldom realized ? package submission is often the last time that the community can directly influence package implementation and development. It would be interesting to develop review processes that continuously assessed package quality and utility. Martin *From: *stefano <mangiolastefano at gmail.com> *Date: *Friday, February 7, 2020 at 6:39 PM *To: *Vincent Carey <stvjc at channing.harvard.edu> *Cc: *Martin Morgan <mtmorgan.bioc at gmail.com>, Michael Lawrence < lawrence.michael at gene.com>, "bioc-devel at r-project.org" < bioc-devel at r-project.org> *Subject: *Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods Thanks Guys for the discussion (I am learning a lot), *To Martin:* Thanks for the tips. I will start to implement those S4 style methods https://github.com/stemangiola/ttBulk/issues/7 I would *really *like to be part of Bioconductor community with this package, if just this
" One would expect the vignette and examples to primarily emphasize the
use of the interoperable (SummmarizedExperiment) version. " Could become this
One would expect the vignette and examples to emphasize the use of the
interoperable (SummmarizedExperiment) version. I agree with the integration priority of Bioconductor, but this repository (and this philosophy) is more than its data structures. There should be space for more than one approach to do things, given that the principle are respected. If this is true, I could really spend energies to use methods as you suggested and implement the SummarisedExperiment stream. And with the tips of the community the link will become stronger and stronger with time and versions. *To Vincent* Thanks a lot for the interest. *> One thing I feel is missing is an approach to the following question: [..] How do I make one that works the way ttBulk's operators work?* I'm afraid I don't really understand the question. Are you wondering about extension of the framework? Or creating a similar framework for other applications? Could you please reformulate, maybe giving a concrete example? *> Are there patterns there that are preserved across different operators? * A commonality is the use of code for integrating the new calculated information (dplyr), validation functions, .. *> Can they be factored out to improve maintainability?* Almost surely yes, this is the first version, I hope to see enough interest, improve the API upon feedback, and hope for (intellectual and practical) contributions from more experts in software engineering. *> validObject * Seems a good method, and as far as I tested works for S3 objects as well. I will try to implement it. In fact I already added it as issue into Github https://github.com/stemangiola/ttBulk/issues/6 At the moment I have a custom validation function Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno sab 8 feb 2020 alle ore 01:54 Vincent Carey < stvjc at channing.harvard.edu> ha scritto: This is an interesting discussion and I hope it is ok to continue it a bit. I found the readme for the ttBulk repo extremely enticing and I am sure many people will want to explore this way of working with genomic data. I have only a few moments to explore it and did not read the vignette, but it looks to me as if it is mostly recapitulated in the README, which is an excellent overview. One thing I feel is missing is an approach to the following question: I like the idea of a pipe-oriented operator for programming steps in genomic workflows. How do I make one that works the way ttBulk's operators work? Well, I can have a look at ttBulk:::reduce_dimensions.ttBulk ... It's involved. Are there patterns there that are preserved across different operators? Can they be factored out to improve maintainability? One other point before I run It seems to me the operators "require" that certain fields be defined in their tibble operands.
names(attributes(counts))
[1] "names" "class" "row.names" "parameters"
attributes(counts)$names
[1] "sample" "transcript" "Cell type" [4] "count" "time" "condition" [7] "batch" "factor_of_interest"
validObject(counts)
*Error in .classEnv(classDef) : *
* trying to get slot "package" from an object of a basic class ("NULL")
with no slots*
Enter a frame number, or 0 to exit
1: validObject(counts)
2: .classEnv(classDef)
I think you mentioned validity checking in a previous email. This
is a feature of S4 that is not frequently invoked. Of course
validObject will not work on counts, but do you have something similar?
(Not all working S4 objects from Bioc will pass validObject tests, but
they should....)
On Fri, Feb 7, 2020 at 5:26 AM Martin Morgan <mtmorgan.bioc at gmail.com>
wrote:
yes, absolutely. A common pattern might be to implement a generic
setGeneric("foo", function(x, ...) standardGeneric("foo"))
an ?internal? function that implements the method on base R data types
.foo <- function(x) {
stopifnot("'x' must be a matrix" = is.matrix(x))
t(x)
}
and methods that act as a facade to the implementation
setMethod("foo", "tbl_df", function(x) {
x <- as.matrix(x)
result <- .foo(x)
as_tibble(result)
})
setMethod("foo", "SummarizedExperiment", function(x) {
result <- .foo(assay(x))
assays(x)[["foo"]] <- result
x
})
One would expect the vignette and examples to primarily emphasize the use
of the interoperable (SummmarizedExperiment) version.
Martin Morgan
From: stefano <mangiolastefano at gmail.com>
Date: Friday, February 7, 2020 at 12:31 AM
To: Michael Lawrence <lawrence.michael at gene.com>
Cc: Martin Morgan <mtmorgan.bioc at gmail.com>, "bioc-devel at r-project.org" <
bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Compatibility of Bioconductor with tidyverse S3
classes/methods
Would this scenario satisfy " make the package _directly_ compatible with
standard Bioconductor data structures"
If an input is SummarizedExperiment return SummarizedExperiment, if the
input is a tbl_df or ttBulk, return ttBulk (?)
Best wishes.
Stefano
Stefano Mangiola | Postdoctoral fellow
Papenfuss Laboratory
The Walter Eliza Hall Institute of Medical Research
+61 (0)466452544
Il giorno ven 7 feb 2020 alle ore 16:15 Michael Lawrence <mailto:
lawrence.michael at gene.com> ha scritto:
I would urge you to make the package _directly_ compatible with
standard Bioconductor data structures; no explicit conversion. But you
can create wrapper methods (even on an S3 generic) that perform the
conversion automatically. You'll probably want two separate APIs
though (in different styles), for one thing automatic conversion is
obviously not possible for return values.
Michael
On Thu, Feb 6, 2020 at 5:34 PM stefano <mailto:mangiolastefano at gmail.com>
wrote:
Thanks Michael, yes in a sense, ttBulk and SummariseExperiment can be considere as two
interfaces. Would be fair enough to create a function that convert from one to the other, although the default would be ttBulk?
I'm not sure the tidyverse is a great answer to the user interface,
because it lacks domain semantics
Would be fair to say that ttBulk class could be considered a tibble with
specific semantics? In the sense that it holds information about key column names (.sample, .transcript, .abundance, .normalised_abundance, etc..), and has a validator (that is triggered at every ttBulk function).
I think at the moment, given (i) S3 problem, and (ii) the lack of formal
foundation on SummaisedExperiment interface (that maybe would require an S4 technology itself, where SummariseExperiment could be a slot?) my package would belong more to CRAN, until those two issues will have been resolved.
I imagine there are not many cases where a CRAN package migrated to
Bioconductor after complying with the ecosystem policies.
Thanks a lot. Best wishes. Stefano Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 12:12 Michael Lawrence <mailto:
lawrence.michael at gene.com> ha scritto:
There's a difference between implementing software, where one wants formal data structures, and providing a convenient user interface. Software needs to interface with other software, so a package could provide both types of interfaces, one based on rich (S4) data structures, another on simpler structures with an API more amenable to analysis. I'm not sure the tidyverse is a great answer to the user interface, because it lacks domain semantics. This is still an active area of research (see Stuart Lee's plyranges, for example). I hope you can find a reasonable compromise that enables you to integrate ttBulk into Bioconductor, so that it can take advantage of the synergies the ecosystem provides. PS: There is no simple fix for your example. Michael On Thu, Feb 6, 2020 at 4:12 PM stefano <mailto:
mangiolastefano at gmail.com> wrote:
Thanks a lot for your comment Martin and Michael, Here I reply to Marti's comment. Michael I will try to implement your solution! I think a key point from
(that I was under-looking) is *>> "So to sum up: if you submit a package to Bioconductor, there is
an
expectation that your package can work seamlessly with other
Bioconductor
packages, and your implementation should support that. The safest and easiest way to do that is to use Bioconductor data structures"* In this case my package would not be suited as I do not use
pre-existing
Bioconductor data structures, but instead i see value in using a
simple
tibble, for the reasons in part explained in the README https://github.com/stemangiola/ttBulk (harvesting the power of
tidyverse
and friends for bulk transcriptomic analyses). *>> "with the minimum standard of being able to accept such objects
even if
you do not rely on them internally (though you should)"* With this I can comply in the sense that I can built converters to
and from
SummarizedExperiment (for example). * >> "If you don't want to do that, then that's a shame, but it would suggest that Bioconductor would not be the right place to host this package."* Well said. In summary, I do not rely on Bioconductor data structure, as I am
proposing
another paradigm, but my back end is made of largely Bioconductor
analysis
packages that I would like to interface with tidyverse. So 1) Should I build converters to Bioc. data structures, and force the
use of
S3 object (needed to tiidyverse to work), or 2) Submit to CRAN I don't have strong feeling for either, although I think Bioconductor
would
be a good fit. Please community give me your honest opinions, I will
take
them seriously and proceed. Best wishes. *Stefano * Stefano Mangiola | Postdoctoral fellow Papenfuss Laboratory The Walter Eliza Hall Institute of Medical Research +61 (0)466452544 Il giorno ven 7 feb 2020 alle ore 10:46 Martin Morgan < mailto:mtmorgan.bioc at gmail.com> ha scritto:
The idea isn't to use S4 at any cost, but to 'play well' with the Bioconductor ecosystem, including writing robust and maintainable
code.
This comment
provides some motivation; there was also an interesting exchange on
the
Bioconductor community slack about this (join at https://bioc-community.herokuapp.com/; discussion starting with
The plyranges package http://bioconductor.org/packages/plyranges
and
recently accepted fluentGenomics workflow https://github.com/Bioconductor/Contributions/issues/1350 provide illustrations. In your domain it's really surprising that your package does not use (Import or Depend on) SummarizedExperiment or GenomicRanges
packages. From
a superficial look at your package, it seems like something like `reduce_dimensions()` could be defined to take & return a SummarizedExperiment and hence benefit from some of the points in
the
github issue comment mentioned above. Certainly there is a useful transition, both 'on the way in' to a SummarizedExperiment, and after leaving the more specialized
bioinformatic
computations to, e.g., display a pairs plot of the reduced
dimensions,
where one might re-shape the data to a tidy format and use 'plain
old'
tibbles; the fluentGenomics workflow might provide some guidance. At the end of the day it would not be surprising for Bioconductor
packages
to make use of tidy concepts and data structures, particularly in
the
vignette, and it would be a mistake for Bioconductor to exclude well-motivated 'tidy' representations. Martin Morgan ?On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" < mailto:bioc-devel-bounces at r-project.org on behalf of mailto:
mangiolastefano at gmail.com>
wrote:
Hello,
I have a package (ttBulk) under review. I have been told to
replace
the S3
system to S4. My package is based on the class tbl_df and must
be fully
compatible with tidyverse methods (inheritance). After some
tests and
research I understood that tidyverse ecosystem is not
compatible with
S4
classes.
For example, several methos do not apparently handle S4
objects based
on
S3 tbl_df
```library(tidyverse)setOldClass("tbl_df")
setClass("test2", contains = "tbl_df")
my <- new("test2", tibble(a = 1))
my %>% mutate(b = 3)
a b
1 1 3
```
```my <- new("test2", tibble(a = rnorm(100), b = 1))
my %>% nest(data = -b)
Error: `x` must be a vector, not a `test2` object
Run `rlang::last_error()` to see where the error occurred.
```
Could you please advise whether a tidyverse based package can be
hosted on
Bioconductor, and if S4 classes are really mandatory? I need to
understand
if I am forced to submit to CRAN instead (although Bioconductor
would
be a
good fit, sice I try to interface transcriptional analysis
tools to
tidy
universe)
Thanks a lot.
Stefano
[[alternative HTML version deleted]]
_______________________________________________
mailto:Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ mailto:Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
-- Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 mailto:michafla at gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel The information in this e-mail is intended only for th...{{dropped:17}}