Our guidelines state: "Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources." With the increased utility of devtools/install_github, perhaps we can relax this? Is it a can of worms we don't want to open?
[Bioc-devel] bioc pkgs depending on packages that are only in github?
8 messages · Vincent Carey, Tim Triche, Jr., Michael Lawrence +3 more
Re: can of worms: yes, it is. Re: don't want to open: well, it's either that or I personally cram some other people's packages through BioC approval so that my DMRcate and fixSeq mega-patches can stick. So it's a can of worms all right, and maybe the solution is to get more people to submit to their benevolent BioC overlords. BioC is what CRAN, Python, and various other competitors/rivals/alternatives could have been, had they been disciplined about it from the start. BioC (and maybe glmnet/rsig) is the greatest achievement of R. No sense letting that slip just because it's inconvenient. Bring up the level of the github/rforge/googlecode/etc. projects instead. I started this email agreeing with you, but as I thought it through, I changed my mind. The great weakness of Python (been using THAT lately) is that package documentation sucks. (Also, it's crappy for manipulating BAMs.) The BioC standards are IMHO the ultimate counterpoint to this, as are BiocParallel, the AMI, the Google Genomics R client... Why let something awesome like the BioC codebase slide downhill? Make the other guys raise their standards instead. Over the long run, everybody wins (more citations, more users, a higher-quality code base, better reproducibility for science and industry). Just mho. My daughter woke up, so I'm out of time to edit this monstrosity :-/ --t
On Nov 8, 2014, at 8:01 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote: our guidelines state Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources. with increased utility of devtools/install_github perhaps we can relax this? is it a can of worms we don't want to open?
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On Sat, Nov 8, 2014 at 8:01 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
our guidelines state Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources. with increased utility of devtools/install_github perhaps we can relax this? is it a can of worms we don't want to open?
Gabe Becker is finishing up a framework that generalizes the notion of package repositories such that packages can be distributed over multiple sources, including traditional repositories and SCM systems like Github. If Bioconductor were to maintain a manifest, then our generalized installation machinery would be able to install everything in a dependency-aware manner (install_github can only resolve dependencies located in repositories). BiocInstaller could wrap it. The manifest system is a prototype for something that could end up in R itself.
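Michael's point that install_github can only resolve dependencies already in a repository can be made concrete: once every package's location and dependencies are listed in one manifest, an installer can compute a single install order covering GitHub-hosted and repository packages alike. The sketch below is a hypothetical illustration, not switchr's actual manifest format or API: the dict layout, the pkgA/pkgB names, and their URLs are invented, and it uses Python's standard graphlib module (Python 3.9+) for the topological sort.

```python
from graphlib import TopologicalSorter

# Hypothetical manifest: package -> where to fetch it and what it depends on.
# pkgA/pkgB and their URLs are invented; "bioc" stands for "install from the
# Bioconductor repository". This is not switchr's real manifest format.
MANIFEST = {
    "pkgA": {"source": "https://github.com/example/pkgA", "depends": ["pkgB", "BiocGenerics"]},
    "pkgB": {"source": "https://github.com/example/pkgB", "depends": ["S4Vectors"]},
    "S4Vectors": {"source": "bioc", "depends": ["BiocGenerics"]},
    "BiocGenerics": {"source": "bioc", "depends": []},
}


def install_order(manifest):
    """Order packages so each one comes after everything it depends on."""
    graph = {name: set(entry["depends"]) for name, entry in manifest.items()}
    # A dependency the manifest doesn't list cannot be located at all.
    missing = {dep for deps in graph.values() for dep in deps} - set(graph)
    if missing:
        raise KeyError(f"dependencies not listed in manifest: {sorted(missing)}")
    # TopologicalSorter expects node -> predecessors; static_order() raises
    # graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in install_order(MANIFEST):
        print(f"install {name} from {MANIFEST[name]['source']}")
```

Because the GitHub-hosted pkgB is listed in the same manifest as its repository dependencies, it is placed before pkgA, which is the dependency-resolution step that install_github alone cannot perform for non-repository packages.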
Really, people who distribute their packages solely through GitHub are making things convenient for the developers while doing a lot of potential harm to users. When you use install_github, there is no real concept of versioning, or of whether the package succeeds in building or passes checks on various platforms (which is pretty important, for example, for anything with C(++) code). What we have in Bioconductor is so much better for the end user, and so much better for reproducible research. On top of this, as Tim says, we have some additional QC checks. It does seem that CRAN is very hard to deal with these days, and I am happy that I don't have packages in that repository. The Bioc way (interfacing the repository with source code version control), which allows much more rapid pushing of fixes to users (assuming they use devel), seems uniformly better in my opinion. I can understand why package authors may be fed up with CRAN, but by just putting packages on GitHub they also signal (in my opinion) that they are not willing to go the last mile and make their code release quality. As everyone knows, actually writing a vignette, making sure the code passes check on all platforms, having man pages, etc. can be some amount of tedious work, but it really does make the end experience uniformly better. There is clearly a trend towards just putting things up on GitHub and not bothering with submitting to a repository. That is, in my opinion, a trend towards inferior quality. And importantly, as I see it, it does not support reproducible research. Best, Kasper
On Sat, Nov 8, 2014 at 11:53 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
On Sat, Nov 8, 2014 at 8:01 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
our guidelines state Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources. with increased utility of devtools/install_github perhaps we can relax this? is it a can of worms we don't want to open?
Gabe Becker is finishing up a framework that generalizes the notion of package repositories such that packages can be distributed over multiple sources, including traditional repositories and SCM systems like Github. If Bioconductor were to maintain a manifest, then our generalized installation machinery would be able to install everything in a dependency-aware manner (install_github can only resolve dependencies located in repositories). BiocInstaller could wrap it. The manifest system is a prototype for something that could end up in R itself.
On 11/08/2014 08:01 AM, Vincent Carey wrote:
our guidelines state Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources. with increased utility of devtools/install_github perhaps we can relax this? is it a can of worms we don't want to open?
Presence on GitHub today doesn't imply any commitment to ongoing availability, and does not provide even nominal assurance that the package builds and installs across the major platforms. It also carries no formal requirement to pass R CMD check or to meet the higher documentation standards of Bioconductor, and there are no guarantees about basic programming best practices (e.g., consistent version numbering across releases). (Of course, many individual GitHub resources are well maintained, well documented, and cross-platform compatible.) For these reasons it seems like the bar for dependencies should remain at least approximately where it is: CRAN or Bioc packages. Martin
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
On 11/08/2014 08:53 AM, Michael Lawrence wrote:
On Sat, Nov 8, 2014 at 8:01 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
our guidelines state Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources. with increased utility of devtools/install_github perhaps we can relax this? is it a can of worms we don't want to open?
Gabe Becker is finishing up a framework that generalizes the notion of package repositories such that packages can be distributed over multiple sources, including traditional repositories and SCM systems like Github. If Bioconductor were to maintain a manifest, then our generalized installation
Probably you mean a manifest in a different sense, but in case not I'll mention https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/bioc_3.1.manifest and friends.
machinery would be able to install everything in a dependency-aware manner (install_github can only resolve dependencies located in repositories).
Managing dependencies seems like an important and necessary advance, but I don't think it is sufficient for Bioc purposes. E.g., both CRAN and Bioconductor at some level take control of package sources, so the source remains available even after the developer has (usually casually) lost interest in the useful resource they are providing. The same goes for the 'quality assurance' provided by build and check (on change for CRAN, nightly for Bioc) across platforms and against current versions, and for the manual maintenance activities of both the CRAN and Bioc teams (e.g., identifying the root cause of problems exhibited by package A as a change or deficiency in package B). Certainly it will be interesting to see Gabe's mature product. Martin
BiocInstaller could wrap it. The manifest system is a prototype for something that could end up in R itself.
1 day later
On Sat, Nov 8, 2014 at 12:22 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
On 11/08/2014 08:53 AM, Michael Lawrence wrote:
On Sat, Nov 8, 2014 at 8:01 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
our guidelines state Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources. with increased utility of devtools/install_github perhaps we can relax this? is it a can of worms we don't want to open?
Gabe Becker is finishing up a framework that generalizes the notion of package repositories such that packages can be distributed over multiple sources, including traditional repositories and SCM systems like Github. If Bioconductor were to maintain a manifest, then our generalized installation
Probably you mean a manifest in a different sense, but in case not I'll mention https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/bioc_3.1.manifest and friends.
Gabe's manifest is a list of packages, but it also points to package locations and, optionally, versions.
machinery would be able to install everything in a dependency-aware manner (install_github can only resolve dependencies located in repositories).
Managing dependencies seems like an important and necessary advance, but I don't think sufficient for Bioc purposes? E.g., both CRAN and Bioconductor at some level take control of package sources, so the source is available even after the developer has (usually casually) lost interest in the useful resource they are providing. Likewise the 'quality assurance' provided by build and check (on change for CRAN, nightly for Bioc) across platforms and against current versions, and the manual maintenance activities of both the CRAN and Bioc teams (e.g., identifying the root cause of problems exhibited by package A as a change or deficiency in package B).
There is a tension between the desire for validation and the pace of science. Our goal is to let users choose their own comfort zone. Gabe's switchr/GRAN framework makes it relatively easy to deploy a manifest as a traditional, validated repository. It will even pull from GitHub or another SCM with each build (I think it just checks for a version bump, but that might be configurable). Of course, this assumes the user has the skills and resources necessary to deploy such a repository. The Bioconductor project certainly would, though, so some sort of validated approach would definitely be preferable.
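A minimal sketch of the "rebuild on version bump" check Michael describes, assuming the builder knows the Version field of the last package it built and the one currently in the SCM checkout. The helper names are hypothetical and this is not GRAN's code. R package versions are two or more non-negative integers separated by '.' or '-', so they are compared component-wise as integers rather than as strings.

```python
import re
from typing import Optional

# R package versions: two or more non-negative integers separated by '.' or '-'.
_R_VERSION = re.compile(r"^\d+(?:[.-]\d+)+$")


def parse_r_version(version: str) -> tuple:
    """Turn '1.10.2' or '0.99-3' into a tuple of ints for numeric comparison."""
    version = version.strip()
    if not _R_VERSION.match(version):
        raise ValueError(f"not a valid R package version: {version!r}")
    return tuple(int(part) for part in re.split(r"[.-]", version))


def needs_rebuild(last_built: Optional[str], scm_version: str) -> bool:
    """Hypothetical GRAN-style check: rebuild only if the SCM version is newer."""
    if last_built is None:  # never built before
        return True
    return parse_r_version(scm_version) > parse_r_version(last_built)


if __name__ == "__main__":
    print(needs_rebuild("1.9.3", "1.10.0"))   # version bumped
    print(needs_rebuild("1.10.0", "1.10.0"))  # unchanged
```

Plain string comparison would get this wrong: as text, "1.10.0" sorts before "1.9.3", which is why the components are parsed into integers first.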
Certainly it will be interesting to see Gabe's mature product. Martin
BiocInstaller could wrap it. The manifest system is a prototype for something that could end up in R itself.
Hey all, A package manifest is essentially a decentralized PACKAGES file (the file that defines what is in a package repository, for those who don't know). As Michael pointed out, manifests can point to remote or local files (e.g., those in an actual repository or the CRAN Archive), but they also understand SCM systems (currently Git and SVN) and can point directly to those sources. Note that the only thing necessary for a manifest to define a validated, guarantee-providing cohort of packages is for it to point to a set of packages that constitutes such a cohort. If, after completion of Bioc's build and testing process, a package manifest were generated and published to the Web, packages installed using that manifest would provide all the same guarantees that those from the Bioc repositories do now. Finally, Martin, I didn't know about the Bioc manifests. Thanks for the heads-up on that. ~G
On Sun, Nov 9, 2014 at 4:35 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
On Sat, Nov 8, 2014 at 12:22 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
On 11/08/2014 08:53 AM, Michael Lawrence wrote:
On Sat, Nov 8, 2014 at 8:01 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
our guidelines state Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources. with increased utility of devtools/install_github perhaps we can relax this? is it a can of worms we don't want to open?
Gabe Becker is finishing up a framework that generalizes the notion of package repositories such that packages can be distributed over multiple sources, including traditional repositories and SCM systems like Github. If Bioconductor were to maintain a manifest, then our generalized installation
Probably you mean a manifest in a different sense, but in case not I'll mention https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/bioc_3.1.manifest and friends.
Gabe's manifest is a list of packages, but it also points to package locations and, optionally, versions.
machinery would be able to install everything in a dependency-aware manner (install_github can only resolve dependencies located in repositories).
Managing dependencies seems like an important and necessary advance, but I don't think sufficient for Bioc purposes? E.g., both CRAN and Bioconductor at some level take control of package sources, so the source is available even after the developer has (usually casually) lost interest in the useful resource they are providing. Likewise the 'quality assurance' provided by build and check (on change for CRAN, nightly for Bioc) across platforms and against current versions, and the manual maintenance activities of both the CRAN and Bioc teams (e.g., identifying the root cause of problems exhibited by package A as a change or deficiency in package B).
There is a tension between the desire for validation and the pace of science. Our goal is to enable the user to choose his or her comfort zone. Gabe's switchr/GRAN framework makes it relatively easy to deploy a manifest as a traditional, validated repository. It will even pull from github or other SCM with each build (I think it just checks for a version bump, but that might be configurable). Of course, this means the user has the skills and resources necessary to deploy such a repository. The Bioconductor project certainly would though, so some sort of validated approach would definitely be preferable.
Certainly it will be interesting to see Gabe's mature product. Martin
BiocInstaller could wrap it. The manifest system is a prototype for something that could end up in R itself.
Computational Biologist Genentech Research
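To illustrate Gabe's description of a manifest as a decentralized PACKAGES file: a repository's PACKAGES index is a DCF (Debian Control File) text file of "Field: value" records separated by blank lines, with indented continuation lines. The minimal parser below reads such records and extracts dependency names. The package names and the "Url" field in the sample are hypothetical stand-ins for how a manifest might record where each package's source lives; they are not switchr's actual field names.

```python
# Hypothetical manifest-style records: "pkgX", "mypkg" and the "Url" field are
# illustrative assumptions, not switchr's format.
SAMPLE = """\
Package: pkgX
Version: 1.0.0

Package: mypkg
Version: 0.1.0
Depends: R (>= 3.1.0),
    pkgX
Imports: methods
Url: https://github.com/example/mypkg
"""


def parse_dcf(text):
    """Parse DCF text into a list of {field: value} dicts, one per record."""
    records, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field is not None:
            # Indented lines continue the previous field's value.
            current[last_field] += " " + line.strip()
        else:
            # Split on the first colon only, so URLs in values stay intact.
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        records.append(current)
    return records


def dependency_names(record, fields=("Depends", "Imports", "LinkingTo")):
    """Package names from dependency fields, dropping version specs and R itself."""
    names = []
    for field in fields:
        for item in record.get(field, "").split(","):
            name = item.split("(")[0].strip()
            if name and name != "R":
                names.append(name)
    return names


if __name__ == "__main__":
    for rec in parse_dcf(SAMPLE):
        print(rec["Package"], rec["Version"], dependency_names(rec), rec.get("Url"))
```

In R itself, `read.dcf()` does this parsing; the Python version only shows how little structure a manifest needs beyond what PACKAGES already carries.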