
[Bioc-devel] bioc pkgs depending on packages that are only in github?

8 messages · Vincent Carey, Tim Triche, Jr., Michael Lawrence +3 more

#
our guidelines state

Packages you depend on must be available via Bioconductor or CRAN; users
and the automated build system have no way to install packages from other
sources.

with increased utility of devtools/install_github perhaps we can relax this?

is it a can of worms we don't want to open?
#
Re: can of worms: yes it is

Re: don't want to open: well, it's either that or I personally cram some other people's packages through BioC approval so that my DMRcate and fixSeq mega patches can stick.

So, it's a can of worms alright, and maybe the solution is to get more people to submit to their benevolent BioC overlords.  Because BioC is what CRAN and Python and various other competitors / rivals / alternatives could have been, if they'd been disciplined about it from the start.  BioC (and maybe glmnet/rsig) is the greatest achievement of R.  No sense letting that slip just because it's inconvenient.  Bring up the level of the github/rforge/googlecode/etc projects instead. 

I started this email agreeing with you but as I thought through it, I changed my mind. The great weakness of python (been using THAT lately) is that package documentation sucks. (Also it's crappy for manipulating BAMs). The BioC standards are IMHO the ultimate counterpoint to this, as is BiocParallel, the AMI, the google genomics R client... Why let something awesome like the BioC codebase slide downhill?  Make the other guys raise their standards instead. Over the long run, everybody wins (more citations, more users, higher quality code base, better reproducibility for science & industry)

Just mho.  My daughter woke up so I'm out of time to edit this monstrosity :-/

--t
#
On Sat, Nov 8, 2014 at 8:01 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:
Gabe Becker is finishing up a framework that generalizes the notion of
package repositories such that packages can be distributed over multiple
sources, including traditional repositories and SCM systems like Github. If
Bioconductor were to maintain a manifest, then our generalized installation
machinery would be able to install everything in a dependency-aware manner
(install_github can only resolve dependencies located in repositories).
BiocInstaller could wrap it. The manifest system is a prototype for
something that could end up in R itself.

  
  
#
Really, people who distribute their packages solely through github are
making it convenient for the developers and doing a lot of potential harm
to users.  When you use install_github, there is no real concept of
versioning, of whether the package succeeds in building or passes checks on
various platforms (which is pretty important for example for anything with
C(++) code).  What we have in Bioconductor is so much better for the end
user, and so much better for reproducible research.  On top of this, as Tim
says, we have some additional QC checks.

It does seem that CRAN these days is very hard to deal with, and I am
happy that I don't have packages in that repository.  The Bioc way
(interfacing the repository with source code version control), which allows
much more rapid pushing of fixes to users (assuming they use devel), seems
uniformly better in my opinion.  I can understand why package authors may
be fed up with CRAN, but by just putting packages on github they also
signal (in my opinion) that they are not willing to go the last mile and
make their code release quality.  As everyone knows, actually writing a
vignette, making sure the code passes check on all platforms, having man
pages etc etc. can be some amount of tedious work, but it really does make
the end experience uniformly better.

There is clearly a trend towards just putting things up on github and not
bothering with submitting to a repository.  That is - in my opinion - a
trend towards inferior quality.  And importantly, as I see it, it does not
support reproducible research.

Best,
Kasper


  
  
#
On 11/08/2014 08:01 AM, Vincent Carey wrote:
Presence on GitHub today doesn't imply any commitment to ongoing availability, 
and does not provide even nominal assurance that the package builds and installs 
across the major platforms. It also doesn't have formal requirements for passing 
R CMD check or meeting the higher documentation standards of Bioconductor, and 
there are no guarantees about basic programming best practices (e.g., consistent 
version numbering across releases).  (Of course many individual github resources 
are well maintained and documented, and are cross-platform compatible.) So for 
these reasons it seems like the bar for dependencies should remain at least 
approximately where it is -- CRAN or Bioc packages.

Martin

  
    
#
On 11/08/2014 08:53 AM, Michael Lawrence wrote:
Probably you mean a manifest in a different sense, but in case not I'll mention

   https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/bioc_3.1.manifest

and friends.
Managing dependencies seems like an important and necessary advance, but I don't 
think it is sufficient for Bioc purposes? E.g., both CRAN and Bioconductor at some 
level take control of package sources, so the source is available even after the 
developer has (usually casually) lost interest in the useful resource they are 
providing. Likewise the 'quality assurance' provided by build and check (on 
change for CRAN, nightly for Bioc) across platforms and against current 
versions, and the manual maintenance activities of both the CRAN and Bioc teams 
(e.g., identifying the root cause of problems exhibited by package A as a change 
or deficiency in package B).

Certainly it will be interesting to see Gabe's mature product.

Martin

  
    
1 day later
#
On Sat, Nov 8, 2014 at 12:22 PM, Martin Morgan <mtmorgan at fredhutch.org>
wrote:
Gabe's manifest is a list of packages, but it also points to package
locations and, optionally, versions.
There is a tension between the desire for validation and the pace of
science. Our goal is to enable the user to choose his or her comfort zone.
Gabe's switchr/GRAN framework makes it relatively easy to deploy a manifest
as a traditional, validated repository. It will even pull from github or
other SCM with each build (I think it just checks for a version bump, but
that might be configurable). Of course, this assumes the user has the skills
and resources necessary to deploy such a repository. The Bioconductor
project certainly would though, so some sort of validated approach would
definitely be preferable.

  
  
#
Hey all,

A package manifest is essentially a decentralized PACKAGES file (this is
what defines what is in a package repository, for those who don't know). As
Michael pointed out, manifests can point to remote or local files (e.g.
those in an actual repository or the CRAN Archive), but they also understand
SCM systems (currently Git and SVN) and can point directly to those sources.
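
As a rough illustration of the manifest idea described above (not the actual switchr/GRAN format or API), a manifest can be pictured as a list of entries, each naming a package, where its source lives, an optional version, and its dependencies. The Python sketch below is purely hypothetical: the field names, the example.org locations, and the install_order helper are all assumptions made for illustration. It shows how recording dependencies in the manifest lets an installer order packages so that each one comes after the packages it needs, whether the source is a repository or an SCM location.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Optional, Tuple


@dataclass(frozen=True)
class ManifestEntry:
    """One row of a hypothetical package manifest (field names are assumptions)."""
    name: str
    source_type: str                 # e.g. "repo", "git", "svn", "file"
    location: str                    # URL or path; placeholder values below
    version: Optional[str] = None    # optional pinned version
    depends: Tuple[str, ...] = ()    # names of packages this one needs


def install_order(manifest):
    """Return package names ordered so dependencies precede dependents."""
    by_name = {entry.name: entry for entry in manifest}
    graph = {}
    for entry in manifest:
        missing = [d for d in entry.depends if d not in by_name]
        if missing:
            raise KeyError(
                f"{entry.name} depends on packages not in the manifest: {missing}"
            )
        # Map each package to the set of packages that must be installed first.
        graph[entry.name] = set(entry.depends)
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical manifest mixing a repository source with SCM sources.
manifest = [
    ManifestEntry("pkgA", "git", "https://example.org/someuser/pkgA.git",
                  depends=("pkgB",)),
    ManifestEntry("pkgB", "repo", "https://example.org/repo", version="1.2.0"),
    ManifestEntry("pkgC", "svn", "https://example.org/svn/pkgC",
                  depends=("pkgA", "pkgB")),
]

print(install_order(manifest))  # ['pkgB', 'pkgA', 'pkgC']
```

The ordering step is the only part that matters here: because every dependency is itself a manifest entry with a known location, nothing has to be assumed to live in a traditional repository.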

Note that the only thing necessary for a manifest to define a validated,
guarantee-providing cohort of packages is for it to point to a set of
packages which define such a cohort. If, after completion of Bioc's build
and testing process, a package manifest were generated and published to the
Web, packages installed using that manifest would provide all the same
guarantees that those from the Bioc repositories do now.

Finally, Martin, I didn't know about the Bioc manifests. Thanks for the
heads up on that.

~G

