[Bioc-devel] BiocInstaller: next generation

16 messages · Ryan, Aaron Lun, Martin Morgan +6 more

#
Developers --

A preliminary heads-up and request for comments.

Almost since project inception, we've used the commands

   source("https://bioconductor.org/biocLite.R")
   biocLite(pkgs)

to install packages. This poses security risks (e.g., typos in the url) 
and deviates from standard R package installation procedures.


We'd like to move to a different system where a base package, call it 
'BiocManager', is installed from CRAN and used to install Bioconductor 
packages

   if (!"BiocManager" %in% rownames(installed.packages()))
       install.packages("BiocManager")
   BiocManager::install(pkgs)

This establishes a secure chain from user R session to Bioconductor 
package installation. It is also more consistent with base R package 
installation procedures.

BiocManager exposes four functions

   - install(): install or update packages

   - version(): the version of Bioconductor in use

   - valid(): are all Bioconductor packages from the same Bioconductor
     version?

   - repositories(): URL locations of the Bioconductor version-specific
     repositories

install() behaves like biocLite(), using the most current version of 
Bioconductor for the version of R in use. It stores this state using a 
Bioconductor package 'BiocVersion', which is nothing more than a 
sentinel for the version in use. One can also 'use devel' or a 
particular version of Bioconductor (consistent with the version of R) with

   BiocManager::install(version = "3.8")   # or the synonym "devel"


We intend to phase this in over several release cycles, and to continue 
to support the traditional biocLite() route for versions before 
BiocManager becomes available.

We also intend to change the overall versioning of 'Bioconductor' 
itself, where releases are always even (3.8, 3.10, 3.12, ...) and 
'devel' always odd.
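
A practical consequence of versions such as 3.10 is that they have to be handled as strings or version objects rather than as numbers, since the number 3.10 is just 3.1. A minimal sketch of the proposed even/odd rule follows; the helper name is_release_version() is hypothetical and not part of BiocManager.

```r
# Hypothetical helper, not part of BiocManager: under the proposed scheme
# a Bioconductor version is a release if its minor component is even.
is_release_version <- function(version) {
    # package_version() parses "3.10" as major 3, minor 10
    minor <- package_version(version)$minor
    minor %% 2L == 0L
}

is_release_version("3.8")   # even minor: a release under the proposed scheme
is_release_version("3.9")   # odd minor: devel under the proposed scheme
is_release_version("3.10")  # even minor; as.numeric("3.10") would lose the 10
```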

Obviously this is a large change, eventually requiring updates to many 
locations on our web site and individual vignettes.


Of course the key question is the name of the 'BiocManager' package. It 
cannot easily be 'BiocInstaller', because of differences in the way CRAN 
and Bioconductor version packages. Some possible names are

BiocInstall::install()
BiocPackages::install()
BiocManager
BiocMaestro


Your comments are welcome...

Martin


This email message may contain legally privileged and/or...{{dropped:2}}
#
Hi Martin,

Is the intent that the BiocManager package should never be loaded via
library, but functions in the package should always be called as
BiocManager::FUN()? If not, I would consider prefixing the functions with
"bioc".

Also, I assume that once this BiocManager package is on CRAN, the
biocLite.R script will become a thin wrapper around it?

-Ryan

On Wed, May 9, 2018 at 3:29 PM Martin Morgan <martin.morgan at roswellpark.org>
wrote:

  
  
#
This all sounds pretty reasonable to me. The ability to choose the
version in install() is nice, especially if we can easily flip between
versions in different install locations. I presume that
version="release" will be the default?

As for the names - BiocManager seems the most sober of the lot. And
thematically appropriate - you might have an orchestra and conductor,
but you still need a manager to get everyone paid, fed and on the stage.

-Aaron
Martin Morgan wrote:
#
On 05/09/2018 06:39 PM, Ryan Thompson wrote:
I would rather that all documentation use BiocManager::install(), which 
is the only failsafe way to do the disambiguation.
biocLite.R would be a legacy script available for current and past 
versions of Bioconductor. Going forward, it would go through a cycle of 
deprecated and defunct.

Martin
#
On 05/09/2018 07:05 PM, Aaron Lun wrote:
I actually have some reservation about introducing the 'release' synonym 
-- telling a user on April 30 that they should use 'release' and then 
again on May 2 that they should use 'release', but to mean two different 
versions of Bioconductor seems confusing to me. Also the notion that a 
more-or-less casual user will get into Bioconductor enough to grok the 
whole release / devel cycle seems somehow presumptuous. Of course 
developers are a more sophisticated lot, and the notion of a 'devel' 
branch is central to Bioconductor's approach to version management...

Martin
#
On Thu, May 10, 2018, 00:29 Martin Morgan <martin.morgan at roswellpark.org>
wrote:
May I suggest the package name:

* Bioconductor

The potential downside would be possible confusions between the version of
this package versus the actual Bioconductor repository.  Could the
Bioconductor *package* have a version  x.y.z that reflects the *repository*
x.y version?

/Henrik

  
  
#
Hi Henrik,

On Thu, May 10, 2018 at 1:21 AM, Henrik Bengtsson <
henrik.bengtsson at gmail.com> wrote:
This is a nice suggestion that also crossed my mind, but users new to both
R and Bioconductor might think "but I have 'Bioconductor' installed, why
can't I run this script?", and it might complicate web namespace / presence
by entrapping searches for the Bioconductor system to the single package.
Pariksheet
#
On 05/10/2018 01:37 AM, Pariksheet Nanda wrote:
Yes we thought of this name and rejected it for the reasons Pariksheet 
mentions -- the opportunity for very significant confusion between the 
package and the project.

Actually we use a light-weight BiocVersion package in the way suggested 
at the end of Henrik's comment -- the version of BiocVersion corresponds 
to the version of the Bioconductor repository. It is also used to 
'remember' what version of Bioconductor the user has installed.
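
As a rough illustration of the sentinel idea, the recorded Bioconductor version can be read back from the installed BiocVersion package with base R tools. This is only a sketch: installed_bioc_version() is a hypothetical name, and it assumes the first two components of the BiocVersion package version are the Bioconductor version, as described above.

```r
# Sketch: read the Bioconductor version recorded by an installed BiocVersion.
# installed_bioc_version() is a hypothetical name used for illustration.
installed_bioc_version <- function() {
    # system.file() returns "" when the package is not installed
    if (!nzchar(system.file(package = "BiocVersion")))
        return(NULL)
    # keep only the major.minor components of the package version
    utils::packageVersion("BiocVersion")[, 1:2]
}

installed_bioc_version()
```

Returning NULL when BiocVersion is absent leaves the caller free to fall back to a default, for instance the most recent Bioconductor version for the R in use.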

Martin
This email message may contain legally privileged and/or...{{dropped:2}}
#
Dear Bioconductor team,

Bioconductor packages can be installed via install.packages when they are a
dependency of another package if the DESCRIPTION file contains a
"biocViews:" section (see
https://github.com/r-lib/devtools/issues/700#issuecomment-235127291).
I don't know how install.packages handles these packages in Bioconductor
but would it be possible to use this trick to directly install the packages
in Bioconductor?

What will happen to the BiocInstaller package? It has the same
description and purpose as the proposed new package.
Wouldn't it be better to move this new functionality to BiocInstaller and
move it to CRAN?

Best,

Lluís



On 10 May 2018 at 00:11, Martin Morgan <martin.morgan at roswellpark.org>
wrote:

  
  
#
Good day,

The features of the proposed package seem a lot like BiocInstaller. Once I have upgraded R and have the newest BiocInstaller installed using the bootstrapping technique of source("https://bioconductor.org/biocLite.R"), I typically do

library(BiocInstaller)
biocLite("GenomicAlignments")

to install the GenomicAlignments package in a subsequent R session, for instance. This avoids repetitive sourcing of the biocLite script from the Bioconductor server.

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
#
On 05/11/2018 03:33 AM, Lluís Revilla wrote:
biocLite() actually uses install.packages(). The 'trick' is to have the 
correct 'repos' argument. In essence that is what biocLite() / 
BiocManager::install() does -- sets the repository argument to include 
the location where Bioconductor version x is available for R version y.
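
A minimal sketch of that idea, assuming the URL layout shown in the repositories() output quoted further down in this message. The helper bioc_repositories() and the default CRAN mirror are assumptions made for illustration, not BiocManager's actual code.

```r
# Hypothetical helper: build the 'repos' vector for one Bioconductor version.
# The URL layout follows the repositories() output quoted in this message;
# the default CRAN mirror is an assumption.
bioc_repositories <- function(version, cran = "https://cloud.r-project.org") {
    base <- sprintf("https://bioconductor.org/packages/%s", version)
    c(
        BioCsoft      = file.path(base, "bioc"),
        BioCann       = file.path(base, "data/annotation"),
        BioCexp       = file.path(base, "data/experiment"),
        BioCworkflows = file.path(base, "workflows"),
        CRAN          = cran
    )
}

repos <- bioc_repositories("3.8")
repos

# Installation is then an ordinary install.packages() call (not run here,
# since it needs network access):
# install.packages("GenomicAlignments", repos = repos)
```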

Users of base R can `setRepositories()`; then install.packages() 
installs Bioconductor packages. The problem is that the version of 
Bioconductor installed by this approach is hard-coded in R, so the user 
gets the version of Bioc available at the time of the release of R, 
rather than the most recent version available for their version of R. 
After setRepositories() and selecting all bioc options, in my 'devel' 
install I have

 > getOption("repos")
                                                    CRAN
                                                "@CRAN@"
                                                BioCsoft
            "https://bioconductor.org/packages/3.7/bioc"
                                                 BioCann
"https://bioconductor.org/packages/3.7/data/annotation"
                                                 BioCexp
"https://bioconductor.org/packages/3.7/data/experiment"

whereas I actually want

 > BiocManager::repositories()
                                                BioCsoft
            "https://bioconductor.org/packages/3.8/bioc"
                                                 BioCann
"https://bioconductor.org/packages/3.8/data/annotation"
                                                 BioCexp
"https://bioconductor.org/packages/3.8/data/experiment"
                                           BioCworkflows
       "https://bioconductor.org/packages/3.8/workflows"
                                                    CRAN
                              "https://cran.rstudio.com"


install.packages() does not pay attention to biocViews; the comment you 
reference is incorrect (here's the sole reference to 'biocViews' in the 
R source code: 
https://github.com/wch/r-source/blob/3137a19986dfa547eba59b46ed8dc02b0dbf888c/src/library/tools/R/utils.R#L1249).

Package names need to be unique across CRAN and Bioconductor, so there 
can only be one BiocInstaller. BiocInstaller as it exists in 
Bioconductor supports two different interfaces to package management 
already ('legacy' biocLite, plus more recent BiocInstaller::biocLite()) 
and has 15 years of code; it is better to start with a clean implementation.

Martin
3 days later
#
Thanks for the feedback so far. If you're interested in trying out the 
new package, please

   remotes::install_github("Bioconductor/BiocManager")

and then

   BiocManager::install(version = "devel") # or 3.7 or 3.8

The essential functionality is

   BiocManager::install(<pkgs>)
   BiocManager::version()
   BiocManager::repositories()
   BiocManager::valid()

The documentation is intended to be up-to-date, with the exception of 
the vignette section where instructions for installation are as though 
the package were available on CRAN.

The package is a replacement for BiocInstaller (I removed BiocInstaller 
from my library) for either the release or devel version of Bioconductor.

Please post issues / pull requests / etc to the github repository

   https://github.com/Bioconductor/BiocManager

Martin
On 05/09/2018 06:11 PM, Martin Morgan wrote:
3 days later
#
Hi,
On Wed, 2018-05-09 at 18:11 -0400, Martin Morgan wrote:
I'd like to challenge the concept of the release 
and the pretty strong term valid(). I think BioC 
is the only R package repository that has the release concept,
and it is good to have a consistent, well-tested environment 
of packages for a given R version. It is also great to pick 
the most recent package version within a release for installation,
but that also prevents installing a package newer than 
the one tied to the R version at hand. 

But R already has the concept of versioned dependencies,
so in theory we wouldn't /need/ the release concept. 

I'd like to suggest making it easier to shoot yourself 
in the foot by installing less tested combinations 
of R and BioC packages, and to support installing

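	BiocManager::install(X, version = "0.8.15")  # a specific version of package X
	BiocManager::install(X, release = "3.8")     # X from a given BioC release
	BiocManager::install(X, release = "devel")   # X from the devel branch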

i.e. a package from a given BioC release, or a specific version 
of a package, regardless from which release it comes.
valid() is a rather unspecific name; what is probably meant here 
is BiocManager::tainted(), which indicates that packages come 
from different BioC versions, and all hell might break loose 
and it'll eat your kittens.

This also means that package developers would ask for the tainted() 
status in bug reports, and (c|sh)ould refuse help for unsupported 
combinations. It also means that packagers would have to be more careful
with the versioned dependencies, and supply minimal versions. 
If they give a versioned dependency on Biobase, 
they essentially forbid installing on old BioC releases, 
which would be fine. 

Just my 2c, 

Yours,
Steffen
#
On Fri, May 18, 2018 at 3:28 AM, Neumann, Steffen <sneumann at ipb-halle.de>
wrote:
I completely disagree with this idea. We have spent almost two decades now
trying to enforce the idea that we provide a coherent set of packages that
are guaranteed at some level to interoperate correctly. And biocLite was
the way we provided that. People have always had the opportunity to use
install.packages directly, or just go to the website and download any
package version they might like. Changing that now, to (in some sense)
officially allow people to install whatever versions they like is IMO both
short sighted and against the whole idea of having a coherent set of
packages.

It's not really difficult (like, at all) to install whatever version of
a package you might want, but making it easy for naive users to mix-n-match
doesn't make any sense to me at all.

Best,

Jim

  
    
#
On 05/18/2018 03:28 AM, Neumann, Steffen wrote:
The version management in R itself is not up to this task, e.g., there 
is no transparent way to install archived packages and their dependencies

   https://hypatia.math.ethz.ch/pipermail/r-help/2018-May/454482.html

or to manage multiple versions of a single package

   https://support.bioconductor.org/p/108656/#108965

in base R.

There is no discipline amongst package developers to manage dependencies 
either initially or over the long tenure of a package in Bioconductor. 
This is not helped by open-ended promises like 'Biobase (>= x.y.z)', 
which is a very optimistic statement about the backward compatibility of 
future versions of Biobase.
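
To make the point about open-ended constraints concrete, here is a small sketch with made-up version numbers (they stand in for 'x.y.z' and are not real Biobase releases). A '>=' constraint has no upper bound, so every later version satisfies it, whether or not it is still backward compatible.

```r
# Hypothetical version numbers, for illustration only.
minimum    <- package_version("2.40.0")  # stands in for 'x.y.z'
candidates <- package_version(c("2.38.0", "2.40.0", "2.60.0", "3.0.0"))

# Which candidate versions would satisfy 'Biobase (>= 2.40.0)'?
# Every version at or above the minimum passes, however far in the future.
candidates >= minimum
```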

So theory and practice have unfortunately diverged.
I guess there are so many ways to shoot oneself in the foot that it is a 
wonder that we still have feet! I'm with Jim on this one...
...and a minor thing is that striving for the positive result of valid() 
is more appealing than the negative of tainted().

And a plug for some feature creep I introduced the other day

   BiocManager::available("BSgenome.*(musculus|sapiens)")

Martin
6 days later
#
Hi,
On Fri, 2018-05-18 at 10:00 -0400, Martin Morgan wrote:
Alright then, looking forward 
to the new BiocManager package.

Yours,
Steffen