
[Bioc-devel] Short URLs for packages?

19 messages · Laurent Gatto, Sean Davis, Fischer, Bernd +6 more

#
I wonder whether it'd be possible to have the website understand URLs like
	http://www.bioconductor.org/<pkgname>

This could resolve to 
http://www.bioconductor.org/packages/release/bioc/html/<pkgname>.html
or
http://www.bioconductor.org/packages/devel/bioc/html/<pkgname>.html
depending on whether the package had already been released.
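A minimal sketch of the proposed resolution rule in Python, for illustration only. The function name `resolve_short_url` and the two package sets (including the package name "NewPkg") are hypothetical stand-ins for whatever package index the website would actually consult. Only the two target URL patterns come from the message above.

```python
BASE = "http://www.bioconductor.org/packages"

def resolve_short_url(pkgname, release_pkgs, devel_pkgs):
    """Resolve /<pkgname> to a landing page (hypothetical helper).

    release_pkgs and devel_pkgs are assumed sets of package names
    currently in each branch; they are not a real Bioconductor API.
    """
    if pkgname in release_pkgs:
        branch = "release"   # already released: prefer the release page
    elif pkgname in devel_pkgs:
        branch = "devel"     # newly added, not yet in a release
    else:
        return None          # unknown package: no redirect
    return f"{BASE}/{branch}/bioc/html/{pkgname}.html"


release = {"limma", "BiocGenerics"}
devel = {"limma", "BiocGenerics", "NewPkg"}
print(resolve_short_url("limma", release, devel))
# http://www.bioconductor.org/packages/release/bioc/html/limma.html
print(resolve_short_url("NewPkg", release, devel))
# http://www.bioconductor.org/packages/devel/bioc/html/NewPkg.html
```

Once "NewPkg" appears in the release set, the same short URL starts pointing at the release page without any change to the link itself.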

This could be handy in papers or grants that mention packages.

Wolfgang


----
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
Genome Biology Unit
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

T +49-6221-3878823
wolfgang.huber at embl.de
http://www.huber.embl.de
#
On 23 March 2015 10:17, Wolfgang Huber wrote:

That's a great suggestion!

Maybe there could also be short URLs for the release and devel pages,
something like

  http://www.bioconductor.org/rel/<pkgname>

and

  http://www.bioconductor.org/dev/<pkgname>


Laurent
#
Just so we don't lose the thoughts that have come before, here is a link to
a similar proposal from last year.

https://stat.ethz.ch/pipermail/bioc-devel/2014-February/005292.html

Sean
On Mon, Mar 23, 2015 at 6:17 AM, Wolfgang Huber <whuber at embl.de> wrote:

#
The cited htaccess rule just links to the release version of the package. While
this would already be an improvement, it is not sufficient for links in papers.

During the production of a paper, we want to link to the accompanying
BioC package, which is in devel but not yet in release. Before the package's first
release, the link (e.g. www.bioconductor.org/<packagename>) should go to the devel
version (maybe with an additional warning that it is only available in devel);
afterwards, it should go to release.

Bernd
#
On Mon, Mar 23, 2015 at 8:05 AM, Fischer, Bernd <
b.fischer at dkfz-heidelberg.de> wrote:

I understand the appeal of this, but decoupling publications from the
actual, exact versions they discuss or use seems like a relatively large
step backwards in terms of reproducibility. At the very least, I think
there is some nuance here that warrants careful consideration before we
adopt a single-silently-changing-link-per-package paradigm.

~G

#
.../release/... silently changes every six months or so, as does .../devel/..., so I don't see how this changes anything beyond that.  It does make finding the packages a lot easier in general, and more mnemonic. 

If you want to document the versions of packages used in an analysis, there's always sessionInfo() and/or a dockerfile, right?

--t
#
On Mon, Mar 23, 2015 at 9:00 AM, Tim Triche, Jr. <tim.triche at gmail.com>
wrote:
> It does make finding the packages a lot easier in general, and more mnemonic.
It makes finding whatever the package is at the time you read the
publication easier, yes. Finding the software discussed or used in the
publication ... not really.

Packages are (read: should be, IMHO) published, citable pieces of research,
though. Imagine if a paper you cite were silently updated without the
DOI/citation changing. That wouldn't be good.

I guess my problem is that there is even an "if" at the beginning of your
sentence about documenting package versions. That's not an attack on you; I
know that the above reflects the current state of affairs. I'm simply saying
that perhaps Bioconductor, as a project, can help/encourage people to do
better.

~G

#
I don't disagree, but the existing setup does nothing to address that. citation('limma'), for example, does.

.../release/... and .../devel/... can change at any time, potentially overnight (with or without a new BioC release).  The only real way to cite an exact version is to cite that exact version, which is already the proper way to do things and would remain unaffected by this, at least AFAIK. 

Perhaps a useful addendum would be for the mnemonic 

http://bioconductor.org/limma 

To redirect to

http://bioconductor.org/packages/limma/whateverTheMostRecentStableVersionMayBe/

And then everything is explicit. 

Does that address the competing issues discussed herein?  

Best,

--t
#
Quite true.  Perhaps that could be emphasized as part of adding the redirect rules.  I am always delighted when people cite the version number of a package, as it shows that they care about the quality of their work, and the stability of its conclusions. 

People rarely do what they know is right; they do what is convenient, then repent.  (Bob Dylan pointed this out a while ago...)

Thus a person is more likely to do the right thing if it happens to be the most convenient thing, IMHO. Anything that advances this strategy would be a step in the right direction.

Best,

--t

#
On March 23, 2015 9:18:57 AM PDT, "Tim Triche, Jr." <tim.triche at gmail.com> wrote:
Note that 'release' and 'devel' are just symlinks to the current release and devel versions. I.e. currently 3.0 and 3.1 respectively. So you can always link directly to a specific version. 

Dan
#
On Mon, Mar 23, 2015 at 9:25 AM, Tim Triche, Jr. <tim.triche at gmail.com>
wrote:
I agree. That is why I am somewhat leery of making it more convenient to do
the "wrong" thing (or to not do the right one).
Bioc core team: This may be getting a bit off topic, but has there been
any discussion of working with an organization like http://zenodo.org/ to
get DOIs assigned to Bioc packages on release? This could be for every
release or only for the initial inclusion in a Bioc release, but if they
are version-specific they would make citing, etc. easier and more rigorous.
We could have a biocCite or figure out how to get citation() to do the right
thing.

~G

#
I just meant that the mnemonic link

http://www.bioconductor.org/limma/  (SEO version of limma ;-))

could dump people at something like

http://www.bioconductor.org/release/limma/3.22.7/   (I'd prefer this)

or if need be for backwards compatibility,

http://www.bioconductor.org/packages/3.0/limma/3.22.7/  (seems less good)

instead of

http://www.bioconductor.org/packages/3.0/bioc/html/limma.html  (current)

and furthermore the specific version page could note more prominently that
the build of limma being referenced at that particular instance in time may
or may not be the same as was cited in a paper, used in an analysis,
available for download the previous evening, etc. Thus citation("limma") is
a Very Good Idea when writing up results that depend upon it, because even
the WEHI guys could theoretically have a bug that impacted someone's
results (as opposed to the usual case of Didn't Read The Fine Limma Manual).

Does that make more sense?  (Probably not, but worth a try)

Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>

On Mon, Mar 23, 2015 at 9:29 AM, Dan Tenenbaum <dtenenba at fredhutch.org>
wrote:

#
Before we start a religious war, can we make progress on the pragmatic goal of making it possible to provide such URLs to people?

There are two concepts:
- 'the package' - a specific version, running in a specific environment, 'frozen', etc. (Gabe)
- 'the package' - as a concept and a living artifact (me, Bernd, Tim)
Both are useful. And having URLs for both would also be useful.

Wolfgang
#
On 03/24/2015 02:31 AM, Wolfgang Huber wrote:
0. That's (mostly) satisfied with the current scheme and

   http://bioconductor.org/packages/3.0/bioc/html/BiocGenerics.html
   http://bioconductor.org/packages/release/bioc/html/BiocGenerics.html
   http://bioconductor.org/packages/devel/bioc/html/BiocGenerics.html

(hey, no www. -- that's four letters already!) Perhaps importantly, there's also 
a hard-coded version for devel, 3.1, and for past releases. So as I understand 
it the request is for (a) shorter path names and (b) dynamic selection of 
release vs. devel, mentioned below, for the <6 month period when the package is 
in devel but not yet release. Also noted is Henrik's earlier proposal mentioned 
by Sean.


1. 'packages', 'bioc', 'html' all look somehow redundant, so

   http://bioconductor.org/release/BiocGenerics.html
   http://bioconductor.org/devel/BiocGenerics.html
   http://bioconductor.org/3.0/BiocGenerics.html

but also

   http://bioconductor.org/release/BiocGenerics/ (implicit index.html)
   http://bioconductor.org/BiocGenerics/release/

and their devel and version counterparts would seem quite possible / not 
profoundly controversial. Landing pages for specific package versions such as 3.22.7 do not 
currently exist, change little across package minor versions, and would not lead 
to packages installable via biocLite(), so this idea of Tim's is a non-starter 
in my opinion.

Having the 'version' level of the path before the package provides a logical 
place to put biocViews for that release. I'd vote for one of the 
release/BiocGenerics[.html] schemes.
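To make the short forms in point 1 concrete, here is a Python sketch of the path mapping a rewrite rule would have to perform, from the proposed short paths to the current long landing-page paths. The regular expressions and the `expand` helper are illustrative assumptions, not how bioconductor.org is configured, and the sketch covers only software ('bioc') packages. Annotation and experiment data packages (point 3) would need their own handling.

```python
import re

# Hypothetical patterns for the short forms proposed in point 1.
VERSION = r"(?P<version>release|devel|\d+\.\d+)"
PKG = r"(?P<pkg>[A-Za-z][A-Za-z0-9.]*)"

SHORT_FORMS = [
    re.compile(rf"^/{VERSION}/{PKG}\.html$"),  # /release/BiocGenerics.html
    re.compile(rf"^/{VERSION}/{PKG}/$"),       # /release/BiocGenerics/
    re.compile(rf"^/{PKG}/{VERSION}/$"),       # /BiocGenerics/release/
]

def expand(short_path):
    """Map a short path to the current long landing-page path, or None."""
    for pattern in SHORT_FORMS:
        match = pattern.match(short_path)
        if match:
            return (f"/packages/{match['version']}/bioc/html/"
                    f"{match['pkg']}.html")
    return None  # not one of the short forms: leave the request alone

print(expand("/release/BiocGenerics.html"))
# /packages/release/bioc/html/BiocGenerics.html
print(expand("/BiocGenerics/devel/"))
# /packages/devel/bioc/html/BiocGenerics.html
print(expand("/3.0/BiocGenerics/"))
# /packages/3.0/bioc/html/BiocGenerics.html
```

Because 'release' and 'devel' are symlinks to numbered versions, the same three patterns cover both the named and the hard-coded version forms.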


2. Something like

   http://bioconductor.org/BiocGenerics

redirecting to release when available, devel when newly added (Wolfgang's 
proposal) would in my opinion be confusing, especially since we continue to have 
so much difficulty with version mismatches in user installations. I don't think 
having a warning on redirect that 'this package is not available for release' 
would be effective either at advertising robust software or at enabling use by 
comparatively naive users.


3. In terms of the 'redundant' parts of the path, these are not completely 
arbitrary (not that these considerations have to dictate presentation; they do 
make one suspect that 'add a redirect and everything will be fine' will result 
in a nice plate of spaghetti, especially if there is some desire to retain 
backward compatibility).

'packages' separates the package repository from other aspects of 
bioconductor.org, and groups related concepts ('package', 'help', etc.) at a 
similar hierarchical level.

'bioc' serves to distinguish between software ('bioc/'), annotation 
('data/annotation') and experiment data ('data/experiment') packages, and these 
divide the overall repository into three for the purposes of biocLite() / 
install.packages() (this conceptual distinction has been useful, I think).

 > biocinstallRepos()
                                               BioCsoft
            "http://bioconductor.org/packages/3.1/bioc"
                                                BioCann
"http://bioconductor.org/packages/3.1/data/annotation"
                                                BioCexp
"http://bioconductor.org/packages/3.1/data/experiment"

'html' distinguishes the landing pages from the package tarballs / binary 
distributions themselves as returned by contrib.url(biocinstallRepos()), and 
from their vignette/, man/ and news/ resources.
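The role of each path component in point 3 can be sketched as a small URL builder in Python. The repository names and sub-paths are taken from the biocinstallRepos() output above; the helper names are hypothetical. The `src/contrib` suffix mirrors what R's contrib.url() appends to a repository URL for source packages (type = "source").

```python
# Sub-paths per repository, as reported by biocinstallRepos() above.
REPOS = {
    "BioCsoft": "bioc",
    "BioCann": "data/annotation",
    "BioCexp": "data/experiment",
}
ROOT = "http://bioconductor.org/packages"

def repo_url(version, repo):
    """'packages' + Bioc version + repository ('bioc', 'data/...')."""
    return f"{ROOT}/{version}/{REPOS[repo]}"

def landing_page(version, repo, pkg):
    """'html' marks the human-readable landing pages."""
    return f"{repo_url(version, repo)}/html/{pkg}.html"

def source_contrib(version, repo):
    """Where source tarballs live; mirrors R's contrib.url(type = "source")."""
    return f"{repo_url(version, repo)}/src/contrib"

print(repo_url("3.1", "BioCann"))
# http://bioconductor.org/packages/3.1/data/annotation
print(landing_page("3.0", "BioCsoft", "BiocGenerics"))
# http://bioconductor.org/packages/3.0/bioc/html/BiocGenerics.html
print(source_contrib("3.1", "BioCsoft"))
# http://bioconductor.org/packages/3.1/bioc/src/contrib
```

Dropping any of the segments from a short URL means a redirect rule has to reinsert it, which is part of why the 'redundant' components are not trivially removable.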


4. In terms of best practices, it seems like articles are about particular 
versions and should cite the package as such, for instance if only in devel when 
the paper is being written as .../3.1/..., but that there is no substantive cost 
to also referencing 'current version available [after April 2015] at 
.../release/...'.


5. At the end of the day I find myself casting my lot for landing pages with the 
form

   http://bioconductor.org/release/BiocGenerics/

which leads to a little less typing but not the dynamic resolution that started 
this (version) of the thread.


Martin

#
#5 is what I was thinking of when I responded.  A simple RewriteRule, if anyone still uses Apache. 

"Release" vs "devel" and/or "3.0" vs "3.1" vs "3.2", e.g.
Pointing analogously to
seems like a good minimal standard (project + version + package)


--t

#
But we already have dynamic resolution. Even http://bioconductor.org/release/BiocGenerics will point to different package versions (e.g. after bugfixes) as time goes by.
So the attribute 'release' is dynamically resolved. 
All I am asking for is another attribute that means 'the best that we currently have', i.e. release if it exists and devel otherwise.

I didn't expect so much disagreement on so mundane an issue. And there are plenty of ways of doing this outside the Bioc webpage: any of the public 'tiny URL' services, my own webpage, or just telling people to google the package name.
I don't agree with point 4 (that articles are about particular versions). This would mean that for each later version of the same package, even just after a trivial typo fix, there is either no article, or another one would have to be written. I don't think this has an easily formalized solution; some good judgement is required.
E.g. try to apply the above reasoning to http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003118

Wolfgang
#
On Tue, Mar 24, 2015 at 7:28 AM, Wolfgang Huber <whuber at embl.de> wrote:

I just think there are a couple of subtleties here. I certainly don't
begrudge people wanting to type less and find packages easier. But if a
naive user with a default (read: release) Bioc installation goes to
http://bioconductor.org/CoolAwesomePkg and sees that it is "available in
bioconductor" but then can't install it because it is only in devel, are
they going to be less confused, or more? I don't know the answer to that,
but I think it's something to consider.

Also, as I have said elsewhere, though I acknowledge that you seem to
disagree, I think such urls are substantially less appropriate for
credit/citation in publications. A link that brought users to the version
in question, but which - if not current - had a prominent link to the
current version would be better imho.
I agree that there can be a bit of a "beard problem*" here. If people
follow the Bioc development guidelines, though, I think a pretty good rule
of thumb can be had: bugfix version changes (in the major.minor-bugfix
nomenclature) are (relatively unambiguously) the "same" software from a
publication standpoint, while package versions with minor or major version
differences are not. This doesn't mean that a new article needs to be
written, imo, just that awareness that the article discussed a different
version of the software - and that users should see the NEWS file or
current documentation for fully up-to-date information - is important.
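This rule of thumb can be expressed as a simple version comparison. A minimal Python sketch, assuming versions of the form major.minor.bugfix (e.g. 3.22.7) or major.minor-bugfix; the helper names are hypothetical. Note that, as the next paragraph points out, the automatic version bumps at each Bioc release would make this check report "different software" even when the code itself has not changed.

```python
import re

def parse_version(version):
    """Split 'x.y.z' or 'x.y-z' into (major, minor, bugfix) integers."""
    parts = re.split(r"[.-]", version)
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.bugfix, got {version!r}")
    major, minor, bugfix = (int(p) for p in parts)
    return major, minor, bugfix

def same_published_software(cited, current):
    """True if the two versions differ at most in the bugfix component."""
    return parse_version(cited)[:2] == parse_version(current)[:2]

print(same_published_software("3.22.7", "3.22.9"))  # True: bugfix only
print(same_published_software("3.22.7", "3.24.0"))  # False: minor changed
```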

Not to harp on you personally, Wolfgang, because your paper with Simon
about DESeq was ahead of its time (and ours, sadly) on many of these
issues, but the API and default behavior of DESeq have changed
substantially (and for the better!) since its publication [1].

As a never-going-to-happen pipedream, this would be even more
straight-forward if Bioc package version numbers were of the form
(BiocVersion.PkgVersion-bugfix). Then the automatic incrementing of package
versions for bioc releases wouldn't muddy the waters here.


* The philosophical issue where some men obviously have beards, and some
men obviously don't, but there is no exact number of facial hairs at which
one unambiguously transitions from not having a beard to having one.

[1]
http://blog.revolutionanalytics.com/2014/08/gran-and-switchr-cant-send-you-back-in-time-but-they-can-send-r-sort-of.html

~G
#
There are still problems with completely reproducing old analyses partly
due to our (current) inability to reproduce an exact version (as Martin
says).

But I don't think we should muddy the waters and mix URL schemas with
versioning.

What Wolfgang is asking for is something I think makes total sense and
which I support: the ability to refer to a single url and get the "latest"
version of a package.  The url I usually give out is
../release/PACKAGE.html but that does not work for a package which has not
yet been part of a release.  Depending on your manuscript / package
development process I could easily see a manuscript getting accepted for
publication around the same time the package gets accepted into Bioc.  It
has happened to me.  And like Wolfgang, I don't like to have
../devel/PACKAGE.html links in my papers.

To me, this seems like a very slight extension to the
../release/PACKAGE.html schema, and I don't really understand the
reluctance to have this.

I am also happy to start a discussion on how to refer to specific versions
of a package including what we might need to support in Bioconductor to
achieve better reproducibility - which is what I think Gabe refers to - but
I don't think we should confuse this (important) issue with the schema
request.

Best,
Kasper
On Tue, Mar 24, 2015 at 11:15 AM, Gabe Becker <becker.gabe at gene.com> wrote:

#
> I just think there are a couple of subtleties here. I certainly don't
> begrudge people wanting to type less and find packages easier. But if a
> naive user with a default (read: release) Bioc installation goes to
> http://bioconductor.org/CoolAwesomePkg and sees that it is "available in
> bioconductor" but then can't install it because it is only in devel, are
> they going to be less confused, or more? I don't know the answer to that,
> but I think it's something to consider.

We have (and have already had many times) exactly this problem.
A paper is published and refers to a new BioC package. The naive user
is not able to find the package. We want to show the naive user that this package
is indeed part of Bioconductor and point him/her to a way to install it.

The devel webpage makes a clear statement at the top, saying
"This is the development version of BiocGenerics; for the stable release version, see MyPackage."
If this is not prominent enough, one could highlight it in yellow.


> Also, as I have said elsewhere, though I acknowledge that you seem to
> disagree, I think such urls are substantially less appropriate for
> credit/citation in publications. A link that brought users to the version
> in question, but which - if not current - had a prominent link to the
> current version would be better imho.

This discussion is off-topic. The versioning system of Bioconductor provides
a sufficient way to cite the right version of the packages to ensure reproducible
research. We (try to) do this in the papers as well. We do not request that short
URLs should replace the correct citation of package versions.

Here, we ask for a stable, short URL that links to the most current, stable version
of the package (which is in devel for the time between acceptance and first
release of the package). Most users reading about a Bioconductor package
want to install the current version of the package: the one that is best tested,
with the fewest bugs, installable on a current machine with a current version
of R, …
We want to put a stable URL into a paper that does not need to be
changed when the BioC package moves from devel to release. There
is no way to change the paper after publication.

Bernd