[Bioc-devel] Bioconductor archive?

5 messages · Lluís Revilla, Shepherd, Lori, Sean Davis

#
Hi,

Thanks to Martin Morgan, I recently learned that there are files listing
the Date/Publication field for Bioconductor packages:
https://bioconductor.org/packages/3.7/bioc/VIEWS. I'm trying to reconstruct
which packages from CRAN and Bioconductor were available at any given
moment, and these files were very helpful.
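
As a rough illustration of how such a file could be read, here is a minimal Python sketch. It assumes the VIEWS file follows the usual DCF layout ("Field: value" lines, indented continuation lines, blank lines between records). The sample text, package names, versions and dates are made up, and `parse_dcf` and `publication_table` are hypothetical helper names, not part of any Bioconductor tooling.

```python
def parse_dcf(text):
    """Parse DCF text ('Field: value' records separated by blank lines) into dicts."""
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # Indented line: continuation of the previous field.
            current[last_key] += " " + line.strip()
        elif ":" in line:
            # Split at the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


# Made-up sample in the assumed VIEWS layout (not real package data).
SAMPLE = """\
Package: pkgA
Version: 1.2.3
Depends: R (>= 4.0),
    methods
Date/Publication: 2018-06-01

Package: pkgB
Version: 2.0.0
"""


def publication_table(text):
    """Return (package, version, publication date or None) for each record."""
    return [
        (r["Package"], r["Version"], r.get("Date/Publication"))
        for r in parse_dcf(text)
    ]


for row in publication_table(SAMPLE):
    print(row)
```

Records without a Date/Publication field come back with `None`, which keeps packages that lack the field visible instead of silently dropping them.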

However, these files only list the latest version of each package published
in a given Bioconductor release.
Is there a way to know whether there were more updates after a release?
I thought about searching the git log of each package, but that wouldn't
be enough: a package might have bumped its version without passing the
Bioconductor checks, and thus never have been released.

Related to this, the field is present from Bioconductor version 3.7
onwards, but I couldn't find it in previous releases. Is there a way to
find out the package releases, and their dates, for earlier versions?

Package updates on the release branch should only contain bug fixes, but
for reproducibility purposes it might be necessary to get the same bugs
again.

Many thanks in advance,

Lluís
2 days later
#
It looks like the Date/Publication field is only present when there was a change on the branch after the release (i.e., any package with version x.y.(z+n) instead of x.y.0).
Once a new release occurs, the previous release is frozen: Bioconductor does not allow any further changes to it, not even bug fixes.
I would have to dig into the history, but my guess is that 3.7 is when we either switched to git or started keeping archived versions, so this information is likely not available before then.
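
The x.y.0 versus x.y.(z+n) distinction above can be checked mechanically. The following is a minimal Python sketch that assumes plain three-part version strings; `patch_level` and `changed_after_release` are hypothetical helper names used only for illustration.

```python
def patch_level(version):
    """Return z from a three-part version string 'x.y.z'."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected an x.y.z version, got {version!r}")
    return int(parts[2])


def changed_after_release(version):
    """True if the version is x.y.(z+n) with n > 0, i.e. not the initial x.y.0."""
    return patch_level(version) > 0


print(changed_after_release("1.4.0"))  # initial version of a release
print(changed_after_release("1.4.2"))  # updated on the release branch
```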




Lori Shepherd - Kern

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263
2 days later
#
Hi Lori,

Many thanks for your answer. I have a couple of follow-up questions.
Thanks for reminding me of this. I'm interested in all the x.y.(z+n)
versions that were published within each release, not just the last one
or the initial one. Is this historical information available? The file at
https://bioconductor.org/packages/3.20/bioc/VIEWS only includes the
latest date within a given release, but a package could have had an
earlier update within the same Bioconductor version.
I thought it would be difficult, if not impossible, to check this, but
even for the current release I can't find this data. Does Bioconductor
have an internal archive with this information? On CRAN, even when a
package is removed, the archive's activity is stored internally:
each date-time of publication, archival and removal. Does
something similar happen in Bioconductor? Even if a given package is
not available, knowing that there was a release could be helpful for
reproducibility (as it could be compared with the git log).

With that information, finding which package versions were used for a
script, given only a date, could become easier.
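
To make the date-based lookup concrete, here is a minimal Python sketch. It assumes a per-package publication history is already available as (date, version) pairs sorted by date. The history shown is made up, and `version_on` is a hypothetical helper name.

```python
from bisect import bisect_right
from datetime import date


def version_on(history, when):
    """Return the version published most recently on or before `when`.

    `history` is a list of (publication_date, version) tuples sorted by date.
    Returns None if nothing had been published yet.
    """
    dates = [published for published, _ in history]
    i = bisect_right(dates, when)
    return history[i - 1][1] if i else None


# Made-up publication history for a single package.
HISTORY = [
    (date(2024, 10, 30), "1.2.0"),
    (date(2024, 11, 16), "1.2.1"),
    (date(2025, 1, 8), "1.2.2"),
]

print(version_on(HISTORY, date(2024, 12, 1)))
```

A binary search is used because the history is sorted. With only the latest date per release, as in VIEWS, the intermediate entries would be missing and the lookup could return the wrong version.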

Best,

Lluís
#
Hi, all.

Perhaps a bit tangential, but I capture the results of all build reports for all packages daily (that is the intent, anyway) going back a year or so (a couple of years if we dig into archives). The reports are processed using code in this repo: https://github.com/seandavi/BiocBuildDB using a github action<https://github.com/seandavi/BiocBuildDB/actions/workflows/process_new_build_reports.yaml> that runs daily. This might not be exactly the format you are looking for, Lluis, but it does have a complete history of every build for every package for every day for all Bioc builds.

The result is a set of three CSV files (one set for every build, about 3.5k CSV files right now) with rows for each package/machine/build step and the results of the build, including propagation status (whether the package gets pushed to release). Version numbers, git hashes, dates, Bioconductor versions, build commands, error logs, etc. are all captured. Thus, things like full-text search over captured log output are possible over time, across branches, and across machines or packages. The point at which a package enters the system is also captured. The build_summary table currently checks in at about 6M rows (again, without going into archive data) and adds about 20k rows per day.
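
As a sketch of how propagation status could be turned into a release history, the Python example below finds the first day each package version propagated. The column names (`package`, `version`, `bioc_version`, `build_date`, `propagate`), the `OK` status value and the sample rows are all assumptions made for illustration; the actual BiocBuildDB schema may differ.

```python
import csv
import io

# Made-up rows with hypothetical column names (not the real BiocBuildDB schema).
SAMPLE_CSV = """\
package,version,bioc_version,build_date,propagate
pkgA,1.2.0,3.20,2024-10-30,OK
pkgA,1.2.1,3.20,2024-11-15,FAILED
pkgA,1.2.1,3.20,2024-11-16,OK
pkgA,1.2.1,3.20,2024-11-17,OK
"""


def first_propagation(rows):
    """Map (package, bioc_version, version) to the earliest date it propagated."""
    first = {}
    for row in rows:
        if row["propagate"] != "OK":
            continue  # version was built but not pushed to the repository
        key = (row["package"], row["bioc_version"], row["version"])
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if key not in first or row["build_date"] < first[key]:
            first[key] = row["build_date"]
    return first


releases = first_propagation(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for key, day in sorted(releases.items()):
    print(key, day)
```

Rows where the version was bumped but did not propagate are skipped, which is the case the git log alone cannot distinguish.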

I have pending issues<https://github.com/seandavi/BiocBuildDB/issues> to expose the data but just haven't prioritized the work. I'm happy to discuss access and use cases either in a new thread here, on Slack, or via github issues.

Sean



From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Lluís Revilla <lluis.revilla at gmail.com>
Date: Wednesday, March 19, 2025 at 6:21 PM
To: Kern, Lori <Lori.Shepherd at roswellpark.org>
Cc: bioc-devel <bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Bioconductor archive?
#
Thanks Sean,

This looks awesome! Many thanks for storing this. I'll see how I could
process the data and might contact you off-list or via the issues on
the repo.

Just going by the numbers reported, I'm a bit surprised by the daily
increment of the summary table. Bioconductor software has around 2000
packages, checked on 5 different machines, with 5 outputs each (install,
build, check, bin, propagate), which gives that order of
magnitude, but not all builds and checks are run every day (right now I
cannot find the page where the frequency is reported).
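
The back-of-the-envelope estimate can be written out as a few lines of Python. The inputs are just the approximate figures quoted in this thread (around 2000 packages, 5 machines, 5 outputs, about 20k new rows per day), not measured values.

```python
# Approximate figures quoted in the thread, not measured values.
packages = 2000
machines = 5
steps = 5  # install, build, check, bin, propagate

# Upper bound if every package ran every step on every machine each day.
upper_bound = packages * machines * steps

reported_daily_rows = 20_000
fraction = reported_daily_rows / upper_bound

print(upper_bound, fraction)
```

Under these assumptions the reported daily growth is the same order of magnitude as the upper bound, though below it.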

At the moment I won't use the build and check reports, but I might be
interested in them later (I too collect general check results from
CRAN, without the log files).
In any case, I'll get in touch.
Ideally, I would like to export/use this from a package, as I have
done for CRAN via the repo.data package I'm building.

Best wishes and many thanks,

Lluís
On Thu, 20 Mar 2025 at 02:56, Sean Davis <seandavi at gmail.com> wrote: