
[Bioc-devel] Bioconductor archive?

Hi, all.

Perhaps a bit tangential, but I capture the results of all build reports for all packages daily (that is the intent, anyway), going back a year or so (a couple of years if we dig into archives). The reports are processed using code in this repo: https://github.com/seandavi/BiocBuildDB, via a GitHub action<https://github.com/seandavi/BiocBuildDB/actions/workflows/process_new_build_reports.yaml> that runs daily. This might not be exactly the format you are looking for, Lluís, but it does contain a complete history of every build of every package on every day, for all Bioc builds.

The result is a set of three CSV files per build (about 3.5k CSV files right now), with rows for each package/machine/build step and the results of the build, including propagation status (whether the package gets pushed to release). Version numbers, git hashes, dates, Bioconductor versions, build commands, error logs, etc. are all captured. Thus, things like full-text search over captured log output are possible over time, across branches, and across machines or packages. The point at which a package enters the system is also captured. The build_summary table currently checks in at about 6M rows (again, without going into archive data) and grows by about 20k rows per day.
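
To make the shape of such a table concrete, here is a minimal sketch of filtering build-summary rows for one package. The column names (package, version, node, stage, status, build_date), the node name, and the sample rows are all assumptions made for illustration; the actual BiocBuildDB CSV layout may differ.

```python
import csv
import io

# Hypothetical sample data; the real BiocBuildDB column names and values may differ.
SAMPLE_CSV = """package,version,node,stage,status,build_date
pkgA,1.2.0,node1,buildsrc,OK,2025-03-17
pkgA,1.2.0,node1,checksrc,ERROR,2025-03-17
pkgA,1.2.1,node1,checksrc,OK,2025-03-18
pkgB,0.99.3,node2,install,OK,2025-03-18
"""


def failing_steps(csv_text, package):
    """Return (build_date, node, stage, status) for every non-OK row of one package."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["build_date"], row["node"], row["stage"], row["status"])
        for row in reader
        if row["package"] == package and row["status"] != "OK"
    ]


if __name__ == "__main__":
    for step in failing_steps(SAMPLE_CSV, "pkgA"):
        print(step)
```

The same row-wise filter would extend to searching a log-text column, if one is present in the real files.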

I have pending issues<https://github.com/seandavi/BiocBuildDB/issues> to expose the data but just haven't prioritized the work. I'm happy to discuss access and use cases either in a new thread here, on Slack, or via GitHub issues.

Sean



From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Lluís Revilla <lluis.revilla at gmail.com>
Date: Wednesday, March 19, 2025 at 6:21 PM
To: Kern, Lori <Lori.Shepherd at roswellpark.org>
Cc: bioc-devel <bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Bioconductor archive?
Hi Lori,

Many thanks for your answer. I have a couple of follow-up questions.
Thanks for reminding me of this. I'm interested in all the x.y.z+n
package versions published within each release, not just the last one or
the initial one. Is this historical information available? The file at
https://bioconductor.org/packages/3.20/bioc/VIEWS only includes the
latest date of a given release, but there could be a release within a
given Bioconductor version before that.
I thought it would be difficult if not impossible to check this but
even for the current release I can't find this data. Does Bioconductor
have an internal archive with this information? On CRAN, even if a
package is removed, the activities of the archive are stored
internally: each date-time of publication, archival, and removal. Does
something similar happen in Bioconductor? Even if a given package is
not available, knowing that there was a release could be helpful for
reproducibility (as it could be compared with the git log).

With that information, finding which package versions were used for a
script, given only a date, could become easier.
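
A minimal sketch of that lookup, assuming a per-version publication history were available. The record layout (package, version, publication date) and every value below are hypothetical and do not come from any existing Bioconductor file.

```python
from datetime import date

# Hypothetical publication records: (package, version, date published).
RECORDS = [
    ("pkgA", "1.2.0", date(2024, 10, 30)),
    ("pkgA", "1.2.1", date(2024, 12, 5)),
    ("pkgA", "1.2.2", date(2025, 2, 11)),
    ("pkgB", "2.0.0", date(2024, 10, 30)),
]


def versions_on(records, when):
    """Map each package to the latest version published on or before `when`."""
    latest = {}
    # Walk the records in chronological order so later versions overwrite earlier ones.
    for package, version, published in sorted(records, key=lambda r: r[2]):
        if published <= when:
            latest[package] = version
    return latest


if __name__ == "__main__":
    print(versions_on(RECORDS, date(2025, 1, 15)))
```

Packages with no publication on or before the given date are simply absent from the result.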

Best,

Lluís
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
Message-ID: <MN2PR10MB32303B0B76BB781E5AE056DCF9D82@MN2PR10MB3230.namprd10.prod.outlook.com>
In-Reply-To: <CAN+W6_v9KJK1v2M_LEh=SRVE5wHs1O4dNt7zbaLhfJdusJUzmA@mail.gmail.com>