Dear Dan, Dear developers list,

Due to a recent change in a CRAN package, DEXSeq 1.8.0 (for R version 3.0.*) stopped working. I fixed this conflict in the release branch of Bioconductor and tried to commit my changes, but I don't seem to have write access, e.g.:

$ svn ci --username a.reyes -m "fixed conflicts with newest version of cran package"
Sending DESCRIPTION
svn: Commit failed (details follow):
svn: access to '/bioconductor/!svn/ver/81643/branches/RELEASE_2_13/madman/Rpacks/DEXSeq/DESCRIPTION' forbidden

I also noticed that I don't have read access either:

$ svn co --username a.reyes https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_2_13/madman/Rpacks/DEXSeq
svn: access to 'https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_2_13/madman/Rpacks/DEXSeq' forbidden

Is this intentional? If so, what would be the way to solve this kind of problem (e.g. a dependency outside Bioconductor changing in a way that breaks previous versions of a Bioconductor package)?

Best regards,
Alejandro
[Bioc-devel] r+w permissions in release branches
18 messages · Kasper Daniel Hansen, Andrzej Oleś, Alejandro Reyes +6 more
Dear Kasper,
regarding your issue with R-2.15: I was wondering whether using an older version of Rcpp from http://cran.r-project.org/src/contrib/Archive/Rcpp/ would help?
Cheers, Andrzej
On Tue, Apr 22, 2014 at 2:46 PM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
This is because commits to this branch of Bioconductor have been disabled, and it is intentional. But it raises the larger question, recently touched upon in a lengthy thread on R-devel, of whether this is a good state of affairs for Bioconductor. Specifically, the issue is the dependency of Bioconductor packages on CRAN packages, and what happens when a CRAN package gets updated in a way that breaks backwards compatibility. Right now, we (Bioconductor) might get hosed.

For example, we recently deployed a new computing cluster here at Hopkins. I maintain our R installation, and some users have asked for an install of Bioconductor using the latest version of R-2.15, for reproducibility reasons. I have a number of scripts which install a standard suite of packages we use here. The issue I am facing is that Rcpp has been updated and does not seem to be available for this version of R. This indirectly breaks crlmm, lumi, minfi, charm, methylumi, beadarray and arrayQualityMetrics, to mention but a few we use on our end. This seems somewhat undesirable from a reproducibility perspective: I cannot even install the packages!

Best,
Kasper
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
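Andrzej's suggestion relies on the layout of the CRAN Archive: superseded source tarballs of a package sit under src/contrib/Archive/<pkg>/ and are named <pkg>_<version>.tar.gz. The following is a minimal Python sketch, assuming that naming convention. It lists the archived versions linked from a directory listing page and builds the tarball URL to download. The helper names are hypothetical and not part of any R or Bioconductor tool. Choosing *which* old version still works with R-2.15 remains up to the user.

```python
import re
from typing import List
from urllib.parse import urljoin

# Base of the CRAN source tree, as in the Archive URL quoted above.
CRAN_CONTRIB = "http://cran.r-project.org/src/contrib/"


def archive_dir(pkg: str) -> str:
    """Directory that holds the superseded source tarballs of `pkg`."""
    return urljoin(CRAN_CONTRIB, f"Archive/{pkg}/")


def archive_tarball(pkg: str, version: str) -> str:
    """URL of one archived source tarball, named <pkg>_<version>.tar.gz."""
    return urljoin(archive_dir(pkg), f"{pkg}_{version}.tar.gz")


def archived_versions(pkg: str, listing_html: str) -> List[str]:
    """Extract the version strings linked from an Archive directory listing.

    Only links of the exact form <pkg>_<version>.tar.gz match, so a package
    such as RcppArmadillo is not mistaken for Rcpp.
    """
    pattern = re.compile(
        r'href="' + re.escape(pkg) + r'_([0-9][0-9.\-]*)\.tar\.gz"'
    )
    return pattern.findall(listing_html)


if __name__ == "__main__":
    # A made-up fragment of a listing page; the versions are placeholders.
    page = '<a href="Rcpp_0.1.0.tar.gz">a</a> <a href="RcppArmadillo_0.2.tar.gz">b</a>'
    for v in archived_versions("Rcpp", page):
        print(archive_tarball("Rcpp", v))
```

After you download a tarball, you can install it from the shell with `R CMD INSTALL <tarball>`, which builds it against the local R. Whether a given old version compiles under R-2.15 has to be checked case by case.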
Hi Andrzej,
Yes, that would help, and it would also be a solution for my case: installing an old version of the CRAN package (statmod in my case). However, I don't know whether this could be a general solution for all users, since when a package is installed via biocLite, the latest version of the CRAN package is installed regardless of the R/BiocInstaller version you are using. Users would need to download the versions of the dependencies that they need and install them manually.
Alejandro
Hi,
For a more general solution, one could think of specifying the version of critical packages in the DESCRIPTION file and having a biocLite function that installs that specific version from CRAN. See e.g. the devtools::install_version function for installing older packages from the CRAN archive. This may have drawbacks for binary or compiled packages, though.
Best wishes
Julian
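Julian's idea has two parts: a version requirement written into the package metadata, and a check of the installed versions against it. The sketch below uses Python purely for illustration. It parses a comma-separated dependency field written in the DESCRIPTION style `name (op version)` and reports the requirements that are not met. It assumes R-style version strings (integers separated by `.` or `-`). The operator set and the package names `pkgA`, `pkgB`, `pkgC` are hypothetical. On the R side, devtools::install_version is the function Julian mentions for actually fetching an older version.

```python
import re
from typing import Dict, List, Optional, Tuple

# Comparison operators accepted in a requirement such as "pkgA (>= 1.2.0)".
OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}

# Package name, optionally followed by "(operator version)".
DEP = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9.]*)\s*"
    r"(?:\(\s*(>=|<=|==|>|<)\s*([0-9]+(?:[.-][0-9]+)+)\s*\))?\s*$"
)


def version_key(version: str) -> Tuple[int, ...]:
    """Turn an R-style version such as '3.1-20' into (3, 1, 20) for comparison."""
    return tuple(int(part) for part in re.split(r"[.-]", version.strip()))


def parse_deps(field: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Split a dependency field into (name, operator, version) triples."""
    deps = []
    for entry in field.split(","):
        if not entry.strip():
            continue
        match = DEP.match(entry)
        if match is None:
            raise ValueError(f"cannot parse dependency: {entry!r}")
        deps.append(match.groups())
    return deps


def unsatisfied(field: str, installed: Dict[str, str]) -> List[str]:
    """List the requirements in `field` that the installed versions do not meet."""
    problems = []
    for name, op, required in parse_deps(field):
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed")
        elif op is not None and not OPS[op](version_key(have), version_key(required)):
            problems.append(f"{name}: have {have}, need {op} {required}")
    return problems
```

For example, `unsatisfied("pkgA (== 1.2.0), pkgB", {"pkgA": "1.3.0"})` reports both the version mismatch for pkgA and the missing pkgB. An installer built on this check would then fetch the required version from the archive.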
Hi Julian,
what if two Bioc packages require different versions of the ‘same’ CRAN package? AFAIU, the infrastructure is not designed to deal with multiple versions of a package. Nor would I, as a user, expect to have less-than-the-most-recent versions of CRAN packages in my library just because some other package says so.
Just to throw in another, and probably silly, suggestion: the Bioconductor repository could keep ‘snapshots’ of CRAN packages compatible with each release, but they would have to be name-mangled in some way. The potential for confusion is enormous.
Best wishes
Wolfgang
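Wolfgang's objection can be made concrete. An R library holds one installed version per package, so the requirements that several installed packages place on a shared CRAN dependency must have a common solution. Below is a minimal Python sketch with hypothetical package names and version numbers. It intersects such requirements over the available versions. An empty result means that no single installed version can serve both packages.

```python
import re
from typing import List, Tuple

# Comparison operators for R-style version requirements.
OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}


def version_key(version: str) -> Tuple[int, ...]:
    """'1.10.0' -> (1, 10, 0), so that 1.10.0 sorts after 1.9.0."""
    return tuple(int(part) for part in re.split(r"[.-]", version.strip()))


def common_versions(available: List[str], requirements: List[Tuple[str, str]]) -> List[str]:
    """Versions in `available` that satisfy every (operator, version) requirement.

    One library holds one copy of each package, so every package that uses
    this dependency has to be content with the same version.
    """
    return [
        v for v in available
        if all(OPS[op](version_key(v), version_key(req)) for op, req in requirements)
    ]


if __name__ == "__main__":
    releases = ["1.0.0", "1.1.0", "1.2.0"]  # hypothetical versions of one CRAN package
    # Hypothetical Bioc package X pins 1.0.0; package Y needs at least 1.1.0.
    print(common_versions(releases, [("==", "1.0.0"), (">=", "1.1.0")]))  # -> []
    # Range requirements can still overlap.
    print(common_versions(releases, [(">=", "1.1.0"), ("<", "1.2.0")]))  # -> ['1.1.0']
```

Exact pins fail this check as soon as two packages disagree. That is the situation a single agreed version per package per release is meant to avoid.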
Hi,
For most problems discussed here, it seems that having a fixed version of each package is sufficient, rather than a specific version. If the idea of a snapshot with each Bioc release would work (which still means one version per package), so would requiring that version within the package (one would just need to agree which version this is).
Best wishes
Julian
1 day later
On 04/22/2014 09:47 AM, Kasper Daniel Hansen wrote:
I think we should have a CRAN snapshot (or a subset of CRAN used in Bioc) inside each Bioc release; I don't know how hard that is to manage from a technical point of view.
I followed this thread with some interest.

It would be surprisingly challenging to update even a 2.13 package -- the build machines have moved on to other tasks, unconstrained by the unique system dependencies needed for 2.13 builds.

The idea of a 'forever' repository snapshot seems possible, but would the snapshot be taken at the beginning of the release, and hence miss the few but important bug fixes introduced during the release, or at the end of the release, which might be after the time required for the purposes of replication? Either way it is certain that the peanut butter would land face down for one's particular need. Also, the need for the user to satisfy system dependencies becomes increasingly challenging, even with a binary repository. I don't think a central 'Bioc' solution would really address the problem of reproducibility.

It is not that 'hard' for an individual group to create a snapshot of Bioc and CRAN using rsync (http://www.bioconductor.org/about/mirrors/mirror-how-to/ and http://cran.r-project.org/mirror-howto.html) and to use install.packages() or even biocLite to access these (see ?setRepositories). This would again require that the system dependencies for these packages are satisfied in some kind of frozen fashion.

A more robust possibility is of course a virtual machine, such as the AMI (or a customized version) we provide at http://www.bioconductor.org/help/bioconductor-cloud-ami/#ami_ids, although these have only a subset of packages installed by default.

The CRAN thread referenced earlier included this post, https://stat.ethz.ch/pipermail/r-devel/2014-March/068605.html, which I think makes an important distinction between exact replication and scientific reproducibility; it is the latter that must be the most interesting, and the former that we somehow seem to stumble over.
The thread also mentions best practices that we as developers can act on to enhance reproducibility: version control (http://bioconductor.org/developers/how-to/source-control/), a disciplined approach to deprecation (http://bioconductor.org/developers/how-to/deprecation/), package versioning (http://bioconductor.org/developers/how-to/version-numbering/), and the Bioc-style approach to release. What other best practices can we more forcefully / conveniently adopt within the project?
Martin
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
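Martin's do-it-yourself route is a local rsync mirror of the Bioc and CRAN repositories, which R can then use through install.packages() or biocLite (see ?setRepositories). To know later exactly what such a snapshot contained, one can record the package/version pairs from the repository index. The sketch below assumes a CRAN-style index file (src/contrib/PACKAGES, in DCF format, with records separated by blank lines and holding Package: and Version: fields). The "name version" manifest it produces is a hypothetical format, not an existing Bioconductor convention.

```python
import re
import sys
from typing import Dict, List


def parse_packages_index(text: str) -> Dict[str, str]:
    """Map package name -> version from a CRAN-style PACKAGES index (DCF format).

    Records are separated by blank lines. Continuation lines (which start with
    whitespace) are skipped, since only the one-line Package and Version
    fields are needed here.
    """
    versions = {}
    for record in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in record.splitlines():
            if not line or line[0].isspace() or ":" not in line:
                continue  # continuation line or nothing to parse
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
        if "Package" in fields and "Version" in fields:
            versions[fields["Package"]] = fields["Version"]
    return versions


def manifest_lines(versions: Dict[str, str]) -> List[str]:
    """One 'name version' line per package, sorted by name."""
    return [f"{name} {versions[name]}" for name in sorted(versions)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python manifest.py /snapshot/src/contrib/PACKAGES
    with open(sys.argv[1], encoding="utf-8") as handle:
        print("\n".join(manifest_lines(parse_packages_index(handle.read()))))
```

Keeping such a manifest next to the mirrored files documents the frozen versions. As Martin notes, it does not freeze the system dependencies.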
Hi Martin,
to come back to the original trigger for this thread: it was not concerns about reproducibility, but the fact that a Bioc package in the current release stopped working because a CRAN package had changed in the meantime. What's the most practical solution to this specific problem?
Best wishes
Wolfgang
Hi Kasper you are right, I had misunderstood the problem. In that case I agree with Martin that the problem resolves into components that are either intractable, already addressed by deprecation policies, or not very important. Sorry for the noise. Wolfgang
On 24 Apr 2014, at 15:18, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
Wolfgang,
Alejandro did not have a problem with the current release, but with the most recent prior release. His issue is precisely because it is no longer the current (stable) release.
Kasper
Hi,
See the latest software builds for BioC 2.13:
http://bioconductor.org/checkResults/2.13/bioc-20140405/
The number of packages that needed to be installed on the build system in order to build and check the 750 BioC software packages is displayed in the right-most column of the top table:
1510 on zin1 (Linux)
1486 on moscato1 (Windows)
1500 on perceval (Mac)
If you click on these numbers, you get the full list of packages plus their versions. Once you've subtracted the 750 software packages and the data annotation and data experiment packages (a few hundred more) from these numbers, what remains is the number of CRAN packages that BioC 2.13 depends on. Not that many, really: only a very small fraction of the 5400 CRAN packages.
If we hosted only this small subset of CRAN packages under
http://bioconductor.org/packages/2.13/cran
next to the other 4 frozen repos
http://bioconductor.org/packages/2.13/bioc
http://bioconductor.org/packages/2.13/data/annotation
http://bioconductor.org/packages/2.13/experiment
http://bioconductor.org/packages/2.13/extra
and had biocLite() modified to point to http://bioconductor.org/packages/2.13/cran instead of http://cran.fhcrc.org, then anybody who has R 3.0.3 could *easily* install and run BioC 2.13 now or 5 years from now.
Cheers,
H.
On 04/24/2014 08:09 AM, Steve Lianoglou wrote:
Hi all,
Just saw this tangentially related link to "packrat", which seems to be something analogous to a virtualenv (of sorts) for R by the RStudio folks, and which I thought might be useful. It actually doesn't solve anybody's problem here, but as I said ... tangential :-)
http://rstudio.github.io/packrat/
Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
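Hervé's count is a set difference. Take everything installed on a build machine, subtract the Bioc software, annotation and experiment packages, and what remains is the CRAN subset the release depends on. Below is a minimal Python sketch with toy package lists. The real lists are the ones linked from the build report.

```python
from typing import Iterable, List


def cran_subset(installed: Iterable[str], *bioc_repos: Iterable[str]) -> List[str]:
    """Packages installed on a build machine that belong to none of the Bioc repos.

    `installed` is the full list from the build report; each extra argument is
    the package list of one Bioc repository (software, annotation, experiment).
    """
    remaining = set(installed)
    for repo in bioc_repos:
        remaining -= set(repo)
    return sorted(remaining)


if __name__ == "__main__":
    # Toy lists for illustration only.
    installed = ["DEXSeq", "minfi", "org.Hs.eg.db", "Rcpp", "statmod"]
    software = ["DEXSeq", "minfi"]
    annotation = ["org.Hs.eg.db"]
    experiment: List[str] = []
    print(cran_subset(installed, software, annotation, experiment))  # -> ['Rcpp', 'statmod']
```

If the installed list also contains the packages that ship with R itself, those would have to be subtracted as well. The thread does not say whether the build report includes them, so treat that as an assumption to check.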
5 days later
On 04/30/2014 05:30 PM, Kasper Daniel Hansen wrote:
Let me add my opinion: we do not have perfect (easy) reproducibility with Bioc, because we can only (easily) download the tarball corresponding to the latest commit in a given branch. I am OK with that. What Alejandro and I are concerned about is the inability to install even that.
There is a clear candidate for which version of the CRAN package we should store: the version we use when we run R CMD check. This is the version with which we implicitly say things are working.
We discussed this internally and are likely to create snapshots at the end of each release cycle of all Bioc packages and their CRAN dependencies. Perhaps these will be available too as an AMI. A snapshot facilitates (though hardly guarantees) reproducibility without too much cost, and is consistent with project objectives. Martin
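Martin's earlier question (a snapshot at the beginning or at the end of the release?) and the plan to snapshot at the end of each release cycle both come down to choosing a cutoff date. Below is a minimal Python sketch that assumes a per-package history of (publication date, version) pairs. The history used here is hypothetical data. For each package, the sketch keeps the latest version published on or before the cutoff, so a later cutoff picks up the bug-fix releases made during the cycle.

```python
from datetime import date
from typing import Dict, List, Tuple

History = Dict[str, List[Tuple[date, str]]]


def snapshot(history: History, cutoff: date) -> Dict[str, str]:
    """Freeze one version per package: the latest published on or before `cutoff`.

    Packages with no release on or before the cutoff are left out.
    """
    frozen = {}
    for pkg, releases in history.items():
        eligible = [(day, version) for day, version in releases if day <= cutoff]
        if eligible:
            frozen[pkg] = max(eligible, key=lambda item: item[0])[1]
    return frozen


if __name__ == "__main__":
    # Hypothetical release history for two CRAN packages.
    history = {
        "pkgA": [(date(2013, 10, 1), "1.0.0"), (date(2014, 1, 15), "1.0.1")],
        "pkgB": [(date(2014, 2, 1), "2.0")],
    }
    print(snapshot(history, date(2013, 10, 15)))  # -> {'pkgA': '1.0.0'}
    print(snapshot(history, date(2014, 4, 1)))    # -> {'pkgA': '1.0.1', 'pkgB': '2.0'}
```

Kasper's criterion, the version used in R CMD check, would replace the date cutoff with the versions recorded by the build system. The one-version-per-package outcome is the same.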
Best, Kasper On Fri, Apr 25, 2014 at 7:41 AM, Herv?? Pag??s <hpages at fhcrc.org> wrote:
Hi, See the latest software builds for BioC 2.13: http://bioconductor.org/checkResults/2.13/bioc-20140405/ The number of packages that needed to be installed on the build system in order to build and check the 750 BioC software packages is displayed in the right-most column of the top table: 1510 on zin1 (Linux) 1486 on moscato1 (Windows) 1500 on perceval (Mac) If you click on these numbers, you get the full list of packages plus their version. Once you've subtracted the 750 software packages + the number of data annotation and data experiment packages (a few more hundreds) from these numbers, that gives you the number of CRAN packages that BioC 2.13 depends on. Not that many really (only a very small fraction of the 5400 CRAN packages). If we hosted only this small subset of CRAN packages under http://bioconductor.org/packages/2.13/cran next to the other 4 frozen repos http://bioconductor.org/packages/2.13/bioc http://bioconductor.org/packages/2.13/data/annotation http://bioconductor.org/packages/2.13/experiment http://bioconductor.org/packages/2.13/extra and have biocLite() modified to point to http://bioconductor.org/packages/2.13/cran instead of http://cran.fhcrc.org then anybody that has R 3.0.3 could *easily* install and run BioC 2.13 now or in 5 years from now. Cheers, H. On 04/24/2014 08:09 AM, Steve Lianoglou wrote:
Hi all,

Just saw this tangentially related link to "packrat", which seems to be something analogous to a virtualenv (of sorts) for R, by the RStudio folks. I thought it might be useful. It actually doesn't solve anybody's problem here, but as I said ... tangential :-)

http://rstudio.github.io/packrat/

On Thursday, April 24, 2014, Wolfgang Huber <whuber at embl.de> wrote:
Hi Kasper
You are right, I had misunderstood the problem.
In that case I agree with Martin that the problem resolves into components that are either intractable, already addressed by deprecation policies, or not very important.
Sorry for the noise.
Wolfgang
On 24 Apr 2014, at 15:18, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
Wolfgang,

Alejandro did not have a problem with the current release, but with the most recent prior release. His issue arises precisely because it is no longer the current (stable) release.
Kasper

On Thu, Apr 24, 2014 at 3:05 PM, Wolfgang Huber <whuber at embl.de> wrote:
Hi Martin,

To come back to the original trigger for this thread: it was not a concern about reproducibility, but the fact that a Bioc package in the current release stopped working because a CRAN package had changed in the meantime.
What's the most practical solution to this specific problem?
Best wishes
Wolfgang
On 23 Apr 2014, at 19:41, Martin Morgan <mtmorgan at fhcrc.org> wrote:
On 04/22/2014 09:47 AM, Kasper Daniel Hansen wrote:
I think we should have a CRAN snapshot (or a subset of CRAN used in Bioc) inside each Bioc release; I don't know how hard that is to manage from a technical point of view.
I followed this thread with some interest. It would be surprisingly challenging to update even a 2.13 package: the build machines have moved on to other tasks, unconstrained by the unique system dependencies needed for 2.13 builds.
The idea of a 'forever' repository snapshot seems possible, but would the snapshot be at the beginning of the release, and hence miss the few but important bug fixes introduced during the release, or at the end of the release, which might be after the time required for the purposes of replication? Either way it is certain that the peanut butter would land face down for one's particular need. Also, satisfying system dependencies becomes increasingly challenging for the user, even with a binary repository. I don't think a central 'Bioc' solution would really address the problem of reproducibility.
It is not that 'hard' for an individual group to create a snapshot of Bioc and CRAN using rsync

http://www.bioconductor.org/about/mirrors/mirror-how-to/
http://cran.r-project.org/mirror-howto.html

and to use install.packages() or even biocLite() to access these (see ?setRepositories). This would again require that the system dependencies for these packages are satisfied in some kind of frozen fashion.
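As a rough illustration of the install.packages() route, here is a minimal R sketch. It assumes a group has already rsync'ed a Bioc release and the CRAN packages it uses into a local directory; the path /srv/r-snapshots/bioc-2.13 and the repository names are hypothetical, not an official layout. setRepositories() offers an interactive alternative for choosing repositories.

```r
## Minimal sketch, assuming a Bioc release and the CRAN packages it needs
## were mirrored (e.g. with rsync) under a hypothetical local directory.
snapshot_root <- "/srv/r-snapshots/bioc-2.13"   # hypothetical path

## Frozen repositories; the names and subdirectories are illustrative only.
snapshot_repos <- c(
  BioCsoft = paste0("file://", snapshot_root, "/bioc"),
  CRAN     = paste0("file://", snapshot_root, "/cran")
)

## Make install.packages() resolve packages (and their dependencies)
## only against the snapshot, not against a live CRAN mirror.
options(repos = snapshot_repos)

## The src/contrib directories R will read the PACKAGES index from.
contrib_dirs <- contrib.url(getOption("repos"), type = "source")
print(contrib_dirs)

## With the mirror actually in place, installation would then be, e.g.:
## install.packages("minfi", type = "source")
```

This only freezes the R packages; the system dependencies still have to be satisfied separately.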
A more robust possibility is of course a virtual machine, such as the AMI (or a customized version) we provide

http://www.bioconductor.org/help/bioconductor-cloud-ami/#ami_ids

although these have only a subset of packages installed by default. The CRAN thread referenced earlier included this post

https://stat.ethz.ch/pipermail/r-devel/2014-March/068605.html

which I think makes an important distinction between exact replication and scientific reproducibility; it is the latter that must be the most interesting, and the former that we somehow seem to stumble over. The thread also mentions best practices that we as developers can act on to enhance reproducibility: version control (http://bioconductor.org/developers/how-to/source-control/), a disciplined approach to deprecation (http://bioconductor.org/developers/how-to/deprecation/), package versioning (http://bioconductor.org/developers/how-to/version-numbering/), and the Bioc-style approach to release. What other best pract
-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793