[RFC] A case for freezing CRAN

----- Original Message -----
From: "David Winsemius" <dwinsemius at comcast.net>
To: "Jeroen Ooms" <jeroen.ooms at stat.ucla.edu>
Cc: "r-devel" <r-devel at r-project.org>
Sent: Wednesday, March 19, 2014 11:03:32 PM
Subject: Re: [Rd] [RFC] A case for freezing CRAN

On Mar 19, 2014, at 7:45 PM, Jeroen Ooms wrote:

On Wed, Mar 19, 2014 at 6:55 PM, Michael Weylandt
<michael.weylandt at gmail.com> wrote:
Reading this thread again, is it a fair summary of your position
to say "reproducibility by default is more important than giving
users access to the newest bug fixes and features by default?"
It's certainly arguable, but I'm not sure I'm convinced: I'd
imagine that the ratio of new work being done vs reproductions is
rather high and the current setup optimizes for that already.

I think that separating development from released branches can give us
both reliability/reproducibility (stable branch) as well as new
features (unstable branch). The user gets to pick (and you can pick
both!). The same is true for r-base: when using a 'released' version
you get 'stable' base packages that are up to 12 months old. If you
want to have the latest stuff you download a nightly build of r-devel.
For regular users and reproducible research it is recommended to use
the stable branch. However if you are a developer (e.g. package
author) you might want to develop/test/check your work with the latest
r-devel.

I think that extending the R release cycle to CRAN would result both
in more stable released versions of R, as well as more freedom for
package authors to implement rigorous changes in the unstable branch.
When writing a script that is part of a production pipeline, or a
Sweave paper that should be reproducible 10 years from now, or a book
on using R, you use a stable version of R, which is guaranteed to
behave the same over time. However when developing packages that
should be compatible with the upcoming release of R, you use r-devel,
which has the latest versions of other CRAN and base packages.

As I remember ... The example demonstrating the need for this was an
extract from a website in which the headers were misinterpreted as
data in one version of pkg:XML and not in another. That seems fairly
unconvincing. Data cleaning and
validation is a basic task of data analysis. It also seems excessive
to assert that it is the responsibility of CRAN to maintain a synced
binary archive that will be available in ten years.

CRAN already does this: the bin/windows/contrib directory has
subdirectories going back to 1.7, with packages dated October 2004. I
don't see why it is burdensome to continue to archive these. It would
be nice if source versions had a similar archive.

Dan

Bug fixes would
be inhibited for years.... not unlike SAS and Excel. What next?
Perhaps all bugs should be labeled as features?  Surely this
CRAN-of-the-future would be offering something that no other
statistical package currently offers, nicht wahr?

Why not leave it to the authors to specify which package versions
were used in their publications? The authors of the packages would
get recognition and the dependencies would be recorded.
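
As a rough illustration of what recording package versions could look
like, here is a minimal sketch in base R. The helper name
record_versions is hypothetical and not part of R; it relies only on
utils::packageVersion() and loadedNamespaces(). Base R's sessionInfo()
prints similar information for the current session.

```r
# Hypothetical helper: list the version of each named package.
# Defaults to every namespace loaded in the current session.
record_versions <- function(pkgs = sort(loadedNamespaces())) {
  versions <- vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  data.frame(package = pkgs, version = unname(versions),
             stringsAsFactors = FALSE)
}

# Two packages that ship with every R installation.
manifest <- record_versions(c("stats", "utils"))
print(manifest)
```

A table like this could be saved with write.csv() and published
alongside a paper, which records the versions but does not by itself
make them installable later.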

--
David.

What I'm trying to figure out is why the standard "install the
following list of package versions" isn't good enough in your
eyes?

Almost nobody does this because it is cumbersome and impractical. We
can do so much better than this. Note that in order to install old
packages you also need to investigate which versions of dependencies
of those packages were used. On win/osx, users need to manually build
those packages, which can be a pain. All in all it makes reproducible
research difficult, expensive and error-prone. At the end of the day
most published results obtained with R just won't be reproducible.
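
To make the manual route concrete, the sketch below builds download
locations for a list of pinned source versions. It assumes the
src/contrib/Archive/<pkg>/ layout used by the main CRAN site for
superseded source releases; the helper name archive_url and the
package names and versions are placeholders, not real releases.

```r
# Assumption: superseded source releases live under src/contrib/Archive/<pkg>/
# on the main CRAN site; the current release of a package is not stored there.
archive_url <- function(pkg, version,
                        base = "https://cran.r-project.org/src/contrib/Archive") {
  sprintf("%s/%s/%s_%s.tar.gz", base, pkg, pkg, version)
}

# Hypothetical manifest; names and version numbers are placeholders.
wanted <- data.frame(package = c("pkgA", "pkgB"),
                     version = c("1.0-2", "0.3.1"),
                     stringsAsFactors = FALSE)

urls <- archive_url(wanted$package, wanted$version)
print(urls)

# Each tarball could then be passed to
#   install.packages(url, repos = NULL, type = "source")
# once its own dependencies are installed in compatible versions.
```

The sketch only constructs URLs. Working out which dependency
versions belong together, and building from source on Windows or OS X,
remains the manual part described above.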

Also I believe that keeping it simple is essential for solutions to be
practical. If every script has to be run inside an environment with
custom libraries, it takes away much of its power. Running a bash or
python script in Linux is so easy and reliable that entire
distributions are based on it. I don't understand why we make our
lives so difficult in R.

In my estimation, a system where stable versions of R pull packages
from a stable branch of CRAN will naturally resolve the majority of
the reproducibility and reliability problems with R. And in contrast
to what some people here are suggesting it does not introduce any
limitations. If you want to get the latest stuff, you either grab a
copy of r-devel, or just enable the testing branch and off you go.
Debian 'testing' works in a similar way, see
http://www.debian.org/devel/testing.
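
From the user's side, choosing a branch might look roughly like the
sketch below. Everything in it is hypothetical: CRAN does not publish
"stable" or "testing" branches, cran.example.org is a stand-in host,
and branch_repo is an illustrative helper. Only options(repos = ...)
is an existing R mechanism.

```r
# Hypothetical: map a branch name to a repository URL.
# example.org stands in for a mirror that would host such branches.
branch_repo <- function(branch = c("stable", "testing"),
                        r_version = getRversion(),
                        base = "https://cran.example.org") {
  branch <- match.arg(branch)
  if (branch == "testing") {
    return(paste0(base, "/testing"))
  }
  # Stable branches are keyed on the major.minor series of R, e.g. "3.0".
  parts <- strsplit(as.character(r_version), ".", fixed = TRUE)[[1]]
  sprintf("%s/stable/%s.%s", base, parts[1], parts[2])
}

repo <- branch_repo("stable", r_version = "3.0.3")
print(repo)

# options(repos = c(CRAN = repo)) would then make install.packages()
# pull from the chosen branch.
```

Keying the stable branch on the R release series mirrors the way base
packages are frozen per release, so a given R version would see the
same package set over time.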

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
David Winsemius
Alameda, CA, USA
