[RFC] A case for freezing CRAN

Jari Oksanen <jari.oksanen at oulu.fi> writes:
On 21/03/2014, at 10:40 AM, Rainer M Krug wrote:

This is a long and (mainly) interesting discussion, which is fanning out
in many different directions, and I think many are not that relevant to
the OP's suggestion. 

I see the advantages of having such a dynamic CRAN, but also of having a
more stable CRAN. I prefer CRAN as it is now, but in many cases a more
stable CRAN might be an advantage. So having releases of CRAN might make
sense. But then there is the archiving issue of CRAN.

The suggestion was made to move the responsibility away from CRAN and
the R infrastructure to the user / researcher to guarantee that the
results can be re-run years later. It would be nice to have this built
into CRAN, but let's stick to the scenario that the user should take
care of reproducibility.
There are two different problems that alternate in the discussion:
reproducibility and breakage of CRAN dependencies. Frozen CRAN could
make *approximate* reproducibility easier to achieve, but real
reproducibility needs stricter solutions. The actual sessionInfo()
output is the minimal information needed, but rebuilding a spitting
image of the old environment may still be demanding (though in many
cases this does not matter).
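
As a small illustration of the sessionInfo() point above, here is a
minimal sketch in R that writes the session description to a file kept
next to the results. The helper name record_session and the default
file name sessionInfo.txt are made up for this example; sessionInfo()
and capture.output() are standard functions from the utils package.
This only records what was used, it does not rebuild the environment.

```r
# Sketch only: `record_session` and the default file name are hypothetical.
record_session <- function(path = "sessionInfo.txt") {
  info <- utils::sessionInfo()
  # Save the printed session description (R version, platform, packages).
  writeLines(utils::capture.output(print(info)), path)
  # Collect versions of attached non-base packages (empty if there are none).
  versions <- vapply(info$otherPkgs, function(p) p$Version, character(1))
  invisible(versions)
}
```

Calling record_session() at the end of an analysis script leaves a text
file that can be archived together with the output.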

Another problem is that CRAN is so volatile that new versions of
packages break other packages or old scripts. Here the main problem is
how package developers work. Freezing CRAN would not change that: if
package maintainers release breaking code, that would be frozen. I
think that most packages do not make a distinction between development
and release branches, and CRAN policy won't change that.

I can sympathize with package maintainers having 150 reverse
dependencies. My main package only has ~50, and I certainly won't
test them all with each new release. I sometimes tried, but I could
not even get all of them built because they had other dependencies on
packages that failed. Even those that I could test failed to detect
problems (in one case all examples were \dontrun and passed the tests
nicely). I only wish that if people *really* depend on my package, they
would test it against the R-Forge version and alert me before CRAN
releases, but that is not very likely (I guess many dependencies are not
*really* necessary, but only concern marginal features of the package,
yet CRAN forces them to be declared).
Breakage of CRAN packages is a problem on which I cannot comment
much. I have no idea how this could be solved unless one introduces more
checks, which nobody wants. CRAN is a (more or less) open repository for
packages written by engineers / programmers but also scientists of other
fields - and that is the strength of CRAN - a central repository to find
packages which conform to a minimal standard and format.
Still a few words about reproducibility of scripts: this can hardly be
achieved with good coverage, because many scripts are so very ad
hoc. When I edit and review manuscripts for journals, I very often get
Sweave or knitr scripts that "just work", where "just" means "just so
and so". Often they do not work at all, because they relied on some
undeclared private functions or stray files in the author's
workspace that did not travel with the Sweave document.
This is one reason why I *always* start my R sessions with --vanilla and
have a local initialization script which I call manually.
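
One way to check that a script does not depend on stray objects in the
author's workspace is to run it in a fresh R process started with
--vanilla. The following is only a sketch: the helper name runs_clean is
made up for this example, and it assumes Rscript sits in the bin
directory of the running R installation.

```r
# Sketch only: `runs_clean` is a hypothetical helper name.
# Runs a script in a new R process with --vanilla, so it cannot see
# objects, .Rprofile settings, or a saved .RData from the current session.
runs_clean <- function(script) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript, c("--vanilla", shQuote(script)),
                    stdout = FALSE, stderr = FALSE)
  # Exit status 0 means the script ran to completion without an error.
  status == 0L
}
```

A script that only works because some object happens to exist in the
interactive session will fail this check, since the new process starts
with an empty workspace.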
I think these
-- published scientific papers -- are the main field where the code
really should be reproducible, but they often are the hardest to
reproduce. 
And this is completely out of the hands of R / CRAN / ... and in the
hands of journals and authors. But R could provide a framework to make
this easier, in the form of a package which provides functions to make
this a one-command approach.
Nothing CRAN people do can help with sloppy code scientists
write for publications. You know, they are scientists -- not
engineers.
Absolutely - and I am also a sloppy scientist - I put my code online,
but hope that not many people ask me about it later.

Cheers,

Rainer
Cheers, Jari Oksanen
Leaving the issue of compilation out, a package which creates a custom
installation of the R version used, including the sources of that R
version and of the packages in a format that can be compiled on Linux
(given that the relevant dependencies are installed), would be a
huge step forward.

I know - compilation on Windows (and sometimes Mac) is a serious
problem, but to archive *all* binaries and to re-compile all older
versions of R and all packages would be an impossible task.

Apart from that - doing your analysis in a Virtual Machine and then
simply archiving this Virtual Machine would also be an option, but only
for the more tech-savvy users.

In a nutshell: I think a package could provide a solution for local
archiving that makes it possible to re-run the simulation with the same
tools at a later stage - although guarantees would not be possible.

Cheers,

Rainer
-- 
Rainer M. Krug
email: Rainer<at>krugs<dot>de
PGP: 0x0F52F982

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
