[RFC] A case for freezing CRAN
Jari Oksanen <jari.oksanen at oulu.fi> writes:
On 21/03/2014, at 10:40 AM, Rainer M Krug wrote:
This is a long and (mainly) interesting discussion, which is fanning out in many different directions, and I think many of them are not that relevant to the OP's suggestion. I see the advantages of having such a dynamic CRAN, but also of having a more stable CRAN. I prefer CRAN as it is now, but in many cases a more stable CRAN might be an advantage. So having releases of CRAN might make sense. But then there is the archiving issue of CRAN. The suggestion was made to move the responsibility away from CRAN and the R infrastructure to the user / researcher to guarantee that the results can be re-run years later. It would be nice to have this built into CRAN, but let's stick to the scenario that the user should take care of reproducibility.
There are two different problems that alternate in the discussion: reproducibility and breakage of CRAN dependencies. A frozen CRAN could make *approximate* reproducibility easier to achieve, but real reproducibility needs stricter solutions. The output of sessionInfo() is the minimum information to record, and rebuilding a spitting image of the old environment may still be demanding (though in many cases this does not matter). The other problem is that CRAN is so volatile that new versions of packages break other packages or old scripts. Here the main problem is how package developers work. Freezing CRAN would not change that: if package maintainers release breaking code, that is what would be frozen. I think that most packages make no distinction between development and release branches, and CRAN policy won't change that. I can sympathize with package maintainers who have 150 reverse dependencies. My main package has only ~50, and I certainly won't test them all with each new release. I have sometimes tried, but I could not even get all of them built, because they depended on other packages that failed. Even those that I could test failed to detect problems (in one case all examples were \dontrun and passed the tests nicely). I only wish that people who *really* depend on my package would test against the R-Forge version and alert me before CRAN releases, but that is not very likely. (I guess many dependencies are not *really* necessary and only concern marginal features of the package, but CRAN forces them to be declared.)
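As a minimal sketch of recording the sessionInfo() information mentioned above next to an analysis: the function name `record_session` and the default file name are assumptions made for illustration, not part of any existing package.

```r
# Save the printed sessionInfo() to a text file and return a compact
# package/version table. `record_session` is a hypothetical helper.
record_session <- function(file = "session-info.txt") {
  info <- sessionInfo()
  writeLines(capture.output(print(info)), file)

  # Attached (otherPkgs) and merely loaded (loadedOnly) add-on packages;
  # base packages ship with R itself and are covered by the R version.
  pkgs <- c(info$otherPkgs, info$loadedOnly)
  versions <- vapply(pkgs, function(p) p$Version, character(1))

  data.frame(package = as.character(names(versions)),
             version = unname(versions),
             stringsAsFactors = FALSE)
}
```

Calling `record_session()` at the end of a script leaves a text file that can be archived with the results. As the message says, this is only the minimum: it records what was used, not how to rebuild it.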
Breakage of CRAN packages is a problem on which I cannot comment much. I have no idea how this could be solved unless one introduces more checks, which nobody wants. CRAN is a (more or less) open repository for packages written by engineers / programmers but also by scientists from other fields - and that is the strength of CRAN - a central repository to find packages which conform to a minimal standard and format.
Still a few words about reproducibility of scripts: this can hardly be achieved with good coverage, because many scripts are so very ad hoc. When I edit and review manuscripts for journals, I very often get Sweave or knitr scripts that "just work", where "just" means "just so and so". Often they do not work at all, because they rely on some undeclared private functionality or on stray files in the author's workspace that did not travel with the Sweave document.
That is one reason why I *always* start my R sessions with --vanilla and have a local initialization script which I call manually.
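One way to check a script for the stray-workspace problem described above is to run it in a fresh R process started with --vanilla. The following is only a sketch; the helper name `run_clean` is hypothetical.

```r
# Run a script in a separate, clean R process and report whether it
# finished without error. `run_clean` is a hypothetical helper.
run_clean <- function(script) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # --vanilla: no saved workspace is restored and no site file,
  # user profile or environment file is read at startup.
  status <- system2(rscript, c("--vanilla", shQuote(script)),
                    stdout = FALSE, stderr = FALSE)
  status == 0
}
```

A script that silently depends on an object left over in the author's workspace returns FALSE here, because that object does not exist in the fresh process.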
I think these -- published scientific papers -- are the main field where the code really should be reproducible, but they often are the hardest to reproduce.
And this is completely out of the hands of R / CRAN / ... and in the hands of journals and authors. But R could provide a framework to make this easier, in the form of a package which provides functions that make this a one-command approach.
Nothing CRAN people do can help with sloppy code scientists write for publications. You know, they are scientists -- not engineers.
Absolutely - and I am also a sloppy scientist - I put my code online, but hope that not many people ask me later about it. Cheers, Rainer
Cheers, Jari Oksanen
Leaving the issue of compilation aside, a package that creates a custom installation of the R version in use, including the source of that R version and the sources of the packages in a format that can be compiled on Linux (given that the relevant dependencies are installed), would be a huge step forward. I know that compilation on Windows (and sometimes Mac) is a serious problem, but archiving *all* binaries and re-compiling all older versions of R and all packages would be an impossible task. Apart from that, doing your analysis in a virtual machine and then simply archiving that virtual machine would also be an option, but only for the more tech-savvy users. In a nutshell: I think a package could provide a solution for local archiving that makes it possible to re-run the simulation with the same tools at a later stage, although guarantees would not be possible. Cheers, Rainer -- Rainer M. Krug email: Rainer<at>krugs<dot>de PGP: 0x0F52F982
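A rough sketch of the local source-archiving idea above, for packages only. It assumes the usual CRAN layout, in which current source tarballs sit under src/contrib/ and superseded ones under src/contrib/Archive/<package>/. The function names `source_urls` and `fetch_sources`, the default directory, and the package name in the usage note are hypothetical. Archiving the R sources themselves and compiling anything are left out.

```r
# Build candidate download locations for exact package versions.
# Assumes the standard CRAN directory layout; `repo` can point at a mirror.
source_urls <- function(package, version,
                        repo = "https://cran.r-project.org") {
  tarball <- sprintf("%s_%s.tar.gz", package, version)
  data.frame(
    package = package,
    current = sprintf("%s/src/contrib/%s", repo, tarball),
    archive = sprintf("%s/src/contrib/Archive/%s/%s", repo, package, tarball),
    stringsAsFactors = FALSE
  )
}

# Try the current location first, then the archive. Needs network access;
# failures are skipped silently, so check the returned file list.
fetch_sources <- function(urls, destdir = "pkg-sources") {
  dir.create(destdir, showWarnings = FALSE, recursive = TRUE)
  try_get <- function(url, dest) {
    tryCatch(download.file(url, dest, mode = "wb", quiet = TRUE) == 0,
             error = function(e) FALSE,
             warning = function(w) FALSE)
  }
  for (i in seq_len(nrow(urls))) {
    dest <- file.path(destdir, basename(urls$current[i]))
    if (!try_get(urls$current[i], dest)) try_get(urls$archive[i], dest)
  }
  invisible(list.files(destdir, full.names = TRUE))
}
```

For example, `fetch_sources(source_urls("somepkg", "1.2-3"))` would try to store `somepkg_1.2-3.tar.gz` in `pkg-sources/`. As the message notes, having the sources does not guarantee that they will still compile later.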
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel