-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof Brian Ripley
Sent: Tuesday, October 11, 2005 4:05 AM
To: Alexander Ploner
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Q: Suggestions for long-term data/program storage policy?
On Tue, 11 Oct 2005, Alexander Ploner wrote:
we are a statistical/epidemiological department that - after a few
years of rapid growth - is finally getting around to formulating a
general data storage and retention policy - mainly to ensure that we
can reproduce results from published papers/theses more easily in the
future, but also with the hope that we get more synergy between
related projects.
We have formulated what we feel is a reasonable draft, requiring
basically that the raw data, all programs to create derived data
sets, and the analysis programs are stored and documented in a
uniform manner, regardless of the analysis software used.
data retention we are aiming for is 10 years, and the format for the
raw data is quite sane (either flat ASCII or real
You are intending to retain copies of the OS used and hardware too?
The results depend far more on those than you apparently realize.
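
One way to act on this, short of keeping the machine itself, is to store a plain-text description of the OS and hardware next to each analysis. The following is only a minimal sketch in Python, under the assumption that a small JSON record is acceptable; the function names are hypothetical, and for an R project the output of R's sessionInfo() would be the more natural thing to keep.

```python
import json
import platform
import sys


def describe_environment():
    """Collect basic OS, hardware and interpreter details as a plain dict."""
    return {
        "system": platform.system(),        # e.g. the OS family name
        "release": platform.release(),      # OS release string
        "version": platform.version(),      # OS build/version string
        "machine": platform.machine(),      # hardware architecture
        "processor": platform.processor(),  # may be empty on some systems
        "python": sys.version.split()[0],   # interpreter version only
    }


def environment_as_text(env):
    """Render the record as sorted, indented JSON so it stays human-readable."""
    return json.dumps(env, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(environment_as_text(describe_environment()))
```

Such a record does not make results reproducible by itself; it only documents what would have to be rebuilt.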
Given the rapid development cycle of R,
I think you will find your OS changes as fast: all those security updates
potentially affect your results.
this suggests that at the very least all non-base packages used in the
analysis are stored together with each project. I have
questions:
1) Are old R versions (binaries/sources) going to be available on
CRAN indefinitely?
Not binaries. The intention is that source files be available, but they
could become corrupted (as it seems the Windows binary has for a past
version).
2) Is .RData a reasonable file format for long term storage?
I would say not, as it is almost impossible to recover from any corruption
in such a file. We like to have long-term data in a human-readable
printout, with a print copy, and also store some checksums.
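
As an illustration of the checksum part, here is a minimal sketch in Python that builds a plain-text manifest of SHA-256 digests for every file under a project directory. The choice of SHA-256, the function names and the "digest, two spaces, relative path" layout are assumptions made for this example, not a description of what we actually run.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Return one 'digest  relative/path' line per file, sorted by path."""
    root = Path(directory)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            lines.append(f"{sha256_of_file(path)}  {name}\n")
    return "".join(lines)
```

The manifest is itself plain text, so it can be printed along with the data and kept outside the directory it describes.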
I would also be very grateful for any other suggestions, comments or
links for setting up and implementing such a storage policy (R-
specific or otherwise).
You need to consider the medium on which you are going to store the
archive. We currently use CD-R (and not tapes as those are less
compatible across drives -- we have two identical drives currently but do
not expect either to last 10 years), and check them annually -- I guess we
will re-write to another medium after much less than 10 years.
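
A periodic check of this kind can be automated if a list of checksums was stored when the archive was written. The sketch below, in Python, assumes a manifest with one line per file in the form "SHA-256 hex digest, two spaces, path relative to the archive root"; that layout and the function names are hypothetical choices for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(directory, manifest_text):
    """Compare files under `directory` with the manifest; return problems found."""
    root = Path(directory)
    problems = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the manifest
        expected, name = line.split("  ", 1)
        target = root / name
        if not target.is_file():
            problems.append((name, "missing"))
        elif sha256_of_file(target) != expected:
            problems.append((name, "checksum mismatch"))
    return problems
```

An empty list means every listed file is present and unchanged; anything else is a cue to restore from the second copy before that one degrades too.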
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595