R CMD build --resave-data
Hi Uwe,
On 11-04-11 08:13 AM, Uwe Ligges wrote:
On 11.04.2011 02:47, Hervé Pagès wrote:
Hi,

More about the new --resave-data option.

As mentioned previously here
  https://stat.ethz.ch/pipermail/r-devel/2011-April/060511.html
'R CMD build' and 'R CMD INSTALL' handle this new option inconsistently. The former does --resave-data="gzip" by default. The latter doesn't seem to support the --resave-data= syntax: the --resave-data flag must either be present or not. And by default 'R CMD INSTALL' won't resave the data.

Also, because now 'R CMD build' is resaving the data, shouldn't it reinstall the package in order to be able to do this correctly? Here is why. There is this new warning in 'R CMD check' that complains about files not of a type allowed in a 'data' directory:
  http://bioconductor.org/checkResults/2.8/bioc-LATEST/Icens/lamb1-checksrc.html

The Icens package also has .R files under data/ with things like:

  bet <- matrix(scan("CMVdata", quiet=TRUE),nc=5,byr=TRUE)

i.e. the R code needs to access some of the text files located in the data/ folder. So in order to get rid of this warning I tried to move those text files to inst/extdata/ and I modified the code in the .R file so it does:

  CMVdata_filepath <- system.file("extdata", "CMVdata", package="Icens")
  bet <- matrix(scan(CMVdata_filepath, quiet=TRUE),nc=5,byr=TRUE)

But now 'R CMD build' fails to resave the data because the package was not installed first and the CMVdata file could not be found.

Unfortunately, for a lot of people that means that the safe way to build a source tarball now is with

  R CMD build --keep-empty-dirs --no-resave-data
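To illustrate the failure mode described above, here is a minimal sketch. It relies only on the documented behaviour that system.file() returns an empty string when the package (or the file) cannot be found. The package name in `pkg` and the helper `read_cmv()` are hypothetical names introduced for illustration; they are not part of Icens or of R.

```r
# Hypothetical package name, assumed NOT to be installed on this machine.
pkg <- "aPackageThatIsNotInstalled"

# system.file() does not raise an error by default (mustWork = FALSE):
# it returns "" when the package or the file cannot be found.
path <- system.file("extdata", "CMVdata", package = pkg)
print(path)

# Hypothetical helper: a data/*.R script could check the path and fail
# early with a clear message instead of handing "" to scan().
read_cmv <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("CMVdata not found; is the package installed?")
  }
  matrix(scan(path, quiet = TRUE), ncol = 5, byrow = TRUE)
}
```

Such a check does not make the data resavable without an installed package; it only makes the cause of the failure easier to see.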
Hervé, actually it makes some sense to have these defaults from a CRAN maintainer's point of view:

--keep-empty-dirs: we found many packages containing empty dirs unnecessarily and the idea is to exclude them at the build stage rather than at the later installation stage. Note that the package maintainer is supposed to run build (and knows if the empty dirs are to be included, the user who runs INSTALL does not).

--no-resave-data: We found many packages with insufficiently compressed data. This should be fixed when building the package, not later when installing it, since the reduced size is useful in the source tarball already.

So it does make some sense to have different defaults in build as opposed to INSTALL from my point of view (although I could live with different ones, though).
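For readers who want to inspect or fix data compression by hand, the following sketch uses tools::checkRdaFiles() and tools::resaveRdaFiles() on a throwaway file. It is only an illustration on made-up data (`x` and `tmp` are hypothetical names), and it makes no claim about what 'R CMD build' does internally.

```r
library(tools)

# Made-up, highly compressible data saved WITHOUT compression.
tmp <- tempfile(fileext = ".rda")
x <- rep(seq_len(100), 1000)
save(x, file = tmp, compress = FALSE)

# Report the size and compression type of the saved file.
before <- checkRdaFiles(tmp)

# Re-save the same file in place with gzip compression, then check again.
resaveRdaFiles(tmp, compress = "gzip")
after <- checkRdaFiles(tmp)

print(before[, c("size", "compress")])
print(after[, c("size", "compress")])
```

Both functions also accept the path to a single directory, in which case they process the .rda and .RData files in it, so the same calls can be pointed at a package's data/ directory.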
If you deliberately ignore the fact that 'R CMD INSTALL' is also used by developers to install from the *package source tree* (as opposed to end users, who use it to install from a *source tarball*, even though they don't use it directly), then you have a point. So maybe I should have been more explicit about the problem that it can be for the *developer* to have 'R CMD build' and 'R CMD INSTALL' behave differently by default.

Of course I'm not suggesting that 'R CMD INSTALL' should behave differently (by default) depending on whether it's used on a source tarball (mode 1) or a package source tree (mode 2). I'm suggesting that, by default, the 3 commands (R CMD build + R CMD INSTALL in mode 1 and 2) behave consistently. With the latest changes, and by default, 'R CMD INSTALL' is still doing the right thing, but not 'R CMD build' anymore.

I perfectly understand the intention behind those new flags, which is to try to "optimize" the resulting source tarball, but what would you think if 'gcc' had some optimization flags that can generate broken executables (under some circumstances) and if these flags were enabled by default?

Note that I would have no problem with 'R CMD build' trying to resave the data by default if the current implementation of that feature was working properly, but unfortunately it's broken (see my previous email for the details).

Thanks,
H.
If you need further arguments for the discussion: I also tend to use --no-vignettes nowadays if my code does not change considerably. ;-) Best wishes, Uwe
I hope the list of options/flags that we need to use to "fix" 'R CMD build' (and make it consistent with R CMD INSTALL) is not going to grow too much ;-) Thanks, H.
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319