
R CMD build --resave-data

10 messages · Uwe Ligges, Simon Urbanek, Hadley Wickham +2 more

#
Hi,

More about the new --resave-data option

As mentioned previously here

   https://stat.ethz.ch/pipermail/r-devel/2011-April/060511.html

'R CMD build' and 'R CMD INSTALL' handle this new option
inconsistently. The former does --resave-data="gzip" by default.
The latter doesn't seem to support the --resave-data= syntax:
the --resave-data flag must either be present or not. And by
default 'R CMD INSTALL' won't resave the data.

Also, because now 'R CMD build' is resaving the data, shouldn't it
reinstall the package in order to be able to do this correctly?

Here is why. There is this new warning in 'R CMD check' that complains
about files not of a type allowed in a 'data' directory:

 
http://bioconductor.org/checkResults/2.8/bioc-LATEST/Icens/lamb1-checksrc.html

The Icens package also has .R files under data/ with things like:

   bet <- matrix(scan("CMVdata", quiet=TRUE),nc=5,byr=TRUE)

i.e. the R code needs to access some of the text files located
in the data/ folder. So in order to get rid of this warning I
tried to move those text files to inst/extdata/ and I modified
the code in the .R file so it does:

   CMVdata_filepath <- system.file("extdata", "CMVdata", package="Icens")
   bet <- matrix(scan(CMVdata_filepath, quiet=TRUE),nc=5,byr=TRUE)

But now 'R CMD build' fails to resave the data because the package
was not installed first and the CMVdata file could not be found.
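One way to spot this situation before building is to look for data scripts that call system.file(). The following is only a minimal sketch: the function name is hypothetical, it assumes the scripts use the .R extension, and it does a plain text search rather than parsing the R code.

```shell
# Hypothetical helper: list the R scripts under <pkgdir>/data that call
# system.file(). Such scripts can only locate their input files once the
# package has been installed.
data_scripts_using_system_file() {
    # -l prints only the names of the matching files. Errors (for example
    # when there is no data/*.R file at all) are silenced.
    grep -l 'system\.file[[:space:]]*(' "$1"/data/*.R 2>/dev/null
}
```

Something like `data_scripts_using_system_file Icens` would then print the scripts to review before running 'R CMD build' with its default options.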

Unfortunately, for a lot of people that means that the safe way to
build a source tarball now is with

   R CMD build --keep-empty-dirs --no-resave-data

I hope the list of options/flags that we need to use to "fix" 'R CMD
build' (and make it consistent with R CMD INSTALL) is not going to
grow too much ;-)

Thanks,
H.
#
On 11.04.2011 02:47, Hervé Pagès wrote:
Hervé,

actually it makes some sense to have these defaults from a CRAN 
maintainer's point of view:

--keep-empty-dirs:
we found many packages containing empty dirs unnecessarily and the idea 
is to exclude them at the build stage rather than at the later 
installation stage. Note that the package maintainer is supposed to run 
build (and knows if the empty dirs are to be included, the user who runs 
INSTALL does not).

--no-resave-data:
We found many packages with insufficiently compressed data. This should 
be fixed when building the package, not later when installing it, since 
the reduced size is useful in the source tarball already.

So it does make some sense to have different defaults in build as 
opposed to INSTALL from my point of view (although I could live with 
different, though).

If you need further arguments for the discussion: I also tend to use 
--no-vignettes nowadays if my code does not change considerably. ;-)

Best wishes,
Uwe
1 day later
#
Hi Uwe,
On 11-04-11 08:13 AM, Uwe Ligges wrote:
If you deliberately ignore the fact that 'R CMD INSTALL' is also used
by developers to install from the *package source tree* (as opposed
to end users who use it to install from a *source tarball*, even though
they don't use it directly), then you have a point. So maybe I should
have been more explicit about the problem that it can be for the
*developer* to have 'R CMD build' and 'R CMD INSTALL' behave
differently by default.

Of course I'm not suggesting that 'R CMD INSTALL' should behave
differently (by default) depending on whether it's used on a source
tarball (mode 1) or a package source tree (mode 2).

I'm suggesting that, by default, the 3 commands (R CMD build +
R CMD INSTALL in mode 1 and 2) behave consistently.

With the latest changes, and by default, 'R CMD INSTALL' is still doing
the right thing, but not 'R CMD build' anymore.

I perfectly understand the intention behind those new flags, which is
to try to "optimize" the resulting source tarball but what would you
think if 'gcc' had some optimization flags that can generate broken
executables (under some circumstances) and if these flags were enabled
by default?

Note that I would have no problem with 'R CMD build' trying to resave
the data by default if the current implementation of that feature
was working properly, but unfortunately it's broken (see my previous
email for the details).

Thanks,
H.

  
    
#
On Apr 12, 2011, at 8:53 PM, Hervé Pagès wrote:

            
.. for a good reason, IMHO no serious developer would do that for obvious reasons - you'd be working on a dirty copy creating many unnecessary problems and polluting your sources. The first time you'll spend an hour chasing a non-existent problem due to stale binary objects in your tree you'll learn that lesson ;). The fraction of a second spent in R CMD build is well worth the hours saved. IMHO the only valid reason to run INSTALL on a (freshly unpacked tar ball) directory is to capture config.log.
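To check whether a source tree is carrying this kind of leftover, a small shell helper can list compiled files under src/. This is only a sketch: the function name is hypothetical, and the list of extensions (.o, .so, .dll) is an assumption about what usually gets left behind.

```shell
# Hypothetical helper: list compiled files left under <pkgdir>/src.
# Prints nothing when the package has no src/ directory.
list_compiled_leftovers() {
    [ -d "$1/src" ] || return 0
    # Only regular files with typical object or shared-library extensions.
    find "$1/src" -type f \( -name '*.o' -o -name '*.so' -o -name '*.dll' \) | sort
}
```

An empty result from `list_compiled_leftovers plyr` suggests that the tree holds no stale compiled objects.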

Cheers,
Simon
#
This is news to me!  I know that you're supposed to run R CMD check on
the built package, but you're supposed to run install on it too?  (And
if it's so important, why doesn't R do it for you automatically?)

Do you have any convenient shortcuts to overcome the fact that the
binary package contains the package name?  i.e. how can I build and
install/check in a single line without having to specify the full file
name?

How can I go from:

R CMD build plyr && R CMD install plyr_1.5.tar.gz

to

R CMD build-and-install plyr ?

Hadley
#
On Apr 12, 2011, at 10:26 PM, Hadley Wickham wrote:

            
I'm not saying "supposed to" I'm saying wise to. And the "IMHO"s above were what I really meant. By all means, you're free to do anything as long as you don't ask on the mailing list that something doesn't work because you ran it on a stale directory ;).
Some will argue that's an invalid command to start with ;). But other than that I see nothing wrong with it ... it's what I do to be honest ... (except where I don't, but that has to do with my custom build script legacy which has a defined way to get versions on the shell, long story...).
R CMD build plyr && R CMD INSTALL plyr_*

... if you don't keep too many versions in the same directory ;) - but it's not something I would use. For the paranoid

R CMD build plyr && R CMD INSTALL plyr_`sed -n 's/Version: *//p' plyr/DESCRIPTION`.tar.gz

But, seriously, that is the least of the problems I see - you'd have to advance your version numbers very quickly to get the version command out of your shell history...
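The sed approach above can be wrapped into a pair of shell functions to get close to the single command asked about. This is only a sketch: both function names are hypothetical, the tarball name is assumed to be `<Package>_<Version>.tar.gz` built from the DESCRIPTION fields, and the wrapper has to be run from the directory where 'R CMD build' writes the tarball.

```shell
# Hypothetical helper: print the tarball name expected from 'R CMD build'
# for the package source directory given as $1.
pkg_tarball() {
    desc="$1/DESCRIPTION"
    # Strip trailing whitespace first, then print the value of the field.
    pkg=$(sed -n -e 's/[[:space:]]*$//' -e 's/^Package:[[:space:]]*//p' "$desc")
    ver=$(sed -n -e 's/[[:space:]]*$//' -e 's/^Version:[[:space:]]*//p' "$desc")
    printf '%s_%s.tar.gz\n' "$pkg" "$ver"
}

# Hypothetical wrapper: build the source tarball, then install from it.
# The install step only runs if the build step succeeded.
build_and_install() {
    R CMD build "$1" && R CMD INSTALL "$(pkg_tarball "$1")"
}
```

With these defined, `build_and_install plyr` would run the build and then install the tarball whose name is derived from plyr/DESCRIPTION, so no version number has to be typed.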


Cheers,
Simon
#
On 11-04-12 07:06 PM, Simon Urbanek wrote:
This sounds like saying that no serious developer working on a big
project involving a lot of files to compile should use 'make'.
I mean, serious developers like you *always* do 'make clean' before
they do 'make' on the R tree when they need to test a change, even
a small one? And this only takes a "fraction of a second" for them?
Hey, I'd love to be able to do that too! ;-)

H.

  
    
#

        

I'm with Herve here.
I almost always use  R CMD INSTALL on a directory rather than a
tarball... though most of the time the directory is freshly
untarred.
Other times, however one of the reasons is exactly that I can
keep things around (*.o, ...) which are only rebuilt very
rarely.

Martin
#
On 13.04.2011 02:53, Hervé Pagès wrote:
It is one thing to talk about sensible defaults and another thing to 
talk about bugs. I just talked about sensible defaults. And I have not 
had the time to look into details. I just arrived in Dortmund 15 
minutes ago and the first thing I have to do is repair some 
winbuilder stuff and get 2.13.0 ready on it. I may look into other 
details later this week or at the beginning of next week.

Uwe
#
Hi Uwe,
On 11-04-13 10:50 AM, Uwe Ligges wrote:
No problem. I understand perfectly. Release times are a very busy time
on the Bioconductor side too. Thanks for looking into this!

H.