Strategies for keeping autogenerated .Rd files out of a Git tree

13 messages · Gábor Csárdi, Kirill Müller, Romain Francois +3 more

#
Hi

Quite a few R packages are now available on GitHub long before they 
appear on CRAN, and installation is simple thanks to 
devtools::install_github(). However, it seems to be common practice to 
keep the .Rd files (and NAMESPACE and the Collate section in the 
DESCRIPTION) in the Git tree, and to manually update them, even if they 
are autogenerated from the R code by roxygen2. This requires extra work 
for each update of the documentation and also binds package development 
to a specific version of roxygen2 (because otherwise lots of bogus 
changes can be added by roxygenizing with a different version).

What options are there to generate the .Rd files during build/install? 
The issue has been discussed in 
https://github.com/hadley/devtools/issues/43; perhaps it can be 
summarized as follows:

- The devtools package is not the right place to implement 
roxygenize-before-build
- A continuous integration service would be better for that, but 
currently there's nothing that would be easy to use
- Roxygenizing via src/Makefile could work but requires further 
investigation and an installation of Rtools/Xcode on Windows/OS X

Especially the last point looks interesting to me, but since this is not 
widely used there must be pitfalls I'm not aware of. The general idea 
would be:

- Place code that builds/updates the .Rd and NAMESPACE files into 
src/Makefile
- Users installing the package from source will require infrastructure 
(Rtools/make)
- For binary packages, the .Rd files are already generated and added to 
the .tar.gz during R CMD build before they are submitted to 
CRAN/WinBuilder, and they are also generated (in theory) by R CMD build 
--binary
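
A rough sketch of the staleness test such a src/Makefile rule might delegate to, written as a shell helper. The function name needs_roxygenize and the choice of NAMESPACE as the reference file are assumptions made for illustration; whether R CMD INSTALL tolerates regeneration at this stage is exactly the part that still needs investigation.

```shell
#!/bin/sh
# Hypothetical helper: decide whether the generated files look stale.
# Succeeds (exit 0) when roxygen2 should be run for the package in $1.
needs_roxygenize() {
    pkgdir=$1
    # No NAMESPACE yet: everything still has to be generated.
    [ -f "$pkgdir/NAMESPACE" ] || return 0
    # Any R source newer than NAMESPACE means the generated files may be stale.
    [ -n "$(find "$pkgdir/R" -name '*.R' -newer "$pkgdir/NAMESPACE" -print)" ]
}

# Possible use from src/ (the package root is one level up):
#   if needs_roxygenize ..; then Rscript -e "roxygen2::roxygenize('..')"; fi
```

This mirrors what a make dependency on R/*.R would express, and it only helps users who have both make and roxygen2 installed.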

I'd like to hear your opinion on that. I have also found a thread on 
package development workflow 
(https://stat.ethz.ch/pipermail/r-devel/2011-September/061955.html) but 
there was nothing on un-versioning .Rd files.


Cheers

Kirill
#
Hi,

this is maybe mostly a personal preference, but I prefer not to put
generated files in the vc repository. Changes in the generated files,
especially if there are many of them, pollute the diffs and make them
less useful.

If you really want to be able to install the package directly from
github, one solution is to
1. create another repository, that contains the complete generated
package, so that install_github() can install it.
2. set up a CI service, that can download the package from github,
build the package or the generated files (check the package, while it
is at it), and then push the build stuff back to github.
3. set up a hook on github, that invokes the CI after each commit.
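
The CI job from steps 1 and 2 could look roughly like the following shell sketch. It is not the actual jenkins-ci configuration: the URLs, the work directory layout, the commit message, and the DRY_RUN switch are all hypothetical, and it assumes git, R, and roxygen2 are installed on the CI machine with a git identity configured.

```shell
#!/bin/sh
# Sketch of a CI job: regenerate docs, check the package, push to a "publish" repo.

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# publish_generated SRC_URL PUB_URL BRANCH WORKDIR
publish_generated() {
    src_url=$1
    pub_url=$2
    branch=$3
    work=$4

    # 1. Fetch the raw source, which contains no generated files.
    run git clone --branch "$branch" "$src_url" "$work/pkg" || return 1

    # 2. Generate man/*.Rd, NAMESPACE, and the Collate field.
    run Rscript -e "roxygen2::roxygenize('$work/pkg')" || return 1

    # 3. Build and check the package; the tarball lands in $work.
    run sh -c 'cd "$1" && R CMD build pkg && R CMD check ./*.tar.gz' sh "$work" || return 1

    # 4. Commit the generated files (forced, in case they are ignored) and push.
    run git -C "$work/pkg" add -f man NAMESPACE DESCRIPTION || return 1
    run git -C "$work/pkg" commit -m "Add generated files" || return 1
    run git -C "$work/pkg" push --force "$pub_url" "HEAD:$branch"
}
```

Running it with DRY_RUN=1 prints the six commands without executing them. The forced push is a design choice of this sketch: each publish commit sits on top of the current source commit, so the publish branch is rewritten on every build.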

I have used this setup in various projects with jenkins-ci and it
works well. Diffs are clean, the package is checked and built
frequently, and people can download it without having to install the
tools that generate the generated files.

The only downside is that you need to install a CI, so you need a
"server" for that. Maybe you can do this with travis-ci, maybe not, I
am not familiar with it that much.

Best,
Gabor

On Wed, Dec 11, 2013 at 7:39 PM, Kirill Müller
<kirill.mueller at ivt.baug.ethz.ch> wrote:
1 day later
#
Gabor

I agree with you. There's Travis CI, and r-travis -- an attempt to 
integrate R package testing with Travis. Pushing back to GitHub is 
possible, but the setup is somewhat difficult. Also, this can be subject 
to race conditions because each push triggers a test run and they can 
happen in parallel even for the same repository. How do you handle branches?

It would be really great to be able to execute custom R code before 
building. Perhaps in a PreBuild: section in DESCRIPTION?


Cheers

Kirill
On 12/12/2013 02:21 AM, Gábor Csárdi wrote:

#
On 12/11/2013 4:39 PM, Kirill Müller wrote:
One downside I can see with this third approach is that by making the 
package documentation generation part of the build process, you must 
then make the package depend on roxygen (or whatever tools you are 
using to generate documentation). This dependence, though, is just to 
build the package, not to actually use the package. And by pushing this 
dependency onto the end users of the package, you have transferred the 
problem you mentioned ("... and also binds package development to a 
specific version of roxygen2 ...") to the many end users rather than the 
few developers.

#
On Fri, Dec 13, 2013 at 6:03 AM, Kirill Müller
<kirill.mueller at ivt.baug.ethz.ch> wrote:
I set my CI, so that it does not allow concurrent builds from the same
job. So there are no race conditions. This is probably possible with
Travis, I don't know.
So far I didn't, and only pushed back the "main" branch. But you can just
push back to different branches. In this case I would probably create
another repo, and have the same branches in both the "source" repo
and the "publish" repo.
I am just using make to create the package. This creates all
autogenerated files and then calls R CMD build.
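
For a self-hosted job without a built-in "no concurrent builds" option, the same effect can be approximated with a lock directory. This is only a sketch: the function name with_build_lock, the lock path, and the exit code 75 are arbitrary choices, and it relies on mkdir being atomic on a local filesystem.

```shell
#!/bin/sh
# with_build_lock LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if no other build currently holds LOCKDIR.
with_build_lock() {
    lockdir=$1
    shift
    # mkdir fails if the directory already exists, so only one caller wins.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another build is running (lock: $lockdir)" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    # Release the lock whether or not the build succeeded.
    rmdir "$lockdir"
    return "$status"
}

# Possible use: with_build_lock /tmp/mypkg.build.lock make
```

A build that is killed hard would leave the lock behind, so a real setup would need some stale-lock cleanup as well.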

Another option for this whole problem is not using github at all, but
setting up a CRAN-like repository, and making the CI publish the built
and checked packages there.

Gabor
#
FWIW this is essentially what RForge.net provides. Each GitHub commit triggers a build (branches are supported as the branch info is passed in the WebHook) which can be either "classic" R CMD build or a custom shell script (hence you can do anything you want). The result is a tar ball (which includes the generated files) and that tar ball gets published in the R package repository. R CMD check is run as well on the tar ball and the results are published.
This way you don't need devtools, users can simply use install.packages() without requiring any additional tools.

There are some talks about providing the above as a cloud service, so that anyone can run and/or use it.

Cheers,
Simon
On Dec 13, 2013, at 8:51 AM, Kirill Müller <kirill.mueller at ivt.baug.ethz.ch> wrote:

#
Oh, I didn't know RForge.net supported external git repos, cool!

Gabor

On Fri, Dec 13, 2013 at 3:14 PM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
[...]
#
Btw. one thing that probably would not work (well) with RForge.net (or
another CRAN-like repo) is multiple branches.

The problem is that you cannot put the branch name in the package
version string, because that is not allowed, and then the versions
from the multiple branches get mixed up. This works fine with
install_github() because you can explicitly specify the branch there.
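
To illustrate the restriction: an R package version has to be a sequence of at least two non-negative integers separated by single '.' or '-' characters, which leaves no room for a branch name. A small shell check of that rule (the function name is_valid_r_version is made up for this sketch):

```shell
#!/bin/sh
# Succeeds if $1 looks like a valid R package version string:
# at least two non-negative integers separated by single '.' or '-'.
is_valid_r_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+([.-][0-9]+)+$'
}

# Possible use:
#   is_valid_r_version "0.3-1"       && echo ok
#   is_valid_r_version "0.3.feature" || echo "branch name not allowed"
```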

One possible solution is to create multiple repos, one for each
branch. Not really elegant, though.

I don't really need this myself, I am just saying because it came up
in this thread.

Gabor
On Fri, Dec 13, 2013 at 3:24 PM, Gábor Csárdi <csardi.gabor at gmail.com> wrote:
#
Thanks a lot. This would indeed solve the problem. I'll try mkdist today ;-)

Is the NEWS file parsed before or after mkdist has been executed?

Would you be willing to share the code for the infrastructure, perhaps 
on GitHub?


-Kirill
On 12/13/2013 09:14 PM, Simon Urbanek wrote:

#
On 12/13/2013 06:09 PM, Brian Diggs wrote:
That's right. As outlined in another message, roxygen2 would be required 
for building from the "raw" source (hosted on GitHub) but not for 
installing from a source tarball (which would contain the .Rd files). 
Not sure if that's possible, though.
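
One way to sanity-check that a source tarball really ships the generated documentation, so that installing it would not need roxygen2, is to look for man/*.Rd entries in its listing. A minimal sketch, with a made-up function name and assuming the usual layout of one top-level package directory inside the tarball:

```shell
#!/bin/sh
# Succeeds if the source tarball $1 contains at least one <pkg>/man/*.Rd file.
tarball_has_rd() {
    tar -tzf "$1" | grep -Eq '^[^/]+/man/[^/]+\.Rd$'
}

# Possible use after R CMD build:
#   tarball_has_rd mypkg_0.1.tar.gz || echo "no .Rd files in tarball" >&2
```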