Please explain your workflow from R code -> package -> R code -> package

16 messages · Paul Johnson, Spencer Graves, Dirk Eddelbuettel +9 more

#
Hi,

I'm asking another one of those questions that would be obvious if I
could watch your work while you do it.

I'm having trouble understanding the workflow of code and package maintenance.

Stage 1.  Make some R functions in a folder.  This is in a Subversion repo

R/trunk/myproject

Stage 2. Make a package:

After the package.skeleton, and R check, I have a new folder with the
project in it,

R/trunk/myproject/mypackage
  DESCRIPTION
  man
  R

I go into the man folder and manually edit the Rd files. I don't
change anything in the R folder because I think it is OK so far.

And eventually I end up with a tarball mypackage_1.0.tar.gz.

Stage 3. How to make the round trip? I add more R code, and
re-generate a package.

package.skeleton obliterates the help files I've already edited.

So keeping the R code in sync with the documentation appears to be a hassle.

In other languages, I've seen people write the documentation inside the
code files and then post-process to make the documentation.  Is there
a similar thing for R, to unify the R code development and
documentation/package-making process?

pj
#
On Fri, Sep 9, 2011 at 11:38 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
See ?dump and ?prompt.
Yes.  See the roxygen / roxygen2 and devtools packages.
Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com
#
I write the *.Rd file including examples that I use as unit tests 
before I write the code.  Others criticize this approach insisting that 
\examples is NOT a place for unit tests.  There are unit testing 
protocols for R that I have not learned.  Instead, I rely on \dontshow 
inside \examples.


       There are two documents on CRAN -> "Documentation:  Contributed" 
entitled "Creating R Packages".  They might help.


       Hope this helps.
       Spencer
On 9/9/2011 9:49 AM, Joshua Ulrich wrote:
#
On 9 September 2011 at 11:38, Paul Johnson wrote:
| Hi,
| 
| I'm asking another one of those questions that would be obvious if I
| could watch your work while you do it.
| 
| I'm having trouble understanding the workflow of code and package maintenance.
| 
| Stage 1.  Make some R functions in a folder.  This is in a Subversion repo
| 
| R/trunk/myproject
| 
| Stage 2. Make a package:
| 
| After the package.skeleton, and R check, I have a new folder with the
| project in it,
| 
| R/trunk/myproject/mypackage
|   DESCRIPTION
|   man
|   R
| 
| I go into the man folder and manually edit the Rd files. I don't
| change anything in the R folder because I think it is OK so far.
| 
| And eventually I end up with a tarball mypackage_1.0.tar.gz.
| 
| Stage 3. How to make the round trip? I add more R code, and
| re-generate a package.
| 
| package.skeleton obliterates the help files I've already edited.

In my case a lot of 'C-x f' to open a new man/foo.Rd file, maybe a new
src/foo.cpp file, maybe a new unit tests file ...  It all depends.  I then
tend to call little one-line shell commands which may rebuild the package and
run test expressions via littler.

But as a general rule, code and documentation (still) gets written (manually)
in an editor.

| So keeping the R code in sync with the documentation appears to be a hassle.

But 'R CMD check' is vigilant and reminds you when you get some details wrong.
 
| In other languages, I've seen people write the documentation inside the
| code files and then post-process to make the documentation.  Is there
| a similar thing for R, to unify the R code development and
| documentation/package-making process?

You can also follow the cool kids who these days tie some of this together
using roxygen.
 

And just as a philosophical note: There is (still) no silver bullet.  The
slow boring of thick boards applies not only to politics (as per Max Weber)
but quite conceivably to programming.

Hth, Dirk
 
| pj
| 
| -- 
| Paul E. Johnson
| Professor, Political Science
| 1541 Lilac Lane, Room 504
| University of Kansas
| 
| ______________________________________________
| R-devel at r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-devel
#
On 09/09/2011 12:38 PM, Paul Johnson wrote:
You should only run it once.  After that, add your code by editing *.R 
files in the R directory, sourcing them, and generate *.Rd files using 
prompt().  As Dirk said, run R CMD check when you think you're done, and 
it will point out how wrong you are.
If you write the *.Rd file before (like Spencer) or soon after writing 
the code, then design errors will usually stick out at you, and you can 
modify the functions.  If you keep your functions small, you'll get them 
working early, and won't have a lot of problems keeping them in sync 
with the docs, because they won't change much once you get them right.
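
As a rough illustration of the run-it-once advice, the sketch below creates a skeleton a single time and then adds a later function with dump() and prompt() rather than regenerating the package. The names myfun, newfun and mypackage, and the temporary directory standing in for the repository, are assumptions made for the example.

```r
# Hypothetical names throughout; a temporary directory stands in for the repo.
work_dir <- file.path(tempdir(), "skeleton-demo")
dir.create(work_dir, showWarnings = FALSE)
pkg_dir <- file.path(work_dir, "mypackage")

# Stage 2: create the skeleton ONCE from a function defined in the workspace.
myfun <- function(x) x + 1
if (!dir.exists(pkg_dir)) {
  package.skeleton(name = "mypackage", list = "myfun",
                   environment = environment(),  # look up myfun where this code runs
                   path = work_dir)
}

# Stage 3: add a new function without touching the Rd files already edited.
newfun <- function(x, y) x * y
dump("newfun", file = file.path(pkg_dir, "R", "newfun.R"))         # code
prompt(newfun, filename = file.path(pkg_dir, "man", "newfun.Rd"))  # Rd template

# Inspect what now exists in the package directory.
list.files(pkg_dir, recursive = TRUE)
```

Only man/newfun.Rd is new here; the Rd files written by the first package.skeleton() call are left alone.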

Duncan Murdoch
#
On 9/9/2011 10:47 AM, Duncan Murdoch wrote:
For me, the benefits are huge:  I believe I tripled my software 
development productivity almost overnight when I started writing 
documentation with examples (unit tests) before writing the code.  Then 
I run "R CMD check" after every tiny change.  This may seem like extra 
work, but it saves debugging time, because any new problems are likely 
restricted to what I changed.  For example, I write a function A.  Then 
I write B.  Then I write C.  In the process of writing C, I change A.  R 
CMD check after adding C reveals that the change to A broke B.  Without 
the R package discipline, it could easily be a year before I found that
a bug existed, and then it was an enormous effort to find and fix it.
(See Wikipedia, "Software repository", "Package development process".)
In addition to having better code in less time for myself, I can easily
share the results with others -- thereby increasing my productivity 
substantially more than the factor of three I mentioned.
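
The check-after-every-change habit can be wrapped in a small helper. This is only a sketch: check_pkg is a hypothetical name, and it assumes pkg_dir is a source package directory containing a DESCRIPTION file.

```r
# Hypothetical helper: build a source package and run R CMD check on the tarball.
check_pkg <- function(pkg_dir) {
  pkg_dir <- normalizePath(pkg_dir, mustWork = TRUE)
  r_bin <- file.path(R.home("bin"), "R")

  # R CMD build names the tarball <Package>_<Version>.tar.gz
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                   fields = c("Package", "Version"))
  tarball <- paste0(desc[1, "Package"], "_", desc[1, "Version"], ".tar.gz")

  # Build and check in the parent directory, then restore the working directory.
  old_wd <- setwd(dirname(pkg_dir))
  on.exit(setwd(old_wd))

  if (system2(r_bin, c("CMD", "build", shQuote(basename(pkg_dir)))) != 0) {
    stop("R CMD build failed")
  }
  status <- system2(r_bin, c("CMD", "check", shQuote(tarball)))

  # TRUE when R CMD check exited with status 0
  invisible(status == 0)
}

# Usage after each small change (the path is a placeholder):
# check_pkg("mypackage")
```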


Spencer
#
It's not the cool kids who are doing this, it's the lazy kids ;)
Roxygen(2) does remove a considerable amount of replication between
code and documentation (e.g. replicating function usage in two
places), and the close proximity between code and documentation does
make it easier to remember to update your documentation when the code
changes.
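
For readers who have not seen the format, here is a minimal sketch of a roxygen-documented function. The function add_one is made up for illustration; the tags shown are standard roxygen2 tags.

```r
#' Add one to a number
#'
#' @param x A numeric vector.
#' @return A numeric vector the same length as \code{x}.
#' @examples
#' add_one(1:3)
#' @export
add_one <- function(x) {
  x + 1
}

# With roxygen2 installed, running roxygenize() on the package directory
# generates man/add_one.Rd (including its \usage section) from the comments
# above. The path is a placeholder:
# roxygen2::roxygenize("path/to/mypackage")
```

The usage line is never typed by hand, which is the duplication being removed.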

Roxygen2 adds a few other tools for reducing duplication like
templates, the ability to inherit parameter documentation from other
functions, and the family tag to automatically add seealso references
between all members of a related family of functions.  These are
things that are painful to do by hand and add a significant
maintenance burden.
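
A small sketch of the inheritance and family tags, using two made-up functions (rescale01 and rescale100). The second function documents no parameters itself; it borrows them from the first, and both end up cross-referenced through the shared family name.

```r
#' Scale a vector to the unit interval
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be dropped when computing the range?
#' @family rescaling functions
#' @export
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

#' Scale a vector to percentages
#'
#' @inheritParams rescale01
#' @family rescaling functions
#' @export
rescale100 <- function(x, na.rm = TRUE) {
  100 * rescale01(x, na.rm = na.rm)
}
```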

I agree that there's no silver bullet, but good tools certainly can
make life easier.

Hadley
#
The eventual goal for the devtools package is to make this fast and
completely automated, so that while you are editing your package you
have another window open that detects any problems as soon as you make
them.  This isn't quite possible with R CMD check because it's so
thorough, which tends to make it rather slow.  But if you know what's
changed, you should be able to selectively figure out what pieces of
the check to run.
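
Short of that automated watcher, a quicker inner loop than a full R CMD check might look like the sketch below. quick_check is a hypothetical wrapper; it assumes devtools is installed and that pkg points at a package source directory. The calls are devtools functions as found in current releases, which may differ from the 2011 version discussed here.

```r
# Hypothetical wrapper for a fast edit-and-verify loop with devtools.
quick_check <- function(pkg = ".") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("the devtools package is required")
  }
  devtools::document(pkg)      # regenerate Rd files from roxygen comments
  devtools::load_all(pkg)      # load the package code without installing it
  devtools::run_examples(pkg)  # run the examples from the Rd files
  invisible(TRUE)
}

# The full, slower check remains available via devtools::check(pkg).
```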

Hadley
#
+1 for roxygen2, lazycoolness oblige.

An alternative that has not been mentioned is inlinedocs,
http://inlinedocs.r-forge.r-project.org/
I don't use it myself, but it might appeal to your workflow.

baptiste
On 10 September 2011 06:41, Hadley Wickham <hadley at rice.edu> wrote:
#
On Fri, Sep 9, 2011 at 7:41 PM, Hadley Wickham <hadley at rice.edu> wrote:

            
laziness being one of the three virtues of a programmer. The other
two being hubris and something else I don't have time to look up at
the moment.

 library(fortunes) fodder: "Don't do as I say, do as Hadley does."
#
All I need now is a tool to go through the 4 packages I already
created without Roxygen and  spit out source files with the Roxygen
comments in them...

really lazy.
On Fri, Sep 9, 2011 at 11:41 AM, Hadley Wickham <hadley at rice.edu> wrote:
#
On Sat, Sep 10, 2011 at 11:23 AM, steven mosher <moshersteven at gmail.com> wrote:
That's what Rd2roxygen does...

Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com
#
Exactly. Rd2roxygen is proud to be a member of the "Lazy Ally", and
tries to make diligent developers lazier... Although it does not
guarantee a perfect transition from Rd to roxygen (be sure to check
out the documentation), it should be able to save you a considerable
amount of time.
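
A minimal sketch of the conversion step, assuming the Rd2roxygen package is installed. The wrapper name convert_to_roxygen is hypothetical. Because the conversion inserts roxygen comments into the existing R source files, it is safest to run it on a copy or a clean version-control checkout.

```r
# Hypothetical wrapper around Rd2roxygen::Rd2roxygen().
# pkg_dir is the root directory of a package that already has man/*.Rd files.
convert_to_roxygen <- function(pkg_dir) {
  if (!requireNamespace("Rd2roxygen", quietly = TRUE)) {
    stop("the Rd2roxygen package is required")
  }
  # Parses the Rd files under man/ and writes roxygen comments into R/*.R
  Rd2roxygen::Rd2roxygen(pkg_dir)
}

# Usage (the path is a placeholder):
# convert_to_roxygen("path/to/mypackage")
```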

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA
On Sat, Sep 10, 2011 at 11:31 AM, Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
#
Thanks, I was too lazy to even look for it.
On Sat, Sep 10, 2011 at 9:31 AM, Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
#
I create & maintain all my packages using the 'mvbutils' package. Documentation in plain-text format (not Rd) is stored along with each function definition--- so when you edit your function, its doco is right there too, and it looks like proper documentation, not code-comments or quasi-Latex. The entire package source tree, including the Rd files, is created automatically by the 'preinstall' function, after which you can then R-BUILD the package as usual. However, with 'mvbutils' you only need R-BUILD when you want a distribution version for others. Normal maintenance doesn't require R-BUILD; you can add/remove/edit functions, documentation, and data to the package on-the-fly while it is loaded, with no need to unload/uninstall/rebuild/reload.

It works with compiled code, too. My own way of working with compiled code is a bit different to most other people's, but colleagues who use more traditional routes have also successfully used 'mvbutils' to build and maintain their packages.

In the spirit of several other replies-- I spent months developing this stuff and getting it to run smoothly, precisely because I'm lazy and have a limited memory...

HTH (though whether "yet another approach is..." will actually help you, I'm not sure)

Mark


Mark Bravington
CSIRO CMIS
Marine Lab
Hobart
Australia
2 days later
#
Hi--

I guess I'm a bit late to the party. I enjoyed this thread immensely,
and it helped me discover roxygen2, in what I predict to be the
beginning of a beautiful friendship.

I wrote a couple of wrappers over the last few days which are making
my workflow easier. I code on a computer but run my computations on
several different boxes, meaning that I need a simple way (ideally
one-liner) to deploy the code on each box. So I have a couple of
wrapper functions, which I am happy to share with you.

1) RPush: this bash function roxygenizes, commits and pushes a package
to a git repo. It assumes that your packages are all in a folder, the
path to which is given by the environment variable $RPACK (for example,
$RPACK/Package1 and $RPACK/Package2).

function RPush {
# $1 is the package name, $2 the commit message
cd "$RPACK/$1" || return 1
R --quiet --vanilla --slave  <<EOF
suppressMessages(library(roxygen2))
roxygenize("./")
EOF
git add ./
git commit -am "$2"
git push
cd - > /dev/null
}

The syntax is
RPush <package name> 'commit message'

2) GitInstall is an R function somewhat inspired by Hadley Wickham's
install_github, except that it is not bound to Github and can (I think)
be used with any git server.

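GitInstall <- function(
                       repo,
                       branch = "HEAD",
                       remote="git@yougitserver:yourrepo"
                       ) {
  # Accept the repo name unquoted, e.g. GitInstall(Package1)
  repo <- as.character(substitute(repo))
  tartf <- tempfile()
  pkgtd <- tempfile()
  dir.create(pkgtd)
  # recursive = TRUE is needed for unlink() to remove the directory
  on.exit(unlink(c(tartf, pkgtd), recursive = TRUE))
  # Fetch a tar archive of the requested branch from the remote
  system(
         paste(
               "git archive --format=tar --remote=",
               remote, repo,
               " ", branch, "> ",
               tartf,
               sep=""
               ))
  message("Attempting to install ", repo, " from ", remote, ".")
  # Unpack the archive into the temporary package directory
  system(paste(
               "tar -xf",
               tartf,
               "-C",
               pkgtd
               ))
  install.packages(pkgtd, repos=NULL, type="source")
}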

You can then wrap it in a bash function such as

function RInstall {
sudo R --quiet --slave <<EOF
Package1::GitInstall($1)
EOF
}

(where it is assumed that you put the GitInstall function in Package1)

So that to update a package and deploy it to a remote machine you just
need to type:
- on the local machine: RPush Package1 'commit message'
- on the remote machine: RInstall Package1

I don't think it gets much lazier than that!

Hope it helps, though I am sure this code is not super clean and may
break for other people..

Timothee
On Sun, Sep 11, 2011 at 1:48 AM, <Mark.Bravington at csiro.au> wrote: