R CMD check for the R code from vignettes

15 messages · Kevin Coombes, Carl Boettiger, Henrik Bengtsson +4 more

#
Hi,

Recently I saw a couple of cases in which the package vignettes were
somewhat complicated so that Stangle() (or knitr::purl() or other
tangling functions) can fail to produce the exact R code that is
executed by the weaving function Sweave() (or knitr::knit(), ...). For
example, this is a valid document that can pass the weaving process
but cannot generate a valid R script to be source()d:

\documentclass{article}
\begin{document}
Assign 1 to x: \Sexpr{x <- 1}
<<>>=
x + 1
@
\end{document}

That is because the inline R code is not written to the R script
during the tangling process. When an R package vignette contains
inline R code expressions that have significant side effects, R CMD
check can fail because the tangled output is not correct. What I
showed here is only a trivial example, and I have seen two packages
that have more complicated scenarios than this. Anyway, the key thing
that I want to discuss here is, since the R code in the vignette has
been executed once during the weaving process, does it make much sense
to execute the code generated from the tangle function? In other
words, if the weaving process has succeeded, is it necessary to
source() the R script again?
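
The failure described above can be reproduced with a minimal sketch like the following. It assumes only base R; the temporary directory and the file name demo.Rnw are illustrative and not taken from any of the packages mentioned.

```r
# Write the example vignette from this message to a temporary directory.
dir <- tempfile("tangle-demo")
dir.create(dir)
rnw <- file.path(dir, "demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Assign 1 to x: \\Sexpr{x <- 1}",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
), rnw)

# Tangle it; Stangle() writes demo.R into the current working directory.
old <- setwd(dir)
utils::Stangle(rnw, quiet = TRUE)
setwd(old)

script <- file.path(dir, "demo.R")
tangled <- readLines(script)

# The inline assignment never reaches the tangled script ...
has_assignment <- any(grepl("x <- 1", tangled, fixed = TRUE))

# ... so sourcing the script in a clean environment fails at `x + 1`.
result <- tryCatch(
  {
    source(script, local = new.env(parent = baseenv()))
    "ok"
  },
  error = function(e) conditionMessage(e)
)

print(has_assignment)
print(result)
```

Sourcing into an environment whose parent is baseenv() keeps a stray x in the global workspace from hiding the problem.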

The two options here are:

1. Do not check the R code from vignettes;
2. Or fix the tangle function so that it produces exactly what was
executed in the weaving process. If this is done, I'm back to my
previous question: does it make sense to run the code twice?

To push this a little further, personally I do not quite appreciate
literate programming in R as two separate steps, namely weave and
tangle. In particular, I do not see the value of tangle, considering
Sweave() (or knitr::knit()) as the new "source()". I am therefore
inclined to eventually drop tangle altogether, but perhaps I have
missed something here, and I'd like to hear what other people think
about it.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
#
Hi,

Unless someone is planning to change Stangle to include inline 
expressions (which I am *not* advocating), I think that relying on 
side-effects within an \Sexpr construction is a bad idea. So, my own 
coding style is to restrict my use of \Sexpr to calls of the form 
\Sexpr{show.the.value.of.this.variable}. As a result, I more-or-less 
believe that having R CMD check use Stangle and report an error is 
probably a good thing.

There is a completely separate question about the relationship between 
Sweave/Stangle or knit/purl and literate programming that is linked to 
your question about whether to use Stangle on vignettes. The underlying 
model(s) in R have drifted away from Knuth's original conception, for 
some good reasons.

The original goal of literate programming was to be able to explain the 
algorithms and data structures in the code to humans.  For that purpose, 
it was important to have named code chunks that you could move around, 
which would allow you to describe the algorithm starting from a high 
level overview and then drilling down into the details. From this 
perspective, "tangle" was critical to being able to reconstruct a 
program that would compile and run correctly.

The vast majority of applications of Sweave/Stangle or knit/purl in 
modern R have a completely different goal: to produce some sort of 
document that describes the results of an analysis to a non-programmer 
or non-statistician.  For this goal, "weave" is much more important than 
"tangle", because the most important aspect is the ability to integrate 
the results (figures, tables, etc.) of running the code into the document 
that gets passed off to the person for whom the analysis was prepared. As 
a result, the number of times in my daily work that I need to invoke 
Stangle (or purl) explicitly is many orders of magnitude smaller 
than the number of times that I invoke Sweave (or knit).

   -- Kevin
On 5/30/2014 1:04 AM, Yihui Xie wrote:
#
I think there are several aspects to Yihue's post and some simple
workarounds/long solutions to the issues:

1. For the reasons argued, I would agree that 'R CMD check'
incorrectly assumes that the tangled script should be able to run
without errors.  Instead I think it should only check the syntax, i.e.
that the script can be parsed without errors.  If not, then Sweave may
have to be redefined to clarify that \Sexpr{}/"inline" expressions must
not have "side effects".

2. For other (=non-Sweave) vignette builder packages, you can already
today define engines that do not tangle, think
%\VignetteEngine{knitr::knitr_no_tangle}.

3. Extending on this, I'd like to propose %\VignetteTangle{no} (and/or
false, FALSE, ...), which would tell the engine to not generate the
"tangle" script file.  Then it is up to the vignette engine to
acknowledge this or not, but at least we will have a standard across
engines rather than each of us coming up with our own markup for this.
You can also imagine supporting other types of settings, e.g.
%\VignetteTangle{all} to also include \Sexpr{} in the tangled output.
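
The syntax-only check suggested in point 1 could look roughly like the sketch below. The helper name check_syntax_only() and the file names are hypothetical, and this is not what 'R CMD check' currently does; it only illustrates that parse() can validate a tangled script without evaluating it.

```r
# Hypothetical helper: TRUE if the file parses as R code, FALSE otherwise.
# parse() builds the expressions but never evaluates them.
check_syntax_only <- function(path) {
  tryCatch(
    {
      parse(file = path)
      TRUE
    },
    error = function(e) FALSE
  )
}

dir <- tempfile("parse-demo")
dir.create(dir)

# A vignette whose tangled script is syntactically valid but not runnable,
# because the inline assignment is dropped by Stangle().
rnw <- file.path(dir, "demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Assign 1 to x: \\Sexpr{x <- 1}",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
), rnw)

old <- setwd(dir)
utils::Stangle(rnw, quiet = TRUE)
setwd(old)

tangled_ok <- check_syntax_only(file.path(dir, "demo.R"))

# For contrast, a script with a genuine syntax error.
broken <- file.path(dir, "broken.R")
writeLines("x + (1", broken)
broken_ok <- check_syntax_only(broken)

print(tangled_ok)
print(broken_ok)
```

Under such a check the example vignette would pass, since `x + 1` parses fine even though x is never defined in the script.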

/Henrik
On Fri, May 30, 2014 at 9:29 AM, Carl Boettiger <cboettig at gmail.com> wrote:
#
Sorry, it should be Yihui and nothing else. /Henrik
On Fri, May 30, 2014 at 10:15 AM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
#
Hi Kevin,

Personally I also avoid code that has side effects in the inline
expressions, but I think there are legitimate use cases in which
inline expressions have side effects. This discussion was motivated by
Carl's knitcitations package, as well as another question on
StackOverflow (http://stackoverflow.com/q/23927325/559676).

I'm aware of the distinction between the original literate programming
paradigm and the one in R (that is why I said "literate programming in
R" instead of "literate programming in general"). In R, weave actually
does what both weave and tangle do in the original paradigm -- there
is no need to tangle the document to get the computer code so that we
can execute it.

To Carl: I agree that it is a little extreme to drop tangle entirely,
so I think at least knitr::purl() will stay there in the foreseeable
future. I tend to adopt Henrik's idea, i.e., to provide vignette
engines that just ignore tangle. At the moment, it seems R CMD check
is comfortable with vignettes that do not have corresponding R
scripts, and I hope these R scripts will not become mandatory in the
future.

Thanks everyone for your comments!

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name


On Fri, May 30, 2014 at 8:21 AM, Kevin Coombes
<kevin.r.coombes at gmail.com> wrote:
#
Note the test has been done once in weave, since R CMD check will try
to rebuild vignettes. The problem is whether the related tools in R
should change their tangle utilities so we can **repeat** the test,
and it seems the answer is "no" in my eyes.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
On Sat, May 31, 2014 at 4:54 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
#
On 05/31/2014 03:52 PM, Yihui Xie wrote:
It is very useful, pedagogically and when reproducing analyses, to be able to 
source() the tangled .R code into an R session, analogous to running example 
code with example(). The documentation for ?Stangle does read

      (Code inside '\Sexpr{}' statements is ignored by 'Stangle'.)

So my 'vote' (recognizing that I don't have one of those) is to incorporate 
\Sexpr{} expressions into the tangled code, or to continue to flag use of Sexpr 
with side effects as errors (indirectly, by source()ing the tangled code), 
rather than writing engines that ignore tangle.

It is very valuable to all parties to write a vignette with code that is fully 
evaluated; otherwise, it is too easy for bit rot to seep in, or to 'fake' it in 
a way that seems innocent but is misleading.

Martin Morgan

  
    
#
I mentioned in my original post that Sweave()/knit()/... can be
considered as the "new" source(). They can do the same thing as
source() does. I agree that fully evaluating the code is valuable, but
it is not a problem since the weave functions do fully evaluate the
code. If there is a reason why source()ing an R script is preferred,
I guess it is users' familiarity with .R files rather than
.Rnw/.Rmd/... files. However, I suspect it would be painful to read the
pure R script tangled from the source document without the original
narrative.

So what do we really lose if we turn off tangle? We lose an R script
as a derivative from the source document, but we do not lose the code
evaluation.
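
As a rough illustration of that point, the sketch below weaves the small example document from the start of this thread and checks that both the inline expression and the code chunk were evaluated. It assumes only base R's Sweave(); the file names are illustrative, and no LaTeX installation is needed because the .tex file is only read back, not compiled. Note that Sweave() evaluates code in the global environment, so running this creates x there.

```r
dir <- tempfile("weave-demo")
dir.create(dir)
rnw <- file.path(dir, "demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Assign 1 to x: \\Sexpr{x <- 1}",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
), rnw)

# Weave the document; Sweave() writes demo.tex into the working directory.
old <- setwd(dir)
utils::Sweave(rnw, quiet = TRUE)
setwd(old)

tex <- readLines(file.path(dir, "demo.tex"))

# The inline expression was evaluated and replaced by its value ...
inline_evaluated <- any(grepl("Assign 1 to x: 1", tex, fixed = TRUE))

# ... and the chunk ran afterwards, using the x created inline.
chunk_evaluated <- any(grepl("[1] 2", tex, fixed = TRUE))

print(inline_evaluated)
print(chunk_evaluated)
```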

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
On Sat, May 31, 2014 at 6:20 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
#
Yes, that is a matter of familiarity as I mentioned, isn't it? I
understand this justification. I can argue that it is also convenient
to give people an Rnw/Rmd document and they can easily run the R code
chunks as well (e.g. in RStudio, chunk navigation and evaluation are
pretty simple) _within_ the context of your teaching materials.
However, I think this is drifting away from the original topic, so
I'll stop my comments on the direction of teaching.

The original question was, what do we lose if we disable tangle for R
package vignettes? Please also note I mean this is _optional_, i.e.
package authors can _choose_ whether they want to disable tangle.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name


On Sat, May 31, 2014 at 9:11 PM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:
#
1. The starting point of this discussion is package vignettes, instead
of R scripts. I'm not saying we should abandon R scripts, or all
people should write R code to generate reports. Starting from a
package vignette, you can evaluate it using a weave function, or
evaluate its derivative, namely an R script. I was saying the former
might not be a bad idea, although the latter sounds more familiar to
most R users. For a package vignette, within the context of R CMD
check, is it necessary to do tangle + evaluate _besides_ weave?

2. If you are comfortable with reading pure code without narratives,
I'm totally fine with that. I guess there is nothing to argue on this
point, since it is pretty much personal taste.

3. Yes, you are absolutely correct -- Sweave()/knit() does more than
source(), but let me repeat the issue to be discussed: what harm does
it bring if we disable tangle for R package vignettes?

Sorry if I did not make it clear enough: my priority in this
discussion is the necessity of tangle for package vignettes. Once we
have settled this issue, I'll be happy to extend the discussion to
tangle in general.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
On Sat, May 31, 2014 at 9:20 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote: