R CMD check for the R code from vignettes

Fri, May 30, 2014 9:22 PM

Hi Kevin,

Personally, I also avoid inline expressions that have side effects, but
I think there are legitimate use cases for them. This discussion was
motivated by Carl's knitcitations package, as well as another question
on StackOverflow (http://stackoverflow.com/q/23927325/559676).

I'm aware of the distinction between the original literate programming
paradigm and the one in R (that is why I said "literate programming in
R" instead of "literate programming in general"). In R, weave actually
does the work of both weave and tangle in the original paradigm: there
is no need to tangle the document to extract the computer code before
we can execute it.

To Carl: I agree that it is a little extreme to drop tangle entirely,
so I think at least knitr::purl() will stay in the foreseeable
future. I am inclined to adopt Henrik's idea, i.e., to provide vignette
engines that simply skip tangling. At the moment, R CMD check seems to
accept vignettes that do not have corresponding R scripts, and I hope
these R scripts will not become mandatory in the future.
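Henrik's idea could look roughly like the following sketch, which assumes a package (hypothetically named mypkg) that registers its own vignette engine through tools::vignetteEngine(). The engine name Sweave_notangle and the helpers notangle_weave() and notangle_tangle() are illustrative only, and writing a one-line placeholder script is just one possible way to "ignore tangle"; whether R CMD check is happy with that placeholder is an assumption, not something established in this thread.

```r
## Hypothetical package code (e.g. R/zzz.R in a package called "mypkg"):
## a vignette engine that weaves with Sweave but skips real tangling.

# Weave step: delegate to utils::Sweave(), which runs all of the code,
# including inline \Sexpr{} expressions, and writes the .tex file.
notangle_weave <- function(file, ...) {
  utils::Sweave(file, ...)
}

# Tangle step (assumption: a placeholder script is acceptable): write a
# one-line comment instead of extracting code, so there is nothing to
# run a second time. Assumes it is called from the vignette's directory.
notangle_tangle <- function(file, ...) {
  out <- sub("[.][Rr]nw$", ".R", basename(file))
  writeLines("## Code extraction (tangle) is disabled for this vignette.", out)
  invisible(out)
}

# Register the engine when the package is loaded.
.onLoad <- function(libname, pkgname) {
  tools::vignetteEngine(
    "Sweave_notangle",
    weave   = notangle_weave,
    tangle  = notangle_tangle,
    pattern = "[.][Rr]nw$",
    package = pkgname
  )
}
```

A vignette would then opt in with %\VignetteEngine{mypkg::Sweave_notangle} among its other %\Vignette entries, the same way knitr vignettes declare %\VignetteEngine{knitr::knitr}.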

Thanks everyone for your comments!

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name


On Fri, May 30, 2014 at 8:21 AM, Kevin Coombes <kevin.r.coombes at gmail.com> wrote:

Hi,

Unless someone is planning to change Stangle to include inline expressions
(which I am *not* advocating), I think that relying on side effects within
an \Sexpr construction is a bad idea. So, my own coding style is to restrict
my use of \Sexpr to calls of the form
\Sexpr{show.the.value.of.this.variable}. As a result, I more or less believe
that having R CMD check use Stangle and report an error is probably a good
thing.
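To make this convention concrete, here is a minimal sketch that rewrites the example vignette from this thread so that the assignment lives in a code chunk and \Sexpr{} only displays a value, then tangles it with utils::Stangle() and sources the result in a clean environment. The file name display-only.Rnw is arbitrary, and working in tempdir() is only there to avoid leaving files behind.

```r
# The thread's example vignette, rewritten so that \Sexpr{} only shows x.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "x <- 1",
  "@",
  "The value of x is \\Sexpr{x}.",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
)

old <- setwd(tempdir())
writeLines(rnw, "display-only.Rnw")

# Tangle: the assignment is inside a chunk, so the script is complete.
utils::Stangle("display-only.Rnw", quiet = TRUE)

# Source the script where the global workspace is not visible.
env <- new.env(parent = baseenv())
sys.source("display-only.R", envir = env)
print(env$x)  # [1] 1
setwd(old)
```

Because nothing outside the chunks changes state, the tangled script and the woven document agree, which is why an error from Stangle under this style would point to a genuine problem.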

There is a completely separate question, linked to your question about
whether to run Stangle on vignettes, about the relationship between
Sweave/Stangle or knit/purl and literate programming. The underlying
model(s) in R have drifted away from Knuth's original conception, for
some good reasons.

The original goal of literate programming was to be able to explain the
algorithms and data structures in the code to humans.  For that purpose, it
was important to have named code chunks that you could move around, which
would allow you to describe the algorithm starting from a high level
overview and then drilling down into the details. From this perspective,
"tangle" was critical to being able to reconstruct a program that would
compile and run correctly.

The vast majority of applications of Sweave/Stangle or knit/purl in modern R
have a completely different goal: to produce some sort of document that
describes the results of an analysis to a non-programmer or
non-statistician. For this goal, "weave" is much more important than
"tangle", because the most important aspect is the ability to integrate the
results (figures, tables, etc.) of running the code into the document that
gets passed off to the person for whom the analysis was prepared. As a
result, the number of times in my daily work that I need to invoke Stangle
(or purl) explicitly is many orders of magnitude smaller than the number of
times that I invoke Sweave (or knitr).

  -- Kevin



On 5/30/2014 1:04 AM, Yihui Xie wrote:

Hi,

Recently I saw a couple of cases in which the package vignettes were
complicated enough that Stangle() (or knitr::purl() or other tangling
functions) failed to produce the exact R code that is executed by the
weaving function Sweave() (or knitr::knit(), ...). For example, this is
a valid document that passes the weaving process but cannot generate a
valid R script to be source()d:

\documentclass{article}
\begin{document}
Assign 1 to x: \Sexpr{x <- 1}
<<>>=
x + 1
@
\end{document}

That is because the inline R code is not written to the R script
during the tangling process. When an R package vignette contains
inline R code expressions that have significant side effects, R CMD
check can fail because the tangled output is not correct. What I
showed here is only a trivial example, and I have seen two packages
that have more complicated scenarios than this. Anyway, the key question
that I want to discuss here is: since the R code in the vignette has
already been executed once during the weaving process, does it make much
sense to execute the code generated by the tangle function again? In
other words, if the weaving process has succeeded, is it necessary to
source() the R script again?
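The failure can be reproduced with a short script. This sketch uses only utils::Sweave() and utils::Stangle(); the file name side-effect.Rnw is arbitrary, and sourcing into an environment whose parent is baseenv() is an assumption meant to roughly imitate a fresh R session in which the woven objects do not exist.

```r
# Sweave() evaluates \Sexpr{x <- 1}, but Stangle() only extracts code
# chunks, so the tangled script never defines x.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Assign 1 to x: \\Sexpr{x <- 1}",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
)

old <- setwd(tempdir())
writeLines(rnw, "side-effect.Rnw")

# Weaving succeeds: the inline assignment runs before the chunk does.
utils::Sweave("side-effect.Rnw", quiet = TRUE)

# Tangling writes side-effect.R containing only the chunk code `x + 1`.
utils::Stangle("side-effect.Rnw", quiet = TRUE)
tangled <- readLines("side-effect.R")

# Sourcing the script where the global workspace is not visible fails,
# because nothing in it assigns x.
env <- new.env(parent = baseenv())
result <- tryCatch(
  { sys.source("side-effect.R", envir = env); "ok" },
  error = function(e) conditionMessage(e)
)
print(result)
setwd(old)
```

Here result should hold an error message along the lines of "object 'x' not found" (the exact wording depends on the locale), even though weaving the same file worked.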

The two options here are:

1. Do not check the R code from vignettes;
2. Fix the tangle function so that it produces exactly what was
executed in the weaving process. If this is done, I'm back to my
previous question: does it make sense to run the code twice?
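Option 2 can be sketched as a small tangle function that emits inline expressions in document order alongside the chunk code. tangle_with_inline() is hypothetical (it is not part of utils or knitr) and rests on simplifying assumptions: noweb syntax (<<...>>= and @), \Sexpr{} bodies without nested braces, and no handling of chunk references or options such as eval=FALSE.

```r
# Hypothetical tangler: keeps \Sexpr{} bodies and chunk code, in order.
tangle_with_inline <- function(lines) {
  out <- character()
  in_chunk <- FALSE
  for (line in lines) {
    if (grepl("^<<.*>>=[[:space:]]*$", line)) {
      in_chunk <- TRUE                 # a chunk header opens a code chunk
    } else if (in_chunk && grepl("^@", line)) {
      in_chunk <- FALSE                # "@" closes the current chunk
    } else if (in_chunk) {
      out <- c(out, line)              # chunk code is copied verbatim
    } else {
      # Documentation text: pull out each \Sexpr{...} body, left to right.
      hits <- regmatches(
        line, gregexpr("\\\\Sexpr\\{[^{}]*\\}", line, perl = TRUE)
      )[[1]]
      out <- c(out, sub("^\\\\Sexpr\\{(.*)\\}$", "\\1", hits, perl = TRUE))
    }
  }
  out
}

# The example vignette from this thread, as a character vector of lines.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Assign 1 to x: \\Sexpr{x <- 1}",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
)

code <- tangle_with_inline(rnw)
print(code)  # [1] "x <- 1" "x + 1"

# Unlike the Stangle() output, this script runs in a clean environment.
env <- new.env(parent = baseenv())
vals <- lapply(parse(text = code), eval, envir = env)
print(vals[[2]])  # [1] 2
```

Even with such a tangler, R CMD check would still evaluate every expression a second time after weaving, which is exactly the redundancy questioned here.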

To push this a little further, personally I do not quite appreciate
treating literate programming in R as two separate steps, namely weave
and tangle. In particular, I do not see the value of tangle when Sweave()
(or knitr::knit()) can be regarded as the new "source()". Therefore I
would eventually like to drop tangle altogether, but perhaps I have
missed something here, and I'd like to hear what other people think
about it.

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
