Sweave driver extension

Mon, Jan 30, 2012 6:41 PM

OK, I did not realize the overhead problem was so serious in your
situation, so I re-implemented chunk references in the knitr
package in a different way. In Sweave we use

<<a>>=
# code in chunk a
@

<<b>>=
# use code in a
<<a>>
@

And in knitr, we can use real R code:

<<a>>=
# code in chunk a
@

<<b>>=
# use code in a
run_chunk('a')
@

This also allows arbitrary levels of recursion. For example, I can add
another chunk called 'c':

<<c>>=
run_chunk('b')
@

Because b uses a, when c calls b, a is run as well.

The function run_chunk() does not bring back the overhead problem,
because it simply extracts the code from the other chunk and evaluates
it in place; the chunk's code is not wrapped in a function call. This
feature is still in the development version (well, I did it this
afternoon):
https://github.com/yihui/knitr.
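For readers who want to see the mechanism, here is a rough sketch in
plain R of what such a function might do. It is not knitr's code:
`chunks` and `run_chunk_sketch()` are hypothetical names, and the sketch
assumes the chunk sources have already been parsed into a named list of
strings. The looked-up code is evaluated in the caller's environment, so
objects such as `x` are created there rather than passed in as function
arguments, and a chunk can call `run_chunk_sketch()` on another chunk,
which gives the recursion described above.

```r
# Hypothetical stand-in for the chunk table built after parsing;
# an illustration only, not knitr's actual implementation.
chunks <- list(
  a = "x <- 1",
  b = "run_chunk_sketch('a')\ny <- x + 1",
  c = "run_chunk_sketch('b')\nz <- y * 10"
)

# Look up a chunk by label and evaluate its code in the calling environment.
# The chunk body is not wrapped in a function, so the objects it creates or
# uses live in `envir` directly instead of being passed as arguments.
run_chunk_sketch <- function(label, envir = parent.frame()) {
  code <- chunks[[label]]
  if (is.null(code)) stop("no chunk labelled '", label, "'")
  eval(parse(text = code), envir = envir)
  invisible(NULL)
}

e <- new.env()
run_chunk_sketch("c", envir = e)  # runs c, which runs b, which runs a
c(e$x, e$y, e$z)                  # 1 2 20
```

Note that `run_chunk_sketch()` is itself a small function call; what it
avoids is wrapping the chunk body in a function whose large inputs would
have to be passed as arguments.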

--------------

As for Knuth's original idea, I do not know as much about it as you
do, but under knitr's design you can arrange code freely, since the
code is stored in a named list after the input document is parsed. You
can define code before using it, or use it before defining it later;
it is indexed by the chunk label. Top-down or bottom-up, in whatever
order you want. And you are right that this requires a major rewrite;
that is exactly what I tried to do. I appreciate your feedback because
I know you have extensive experience in reproducible research.
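A minimal sketch of "parse first, then look up by label", assuming
Sweave-style `<<label,options>>=` headers and `@` terminators.
`parse_chunks()` is a hypothetical helper, not knitr's parser, and it
assumes every chunk has a non-empty label. Because lookup happens only
after the whole input is parsed, chunk `b` can appear in the file before
the chunk `a` it depends on.

```r
# Hypothetical sketch (not knitr's parser): collect chunk bodies into a
# named list keyed by label. Assumes every chunk has a non-empty label.
parse_chunks <- function(lines) {
  chunks <- list()
  label <- NULL
  for (line in lines) {
    header <- regmatches(line, regexec("^<<([^,>]+)[^>]*>>=", line))[[1]]
    if (length(header) > 0) {          # "<<label,options>>=" opens a chunk
      label <- header[2]
      chunks[[label]] <- character(0)
    } else if (grepl("^@", line)) {    # "@" closes the current chunk
      label <- NULL
    } else if (!is.null(label)) {      # body line of the open chunk
      chunks[[label]] <- c(chunks[[label]], line)
    }
  }
  chunks
}

# Chunk b uses x before chunk a (which defines x) appears in the file.
src <- c(
  "<<b>>=",
  "y <- x + 1",
  "@",
  "<<a>>=",
  "x <- 1",
  "@"
)
chunks <- parse_chunks(src)
names(chunks)    # "b" "a"

# Lookup is by label, so we can run a and then b, whatever the file order.
env <- new.env()
for (lab in c("a", "b")) eval(parse(text = chunks[[lab]]), envir = env)
env$y            # 2
```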

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Mon, Jan 30, 2012 at 12:07 PM, Kevin R. Coombes
<kevin.r.coombes at gmail.com> wrote:

I prefer the code chunks myself.

Function calls have overhead. In a bioinformatics world with large datasets
and an R default that uses call-by-value rather than call-by-reference, the
function calls may have a _lot_ of overhead. Writing the functions to make
sure they use call-by-reference for the large objects instead has a
different kind of overhead in the stress it puts on the writers and
maintainers of code.
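As a hedged illustration of the call-by-reference style mentioned here:
in R the usual way to get reference semantics is to keep a large object
inside an environment, since environments are not copied when passed to
a function. The names `store` and `mark_first()` are made up for this
example.

```r
# Illustration only: environments are mutable and are not copied when
# passed to a function, so they give reference semantics in R.
store <- new.env()
store$big <- numeric(1e6)   # stand-in for a large dataset

# Hypothetical helper: modifies the object held in `env` instead of
# returning a modified copy to the caller.
mark_first <- function(env, n) {
  env$big[seq_len(n)] <- 1
  invisible(env)
}

mark_first(store, 10)
sum(store$big)   # 10
```

The maintenance cost shows up at the call site: nothing in
`mark_first(store, 10)` tells the reader that `store` is modified.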

But then, I'm old enough to have looked at some of Knuth's source code for
TeX and read his book on Literate Programming, where the ideas of "weave"
and "tangle" were created for exactly the kind of application that Terry
asked about. Knuth's fundamental idea here is that the documentation
(mainly the stuff processed through "weave") is created for humans, while
the executable code (in Knuth's view, the stuff created by "tangle") is
intended for computers. If you want people to understand the code, then you
often want to use a top-down approach that outlines the structure -- code
chunks with forward references work perfectly for this purpose.

One of the difficulties in mapping Knuth's idea over to R and Sweave is that
the operations of weave and tangle have gotten, well, tangled. Sweave does
not just prepare the documentation; it also executes the code in order to
put the results of the computation into the documentation. In order to get
the forward references to work with Sweave, you would have to make two
passes through the file: one to make sure you know where each named chunk is
and build a cross-reference table, and one to actually execute the code in
the correct order. That would presumably also require a major rewrite of
Sweave.
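The two-pass scheme can be sketched in plain R as a tiny noweb-style
tangler. Everything here is hypothetical (the function `tangle()` and its
regular expressions are not part of Sweave or knitr): pass 1 records each
labelled chunk in a cross-reference table, and pass 2 expands `<<name>>`
reference lines recursively, so `main` can refer to chunks that are
defined further down in the file.

```r
# Hypothetical two-pass tangler, a sketch of the idea only.
tangle <- function(lines, root) {
  # Pass 1: build the cross-reference table (label -> body lines).
  xref <- list()
  label <- NULL
  for (line in lines) {
    hdr <- regmatches(line, regexec("^<<([^,>]+)[^>]*>>=", line))[[1]]
    if (length(hdr) > 0) {
      label <- hdr[2]
      xref[[label]] <- character(0)
    } else if (grepl("^@", line)) {
      label <- NULL
    } else if (!is.null(label)) {
      xref[[label]] <- c(xref[[label]], line)
    }
  }
  # Pass 2: replace each "<<name>>" line with that chunk's expanded body,
  # keeping the reference's indentation; `seen` guards against cycles.
  expand <- function(name, seen) {
    if (name %in% seen) stop("circular reference to chunk '", name, "'")
    if (is.null(xref[[name]])) stop("undefined chunk '", name, "'")
    out <- character(0)
    for (line in xref[[name]]) {
      ref <- regmatches(line,
        regexec("^([[:space:]]*)<<([^>]+)>>[[:space:]]*$", line))[[1]]
      if (length(ref) > 0) {
        out <- c(out, paste0(ref[2], expand(ref[3], c(seen, name))))
      } else {
        out <- c(out, line)
      }
    }
    out
  }
  expand(root, character(0))
}

src <- c(
  "<<main>>=",
  "f <- function(v) {",
  "  <<check>>",
  "  <<compute>>",
  "}",
  "@",
  "<<check>>=",
  "stopifnot(is.numeric(v))",
  "@",
  "<<compute>>=",
  "sum(v^2)",
  "@"
)
code <- tangle(src, "main")
cat(code, sep = "\n")
e <- new.env()
eval(parse(text = code), envir = e)
e$f(1:3)   # 14
```

Because every label is known before expansion starts, the forward
references to `check` and `compute` resolve; only after tangling is the
code evaluated.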

The solution I use is to cheat: hide the chunks initially and reveal them
later to get the output that I want. This comes down to combining eval, echo,
keep.source, and expand in the right combinations. Something like:

%%%%%%%%
% set up a prologue that contains the code chunks. Do not evaluate or
% display them.
<<coxme-check-arguments,echo=FALSE,eval=FALSE>>=
# do something sensible. If multiple steps, define them above here
# using the same idea.
@
% also define the other code chunks here

\section{Start the First Section}

The \texttt{coxme} function is defined as follows:
<<coxme,keep.source=TRUE,expand=FALSE>>=

coxme <- function(formula, data, subset, blah blah){
<<coxme-check-arguments>>
<<coxme-build>>
<<coxme-compute>>
<<coxme-finish>>
}
@

Argument checking is important:
<<name-does-not-matter-since-not-reused,eval=FALSE,expand=TRUE>>=
<<coxme-check-arguments>>
@
% Describe the other chunks here

%%%%%%%%


Kevin
