Almost all of the coxme package and an increasing amount of the survival
package are now written in noweb, i.e., .Rnw files. It would be nice to
process these using the Sweave function + a special driver, which I can
do using a modified version of Sweave. The primary change is to allow
the following type of construction
<<coxme>>=
coxme <- function(formula, data, subset, blah blah ){
  <<coxme-check-arguments>>
  <<coxme-build>>
  <<coxme-compute>>
  <<coxme-finish>>
}
@
where the parts referred to come later, and will themselves be made up
of other parts. Since the point of this file is to document source
code, the order in which chunks are defined is driven by "create a
textbook" thoughts and won't match the final code order for R.
The standard noweb driver only allows one level of recursion, and no
references to things defined further down in the file.
The primary change to the function simply breaks the main loop into
two parts: first read through all the lines and create a list of
code chunks (some with names), then go through the list of chunks and
call driver routines. There are a couple of other minor details, e.g. a
precheck for infinite recursions, but no change to what is passed to the
driver routines, nor to anything but the Sweave function itself.
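A minimal sketch of the two-pass idea described above, written in plain R. This is not the modified Sweave itself: the function names `collect_chunks` and `expand_chunk` are hypothetical, chunk headers are assumed to carry only a label (no options), and every chunk is assumed to be named.

```r
# Pass 1: read all lines and collect code chunks into a named list.
# A chunk starts at a line of the form <<name>>= and ends at a line
# starting with @. Everything outside a chunk is documentation text.
collect_chunks <- function(lines) {
  start <- "^<<(.+)>>=[[:space:]]*$"
  chunks <- list()
  current <- NULL                      # label of the chunk being read
  for (line in lines) {
    if (grepl(start, line)) {
      current <- sub(start, "\\1", line)
      chunks[[current]] <- character(0)
    } else if (!is.null(current) && grepl("^@", line)) {
      current <- NULL
    } else if (!is.null(current)) {
      chunks[[current]] <- c(chunks[[current]], line)
    }
  }
  chunks
}

# Pass 2: expand a chunk, replacing each <<name>> reference by the
# expanded code of that chunk. `seen` tracks the chain of chunks being
# expanded, so a circular reference stops with an error instead of
# recursing forever.
expand_chunk <- function(name, chunks, seen = character(0)) {
  ref <- "^[[:space:]]*<<(.+)>>[[:space:]]*$"
  if (name %in% seen)
    stop("circular chunk reference: ",
         paste(c(seen, name), collapse = " -> "))
  if (!(name %in% names(chunks)))
    stop("undefined chunk: ", name)
  out <- character(0)
  for (line in chunks[[name]]) {
    if (grepl(ref, line)) {
      out <- c(out, expand_chunk(sub(ref, "\\1", line), chunks,
                                 c(seen, name)))
    } else {
      out <- c(out, line)
    }
  }
  out
}

# Chunks are referenced before they are defined, and out of code order.
src <- c(
  "<<coxme>>=",
  "coxme <- function(x) {",
  "  <<coxme-check-arguments>>",
  "  <<coxme-compute>>",
  "}",
  "@",
  "Some documentation text between chunks.",
  "<<coxme-compute>>=",
  "x * 2",
  "@",
  "<<coxme-check-arguments>>=",
  "stopifnot(is.numeric(x))",
  "@"
)
chunks <- collect_chunks(src)
code <- expand_chunk("coxme", chunks)
eval(parse(text = code))   # defines coxme() from the expanded text
coxme(21)
```

Because the whole file is read before any chunk is expanded, the order of definition no longer matters, and nesting depth is limited only by the recursion check.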
Primary question: who on the core team should I be holding this
conversation with?
Secondary: Testing level? I have a few vignettes but not many.
I'll need a "noweb" package anyway to contain the drivers -- should
we just duplicate the modified Sweave under another name?
Call the package "noweb", "Rnoweb", ...?
And before someone asks: Roxygen is a completely different animal and
doesn't address what I need. I have LaTeX equations just above the code
that implements them, an annotated graph of the call tree next to the
section parsing a formula, etc. This is stuff that doesn't fit in
comment lines. The text/code ratio is >1. On the other hand I've
thought very little about integration of manual pages and description
files with the code, issues which Roxygen addresses.
Terry Therneau
Sweave driver extension
6 messages · Yihui Xie, Kevin Coombes, Terry Therneau
Maybe this is just my personal taste: I do not like pseudo R code in the
form <<coxme-build>> inside a chunk, and I'm curious about why you do
not use real R functions to do the job.
coxme <- function(formula, data, subset, blah blah ){
  coxme_check_arguments(...)
  coxme_build(...)
  coxme_compute(...)
  coxme_finish(...)
}
You can define these coxme_xxx functions later in the parent
environment. It is also easy for one function to call another, so the
recursion is natural. Compared to text-processing tricks, I prefer
well-defined functions.
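A small illustration of the point about defining helpers later, as a sketch only. The names `coxme_sketch`, `check_arguments` and `compute` are hypothetical and are not taken from the coxme package. R looks up a function name when the call is evaluated, not when the enclosing function is defined, so the helpers can appear further down the file.

```r
# The top-level function is written first, in "textbook" order.
coxme_sketch <- function(x) {
  x <- check_arguments(x)   # not yet defined at this point in the file
  compute(x)
}

# Helpers defined afterwards in the same (parent) environment.
check_arguments <- function(x) {
  stopifnot(is.numeric(x))
  x
}
compute <- function(x) sum(x^2)

coxme_sketch(c(1, 2, 3))
```

One difference from chunk insertion: each helper runs in its own frame, so intermediate results have to be passed in and returned explicitly, whereas inserted chunk text shares the variables of the function it is pasted into.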
Your idea of using a named list to store R code is what I used in the
knitr package (http://yihui.github.com/knitr/demo/reference/), e.g.
% empty here
<<chunk1, echo=TRUE>>=
@
% real code is defined here
<<chunk1, echo=FALSE>>=
rnorm(10)
@
The second chunk appears later, but when you weave the document, the
code rnorm(10) will also go to the first chunk since the label
'chunk1' will index the code from the second chunk.
Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA
On Tue, Jan 24, 2012 at 1:50 PM, Terry Therneau <therneau at mayo.edu> wrote:
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
5 days later
I prefer the code chunks myself.
Function calls have overhead. In a bioinformatics world with large
datasets and an R default that uses call-by-value rather than
call-by-reference, the function calls may have a _lot_ of overhead.
Writing the functions to make sure they use call-by-reference for the
large objects instead has a different kind of overhead in the stress it
puts on the writers and maintainers of code.
But then, I'm old enough to have looked at some of Knuth's source code
for TeX and read his book on Literate Programming, where the ideas of
"weave" and "tangle" were created for exactly the kind of application
that Terry asked about. Knuth's fundamental idea here is that the
documentation (mainly the stuff processed through "weave") is created
for humans, while the executable code (in Knuth's view, the stuff
created by "tangle") is intended for computers. If you want people to
understand the code, then you often want to use a top-down approach that
outlines the structure -- code chunks with forward references work
perfectly for this purpose.
One of the difficulties in mapping Knuth's idea over to R and Sweave is
that the operations of weave and tangle have gotten, well, tangled.
Sweave does not just prepare the documentation; it also executes the
code in order to put the results of the computation into the
documentation. In order to get the forward references to work with
Sweave, you would have to make two passes through the file: one to make
sure you know where each named chunk is and build a cross-reference
table, and one to actually execute the code in the correct order. That
would presumably also require a major rewrite of Sweave.
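As a rough sketch of what the first of those two passes would produce, the following builds a cross-reference table from chunks that have already been read into a named list, and marks which references point forward in the file. The chunk names and contents are made up for illustration, and this is not how Sweave stores chunks internally.

```r
# Hypothetical chunks, in the order they appear in the file.
chunks <- list(
  "coxme-scale"   = "s <- 2",
  "coxme"         = c("coxme <- function(x) {",
                      "  <<coxme-check>>",
                      "  <<coxme-compute>>",
                      "}"),
  "coxme-check"   = "stopifnot(is.numeric(x))",
  "coxme-compute" = c("<<coxme-scale>>", "x * s")
)

ref_pattern <- "^[[:space:]]*<<(.+)>>[[:space:]]*$"

# Cross-reference table: for each chunk, the chunks it refers to directly.
xref <- lapply(chunks, function(lines) {
  hits <- grepl(ref_pattern, lines)
  sub(ref_pattern, "\\1", lines[hits])
})

# A reference is "forward" when its target is defined later in the file
# than the chunk that uses it.
position <- seq_along(chunks)
names(position) <- names(chunks)
forward <- lapply(names(xref), function(nm) {
  refs <- xref[[nm]]
  refs[position[refs] > position[nm]]
})
names(forward) <- names(xref)

xref[["coxme"]]            # chunks used by the top-level chunk
forward[["coxme"]]         # both of them are defined further down
forward[["coxme-compute"]] # empty: coxme-scale was defined earlier
```

With such a table in hand, the second pass can expand and execute chunks in whatever order the references require.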
The solution I use is to cheat and hide the chunks initially and reveal
them later to get the output that you want. This comes down to combining
eval, echo, keep.source, and expand in the right combinations. Something
like:
%%%%%%%%
% set up a prologue that contains the code chunks. Do not evaluate or
% display them.
<<coxme-check-arguments,echo=FALSE,eval=FALSE>>=
# do something sensible. If multiple steps, define them above here
# using the same idea.
@
% also define the other code chunks here
\section{Start the First Section}
The \texttt{coxme} function is defined as follows:
<<coxme,keep.source=TRUE,expand=FALSE>>=
coxme <- function(formula, data, subset, blah blah ){
  <<coxme-check-arguments>>
  <<coxme-build>>
  <<coxme-compute>>
  <<coxme-finish>>
}
@
Argument checking is important:
<<name-does-not-matter-since-not-reused,eval=FALSE,expand=TRUE>>=
<<coxme-check-arguments>>
@
% Describe the other chunks here
%%%%%%%%
Kevin
On 1/24/2012 10:24 PM, Yihui Xie wrote:
OK, I did not realize the overhead problem is so overwhelming in your
situation. Therefore I re-implemented the chunk reference in the knitr
package in another way. In Sweave we use
<<a>>=
# code in chunk a
@
<<b>>=
# use code in a
<<a>>
@
And in knitr, we can use real R code:
<<a>>=
# code in chunk a
@
<<b>>=
# use code in a
run_chunk('a')
@
This also allows arbitrary levels of recursion, e.g. I add another
chunk called 'c':
<<c>>=
run_chunk('b')
@
Because b uses a, when c calls b it will consequently call a as well.
The function run_chunk() will not bring overhead problems, because it
simply extracts the code from other chunks and evaluates it here. It
is not a function call. This feature is still in the development
version (well, I did it this afternoon):
https://github.com/yihui/knitr.
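To make the "extract and evaluate here" idea concrete, here is a minimal sketch in base R. It is not knitr's implementation: `run_chunk_sketch` and the `chunk_code` list are hypothetical stand-ins for run_chunk() and for the named list that holds the parsed chunks.

```r
# Hypothetical chunk store: chunk label -> lines of code.
chunk_code <- list(
  a = "x <- 10",
  b = c("run_chunk_sketch('a')", "y <- x + 1"),
  c = c("run_chunk_sketch('b')", "z <- y * 2")
)

# Look up the code for `label` and evaluate it in the caller's
# environment, so the chunk shares the caller's variables instead of
# receiving them as arguments.
run_chunk_sketch <- function(label, envir = parent.frame()) {
  eval(parse(text = chunk_code[[label]]), envir = envir)
}

run_chunk_sketch("c")   # c runs b, which runs a
z
```

Chunk b can use `x` and chunk c can use `y` without anything being passed around, because all three pieces of code are evaluated in the same environment. Recursion comes for free: evaluating the text of one chunk simply hits another call to the same helper.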
--------------
Talking about Knuth's original idea, I do not know as much as you, but
under knitr's design, you can arrange code freely, since the code is
stored in a named list after the input document is parsed. You can
define code before using it, or use it before defining it (later); it
is indexed by the chunk label. Top-down or bottom-up, in whatever
order you want. And you are right; it requires a major rewrite, and
that is exactly what I tried to do. I appreciate your feedback because
I know you have very rich experience in reproducible research.
Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA
On Mon, Jan 30, 2012 at 12:07 PM, Kevin R. Coombes
<kevin.r.coombes at gmail.com> wrote:
Three things:

My original question to R-help was "who do I talk to". That was
answered by Brian R, and the discussion of how to change Sweave moved
offline. FYI, I have a recode in hand that allows arbitrary reordering
of chunks, but changes to code used by hundreds need to be approached
cautiously. Like the witch says in Wizard of Oz: "... But that's not
what's worrying me, it's how to do it. These things must be done
delicately, or you hurt the spell."

A few emails have made me aware of others who use noweb. Most of them,
as I have, use the original Unix utility. But since survival is so
interwoven with R I am trying to implement that functionality entirely
in R to make the code self-contained. Just working out how best to do
so.

Yihui: with respect to the note below, I don't see why you want to add
new syntax. Why add "run_chunk(a)" when it is a synonym for <<a>>?

Terry T.
On Mon, 2012-01-30 at 20:41 -0600, Yihui Xie wrote:
On Tue, Jan 31, 2012 at 7:18 AM, Terry Therneau <therneau at mayo.edu> wrote:
Yihui: with respect to the note below, I don't see why you want to add
new syntax. Why add "run_chunk(a)" when it is a synonym for <<a>>?
The short answer is that it is easy to implement, because run_chunk()
uses eval(), which naturally supports recursion (you can
eval(parse(text = "eval(parse(text = ...))"))). I think I will add <<>>
in the next few days too.

There have been quite a few features like this one that I did not plan
to do because I do not use them at all, but I added them to knitr one by
one anyway when I saw convincing reasons (function overhead problems in
this case). So I really appreciate these discussions. I feel r-devel is
not a good place for me to chime in, so I will take the parts of the
discussion that are not relevant to r-devel offline later.
Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA