Suggestion: Custom filename patterns for non-Sweave vignettes

Hi Duncan,

thank you for your prompt reply.


On Fri, Feb 15, 2013 at 1:15 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
So, isn't that already somewhat taken care of by the 'VignetteBuilder'
field in DESCRIPTION?  It specifies builders in addition to the
default/built-in Sweave builder.  Conflicts would only happen if a
package developer (e.g. PkgA) either (A) uses a builder that overrides
the built-in "[.][RrSs](nw|tex)$" / "[.]Rmd$" patterns, or (B)
specifies two builders with the same patterns.  First of all, there
are not that many builder packages, so this is something that could be
negotiated among them to minimize conflicts.  Second, case (A) can be
protected against by not allowing builder packages (e.g. knitr, rsp,
...) to add/register those patterns (tricky but possible to test for)
(but only default to them if that is what they wish to use).  For case
(B), the developer of package PkgA has the power to avoid conflicts.
One could also imagine that the ordering of packages listed in
'VignetteBuilder' would provide a prioritization.

BTW, case (A) is basically what the new design is already providing;
all builder packages use the same patterns.
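To make the above a bit more concrete, here is a rough sketch of the
dispatch I have in mind.  It is written in Python purely for
illustration and is not R's actual 'tools' code.  The registry mapping,
the function names validate() and resolve_builder(), the package name
"PkgX" and the patterns assigned to R.rsp are all hypothetical.  Only
the two built-in regexps are taken from the discussion above.  Builders
are tried in 'VignetteBuilder' order (case B), registrations that claim
a built-in pattern are rejected (case A), and the built-in patterns act
as the fallback.

```python
import re

# Built-in patterns as quoted above (regex syntax as written there).
BUILTIN_PATTERNS = [r"[.][RrSs](nw|tex)$", r"[.]Rmd$"]


def validate(registry):
    """Case (A): reject add-on builders that try to claim a built-in pattern.

    Comparing pattern strings is a simplification; detecting regexps that
    merely overlap the built-ins is the "tricky" part.
    """
    for pkg, patterns in registry.items():
        for pattern in patterns:
            if pattern in BUILTIN_PATTERNS:
                raise ValueError(f"{pkg} may not register built-in pattern {pattern!r}")


def resolve_builder(filename, vignette_builders, registry):
    """Return the builder for `filename`, or None if nothing matches.

    vignette_builders: packages in the order listed in 'VignetteBuilder';
    earlier entries win when two builders match the same file (case B).
    registry: hypothetical mapping of package name -> list of regexps.
    """
    validate(registry)
    for pkg in vignette_builders:
        for pattern in registry.get(pkg, []):
            if re.search(pattern, filename):
                return pkg
    # Fall back to the built-in patterns.
    if any(re.search(p, filename) for p in BUILTIN_PATTERNS):
        return "builtin"
    return None


# Hypothetical registrations: "PkgX" claims a broader pattern than R.rsp.
registry = {"R.rsp": [r"[.]html[.]rsp$"], "PkgX": [r"[.]rsp$"]}
print(resolve_builder("intro.html.rsp", ["R.rsp", "PkgX"], registry))  # R.rsp
print(resolve_builder("intro.html.rsp", ["PkgX", "R.rsp"], registry))  # PkgX
print(resolve_builder("old.Rnw", ["R.rsp", "PkgX"], registry))          # builtin
```

Swapping the order in 'VignetteBuilder' is all PkgA's developer needs
to do to resolve an overlap between two builders.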

So, from a package building point of view, I don't see how this would
make it messier.  I can see that when checking a package it is harder
to validate matches between input and output formats (is that done?).
Let me know if I am simplifying things too much; then I'll read up on
the 'R CMD *' source code.

I see your concern, but is there really a significant risk of this?
And if it did occur, (i) it would be contained to PkgA, (ii) the
developer of package PkgA would quickly detect it, and (iii) the
"badly behaving" builder package would rather soon be flagged as doing
something bad (and its developer would be informed, and so on).

Yes, supporting PDF output makes sense.  One may also consider
generating plain *.txt files (think README.txt and similar).  As I
see it, the restrictions on supported *output* formats are given by
what the R help system wishes to support (which is basically *.pdf and
*.html documents).  It's clear that the decision on what to support is
up to the maintainers of the R system (i.e. R core).

When it comes to the input/source files for generating those output
files, it's harder to argue for restrictions.  As I understand it, the
new support for non-Sweave vignettes is moving away from such
restrictions, which is great.  Despite the restrictions on file
extensions, it is possible to "hijack" (my words) any of the supported
extensions for whatever reason you want, as long as you produce a *.pdf
or *.html document in the end.  More below...

I still find it unfortunate that the R system opens up for processing
any type of input file but requires those files to have certain
filename extensions.

As a real example, today Sweave and knitr both use *.Rnw.  This means
that if I send someone a standalone *.Rnw file, they will not be able
to tell how to compile it without further instructions from me, by
inspecting its content, or by trial and error.  I believe that makes
reproducible research a bit more tedious.  With unique filename
extensions, life is easier.  It's easy to imagine that if other
builder packages (e.g. R.rsp, brew, ...) also start using *.Rnw,
things are not going to get better.  The current "rules" are pushing
things in that direction.  To take an extreme stand, it's a little bit
like using *.txt for all your C, C++, Erlang, Fortran, Simula, ...
code, because at the end of the day they all compile to binaries
anyway.

One may argue that the Rnw/Rtex/Rmd extensions only apply to R
package vignettes and that you can still use other extensions when you
work with standalone vignette source files.  That's of course also
unfortunate, because it adds further confusion, e.g. "You can find the
vignette in my package, but by the way you should really rename it
because ...".  The exact same source file will have different
extensions depending on context.  (In my own case, I find *.tex.rsp,
*.html.rsp, *.md.rsp, *.Rnw.rsp, ... to be much less ambiguous, and I
prefer not to introduce ambiguity by mapping those to
*.Rnw/*.Rtex/*.Rmd.)
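For what it's worth, such compound extensions can be read
mechanically.  Below is a toy sketch in Python; the helper
parse_compound_extension() is hypothetical and not something R or
R.rsp provides.  It assumes the convention that the last extension
names the processor and the one before it names the format that
processor emits.

```python
def parse_compound_extension(filename):
    """Split 'stem.<format>.<processor>' into (stem, format, processor).

    Assumed convention: the last extension names the processor (builder),
    the one before it the format that processor emits.
    """
    parts = filename.split(".")
    if len(parts) < 3 or not parts[-2] or not parts[-1]:
        raise ValueError(f"no compound extension in {filename!r}")
    return ".".join(parts[:-2]), parts[-2], parts[-1]


for name in ["intro.tex.rsp", "intro.html.rsp", "intro.Rnw.rsp"]:
    # Each name yields the same processor ('rsp') but a different target format.
    print(name, "->", parse_compound_extension(name))
```

With a plain *.Rnw name, no such lookup can tell Sweave input from
knitr input; the information simply isn't in the filename.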

Finally, the supported extensions are basically *.Rnw, *.Rtex and
*.Rmd.  To break those down, "*nw" originates from 'Noweb'
[http://wikipedia.org/wiki/Noweb], "*tex" from TeX
[http://wikipedia.org/wiki/latex] and "*md" from Markdown
[wikipedia.org/wiki/Markdown].  The "R*" part indicates that there is
some additional markup on top of those file formats.  But at the end
of the day, they indicate that the source files should be
markup-embedded files containing some flavor of Noweb, TeX or
Markdown.  I find it odd to use those also for, say, formats such as
HTML, reStructuredText, AsciiDoc, MediaWiki, Org-Mode, etc.


To summarize, I really appreciate the move to built-in support for
non-Sweave vignettes (without using custom Makefiles), but I find that
the supported filename extensions have not been brought along in this
move.


Thanks again,

Henrik