[Bioc-devel] Is it OK for Rmd package vignettes to be rendered as PDF?

On Thu, Aug 18, 2016 at 4:45 PM, Wolfgang Huber <whuber at embl.de> wrote:
Just a quick background: I was the one adjusting a large chunk of the
vignette code for R 3.0.0 (Feb-March 2013) in order to add support for
generic vignette engines after Yihui Xie and Duncan Murdoch had laid
the groundwork adding support for knitr.  When doing this I did think
about supporting multiple weave output formats (and also keeping
intermediate TeX / Markdown / ... files). What I can recall from this
is that it shouldn't be too hard to do this.  The reason why it wasn't
done was mostly due to the fact that it would require an agreement by
others / R core and time was very short (just before the R 3.0.0
release) so updates were kept at a minimum.

Wolfgang, to answer your question: In my previous reply, I was focusing
on the R CMD build process itself because that's where most of the
action is happening when it comes to building vignettes, but there're
other parts that need to be updated as well.  But the core of the
issue here is that R, or more precisely the tools package, assumes
there should be exactly one output product per vignette.  For
instance, in tools:::find_vignette_product()
[https://github.com/wch/r-source/blob/trunk/src/library/tools/R/Vignettes.R#L80-L84]
we have checks like:

if (length(output) > 2L || (final && length(output) > 1L))
    stop(gettextf("Located more than one %s output file (by engine %s) for vignette with name %s: %s",
                  sQuote(by),
                  sQuote(sprintf("%s::%s", engine$package, engine$name)),
                  sQuote(name), paste(sQuote(output), collapse=", ")),
         domain = NA)

where 'output' holds any matching *.pdf and *.html files (and final =
TRUE).  (In my previous comment I said duplicated outputs would be
ignored, but it seems that there'll be an error instead.)
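
To make the effect of that condition concrete, here is a minimal sketch that applies the same test to a plain character vector of file names. The helper check_vignette_outputs() is hypothetical and is not part of the tools package; it only mimics the quoted condition with a simplified error message.

```r
# Hypothetical helper (NOT part of the tools package): mimics the condition
# quoted above on a plain character vector of candidate output files.
check_vignette_outputs <- function(output, final = TRUE) {
    if (length(output) > 2L || (final && length(output) > 1L))
        stop(sprintf("Located more than one output file: %s",
                     paste(sQuote(output), collapse = ", ")))
    invisible(output)
}

# A single product passes the check.
single <- check_vignette_outputs("intro.html")

# Both a PDF and an HTML product trigger the error when final = TRUE;
# tryCatch() captures the error message instead of aborting.
res <- tryCatch(check_vignette_outputs(c("intro.pdf", "intro.html")),
                error = function(e) conditionMessage(e))

# With final = FALSE, two candidate files are still tolerated.
two <- check_vignette_outputs(c("intro.pdf", "intro.html"), final = FALSE)
```

So with final = TRUE, a vignette that produces both intro.pdf and intro.html would stop the build rather than have one of the files silently dropped.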

There is also an internal vignette "meta data" data frame holding the
vignette name, title, weave and tangle output files (or something like
that).  The weave output field is a character vector of one element
per vignette.  This data frame is used in several places.  This has to
be updated such that it can hold more than one weave output file per
vignette, i.e. something like meta$weave[[idx]] should be able to hold
one or more strings.  Then functions / mechanisms that make use of
this meta data need to be adjusted, e.g. vignettes(), vignette(),
functions to build the vignette index HTML page etc.  There probably
need to be new features added, e.g. what format should be opened by
default when calling vignette()?
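
As a rough illustration of a weave field that can hold one or more strings per vignette, a list column in a data frame would do. The column names and the default_output() helper below are assumptions made up for this sketch; they are not the names used internally by the tools package, and the "prefer HTML, then PDF" rule is just one possible policy.

```r
# Hypothetical vignette meta data; the column names are illustrative only.
meta <- data.frame(
    name  = c("intro", "advanced"),
    title = c("Introduction", "Advanced usage"),
    stringsAsFactors = FALSE
)

# A list column lets each vignette carry one or more weave output files.
meta$weave <- list(c("intro.html", "intro.pdf"), "advanced.pdf")

# Hypothetical policy for which file to open by default: pick the first
# format listed in 'prefer' that is available for the vignette.
default_output <- function(outputs, prefer = c("html", "pdf")) {
    ext <- tolower(tools::file_ext(outputs))
    hit <- match(prefer, ext)
    outputs[hit[!is.na(hit)][1L]]
}

idx <- 1L
all_outputs <- meta$weave[[idx]]          # every output for vignette 'idx'
to_open <- default_output(all_outputs)    # the one opened by default
```

Any code that currently treats the weave field as a single string would need to index it with [[ ]] like this and then decide which element to use.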

So, again, I think this is fairly straightforward to implement, but
the first step is to convince R core that this should be done.  I
think one strong argument is that PDF alone is a rather bad format for
screen readers while HTML is much better in this sense.  One could
also imagine vignette engines that are designed to provide highly
screen-reader friendly output files / formats in addition to the
standard HTML / PDF formats.  This raises the question whether R
should do this or if that's better left to other software that converts
the HTML file.  On the other hand, maybe the HTML file doesn't
contain all necessary information and it's better to work off an
intermediate file format.

As Martin points out, preferably vignette engines that output to
multiple formats should be smart enough not to rerun everything from
scratch, but instead generate the PDF and HTML files based on some
intermediate static format (e.g. Markdown).
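
The "weave once, convert many times" idea can be sketched as follows. Both weave_to_markdown() and convert() are hypothetical stand-ins (the first for the expensive step that runs the vignette's R code, the second for a cheap format converter); neither is a real knitr or tools function, and here they only manipulate file names.

```r
# Hypothetical stand-ins, NOT real knitr/tools functions.
n_weaves <- 0L
weave_to_markdown <- function(source) {
    n_weaves <<- n_weaves + 1L        # count how often the code is rerun
    sub("\\.Rmd$", ".md", source)     # pretend the intermediate .md was written
}
convert <- function(md, format) {
    sub("\\.md$", paste0(".", format), md)
}

# Weave to the intermediate Markdown file once, then derive every
# requested output format from that static file.
weave_multi <- function(source, formats = c("html", "pdf")) {
    md <- weave_to_markdown(source)
    vapply(formats, function(fmt) convert(md, fmt), character(1),
           USE.NAMES = FALSE)
}

outputs <- weave_multi("intro.Rmd")
```

The point of the structure is that n_weaves stays at 1 no matter how many formats are requested, so the vignette's code chunks are not evaluated once per output format.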

/Henrik