[Bioc-devel] Is it OK for Rmd package vignettes to be rendered as PDF?
On Thu, Aug 18, 2016 at 4:45 PM, Wolfgang Huber <whuber at embl.de> wrote:
On 17 Aug 2016, at 13:02, Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote: R CMD build, which is what triggers vignette building, only supports one output file (HTML or PDF) per vignette. It will basically ignore duplicate output formats. This is by design / legacy reasons. Technically it wouldn't be hard to add support for multiple output formats, but that would require changes to R itself - I think it could be a useful feature.
Henrik, I'm sure you have more experience and insight with this than I, but I wonder when (at what stage) and what for R needs to be changed? It seems there are several issues:

(a) having both the PDF and HTML be built by the build system and be shipped with the package
(b) making them discoverable on the Bioc package landing page, and on the index page of the R help system
(c) making (a) and (b) easy and standardized for package authors

Re (a), on first sight, it seems that simply adding the YAML lines Ramon mentioned to the vignette will NOT achieve this (it looks like only the first output format stated is produced), but according to
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Writing-package-vignettes
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Non_002dSweave-vignettes
I expect that with sufficient cleverness with (i) a Makefile and/or (ii) registering your own VignetteBuilder (some wrapper around rmarkdown::render that makes sure both outputs are built, with only one run of the R code) it should be possible to achieve (a).

For something almost as good as (b) [or better?], you could have the HTML indexed, and in it e.g. at the top have a button with a link to the PDF file, for those who want to print it.
For (c), I suppose changing R would be handy. Or BiocStyle?
Just a quick background: I was the one adjusting a large chunk of the vignette code for R 3.0.0 (Feb-March 2013) in order to add support for generic vignette engines after Yihui Xie and Duncan Murdoch had laid the groundwork adding support for knitr. When doing this I did think about supporting multiple weave output formats (and also keeping intermediate TeX / Markdown / ... files). What I can recall from this is that it shouldn't be too hard to do. The reason why it wasn't done was mostly that it would require an agreement by others / R core, and time was very short (just before the R 3.0.0 release), so updates were kept at a minimum.

Wolfgang, to answer your question: In my previous reply, I was focusing on the R CMD build process itself because that's where most of the action is happening when it comes to building vignettes, but there are other parts that need to be updated as well. But the core of the issue here is that R, or more precisely the tools package, assumes there should be exactly one output product per vignette. For instance, in tools:::find_vignette_product() [https://github.com/wch/r-source/blob/trunk/src/library/tools/R/Vignettes.R#L80-L84] we have checks like:

    if (length(output) > 2L || (final && length(output) > 1L))
        stop(gettextf("Located more than one %s output file (by engine %s) for vignette with name %s: %s",
                      sQuote(by),
                      sQuote(sprintf("%s::%s", engine$package, engine$name)),
                      sQuote(name),
                      paste(sQuote(output), collapse=", ")),
             domain = NA)

where 'output' holds any matching *.pdf and *.html file (and final = TRUE). (In my previous comment I said duplicated outputs would be ignored, but it seems that there'll be an error instead.)

There is also an internal vignette "meta data" data frame holding the vignette name, title, weave and tangle output files (or something like that). The weave output field is a character vector of one element per vignette. This data frame is used in several places.
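To make the effect of that condition concrete, here is a minimal sketch in base R. The function name check_products() is made up for illustration and is not part of the tools package; only the condition itself is taken from the snippet quoted above.

```r
# Hypothetical stand-in for the quoted check; NOT the real tools code.
# Only the condition in the if() is copied from the snippet above.
check_products <- function(output, final = TRUE) {
  if (length(output) > 2L || (final && length(output) > 1L)) {
    stop(sprintf("Located more than one output file: %s",
                 paste(sQuote(output), collapse = ", ")))
  }
  invisible(output)
}

# A single product per vignette passes the check.
check_products("intro.html")

# A PDF sitting next to the HTML trips the check when final = TRUE.
res <- tryCatch(check_products(c("intro.html", "intro.pdf")),
                error = function(e) conditionMessage(e))
print(res)
```

With final = FALSE the same two files would pass, since the first half of the condition only fires for more than two matches.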
This meta data frame has to be updated such that it can hold more than one weave output file per vignette, i.e. something like meta$weave[[idx]] should be able to hold one or more strings. Then functions / mechanisms that make use of this meta data need to be adjusted, e.g. vignettes(), vignette(), functions to build the vignette index HTML page etc. There probably also need to be new features added, e.g. what format should be opened by default when calling vignette()?

So, again, I think this is fairly straightforward to implement, but the first step is to convince R core that this should be done. I think one strong argument is that PDF alone is a rather bad format for screen readers while HTML is much better in this sense. One could also imagine vignette engines that are designed to provide highly screen-reader friendly output files / formats in addition to the standard HTML / PDF formats. This raises the question whether R should do this or if that's better left to other software that converts from the HTML file. On the other hand, maybe the HTML file doesn't contain all necessary information and it's better to work off an intermediate file format.

As Martin points out, preferably vignette engines that output to multiple formats should be smart enough not to rerun everything from scratch, but instead generate the PDF and HTML files based on some intermediate static format (e.g. Markdown).

/Henrik
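As a rough illustration of the meta$weave[[idx]] idea, the sketch below uses a list column so that each vignette can carry one or more weave outputs. The column names and the helper default_output() are assumptions made up for this example; they do not reflect the actual internals of the tools package.

```r
# Hypothetical meta data; column names are assumptions, not tools internals.
meta <- data.frame(name = c("intro", "advanced"),
                   title = c("Introduction", "Advanced usage"),
                   stringsAsFactors = FALSE)

# A list column lets each vignette hold one or more weave output files.
meta$weave <- list(c("intro.html", "intro.pdf"), "advanced.pdf")

# Hypothetical helper: choose which file to open by default,
# preferring one format and falling back to the first product.
default_output <- function(meta, name, prefer = "html") {
  idx <- match(name, meta$name)
  if (is.na(idx)) stop("unknown vignette: ", name)
  files <- meta$weave[[idx]]
  hit <- files[tolower(tools::file_ext(files)) == prefer]
  if (length(hit) > 0L) hit[1L] else files[1L]
}

default_output(meta, "intro")     # HTML is available and preferred
default_output(meta, "advanced")  # falls back to the only product
```

A helper along these lines is one possible answer to the "which format opens by default" question; the preference order would be a policy decision for R core.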
Wolfgang
A related question is whether some prefer to also have access to the intermediate plain Markdown / TeX rather than the final HTML / PDF product, e.g. because they work better with screen readers.

The only way I see you can have a PDF and an HTML version at the same time is to create two identical vignettes, each outputting a specific format.

Henrik

On Aug 17, 2016 12:17, "Ramon Diaz-Uriarte" <rdiaz02 at gmail.com> wrote:
Dear All,
I am considering rewriting the vignette of one BioC package I maintain as
Rmd (it is currently Rnw). But I would like the entry under
"Documentation" to contain a PDF of the vignette; ideally it can also contain
the HTML version, but I do not want to lose the PDF[1].
I know I can add entries to the document header such as
output:
  BiocStyle::pdf_document:
    toc: true
  BiocStyle::html_document:
    toc: true
that will, when run locally via "render('file.Rmd', output_format =
'all')", produce both formats.
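For reference, a minimal sketch of that local workflow follows. The temporary file and its one-line body are placeholders; the header is the one shown above. Rendering needs rmarkdown, BiocStyle, pandoc and a LaTeX installation, so the call is guarded and only attempted when the R-side pieces are found.

```r
# Write a tiny stand-in Rmd that uses the header from the message above.
rmd <- file.path(tempdir(), "file.Rmd")
writeLines(c(
  "---",
  "title: \"Example\"",
  "output:",
  "  BiocStyle::pdf_document:",
  "    toc: true",
  "  BiocStyle::html_document:",
  "    toc: true",
  "---",
  "",
  "Some text."
), rmd)

# Only try to render when rmarkdown, BiocStyle and pandoc are available.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  requireNamespace("BiocStyle", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # output_format = "all" renders every format listed in the header.
  # Errors (e.g. a missing LaTeX installation) are caught and printed.
  out <- tryCatch(rmarkdown::render(rmd, output_format = "all", quiet = TRUE),
                  error = function(e) conditionMessage(e))
  print(out)
}
```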
I've googled around, but I am not sure about:
1. If I have both output formats specified in the document header, will the
BioC page of the package actually show both the PDF and the HTML of the
vignette?
2. Is it OK (in conformance with BioC policies, sensible[1], whatever) to
even try/want this? My reading of the doc for the BiocStyle
(https://www.bioconductor.org/packages/devel/bioc/vignettes/BiocStyle/inst/doc/HtmlStyle.html)
seems to suggest that the "natural" thing for Rmd vignettes is to be
rendered as HTML, but I have not seen that producing PDF is discouraged
explicitly.
Best,
R.
[1] Why do I want to get a PDF if I am using Rmd? I want a PDF because this
is a fairly long document that some users want to be able to print. I want
HTML because some users prefer HTML and because I'd like to also place the
vignette as HTML in Github Pages. I think that the only way to accomplish
both is to use Rmd (not Rnw, even if I really, really, prefer LaTeX :-).
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: rdiaz02 at gmail.com
ramon.diaz at iib.uam.es
http://ligarto.org/rdiaz
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel