[Bioc-devel] Vignettes with many output graphics - How to fulfill the Bioc build requirements, best practises?

5 messages · Christian Arnold, James W. MacDonald, Marcel Ramos +2 more

#
Hi, I wanted to reach out for some thoughts on the following problem I
am facing with a package I recently submitted to Bioc. In essence, I am
struggling with the 15-minute time limit for R CMD check as well as the
package size limit of 5 MB. The latter is more important, so let's focus
on that:

It is a quite large package with many functions, a full workflow for
building gene-regulatory networks, and we want to include a detailed
workflow vignette where the most important output plots are shown and
explained, to make it user-friendly and easy to apply.

Various plot* functions produce PDFs that have many pages (sometimes
dozens or even hundreds), only some of which should be shown in the
vignette (say pages 2 and 5 from PDF A, and pages 1 and 2 from PDF B,
etc). Including selected pages from a PDF doesn't seem to be possible
with BiocStyle (please correct me if I am wrong), so currently, I am
automatically converting each page of each of the various PDFs to a PNG
image, and then including selected pages in the vignette via
knitr::include_graphics. This works well, but leads to the repo being
too big (currently 11 MB) when being built, because the original images
as well as the resulting HTML files in the inst folder contain the
images, making it bigger than 5 MB. I could reduce the resolution of the
images much further, but this feels wrong too. In total, we are talking
about 40 or so images that I wanted to share across the different
vignettes.

Are there any thoughts on how I can proceed here without spending a lot
of time on re-designing the package logic (which I unfortunately don't
have at this point) and without sacrificing the usability of the package
(I could just remove the workflow vignette or host it externally, I guess)?


Thanks, your input is much appreciated.


Best

Christian
#
If the pages from the PDF are essentially static (for your vignette, that is), why not run it once, get the PNGs, save them somewhere, and use eval = FALSE in the knitr chunk headers for the plot* functions? Then there won't be all this extra PDF output that's more or less not part of the vignette, and it should run much faster.
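
A minimal sketch of this suggestion, under stated assumptions: the helper static_figure(), the "figures" directory, and the file names are hypothetical and not taken from the package under discussion. The commented-out one-time step assumes the pdftools package is installed; its pdf_convert() function can render only the requested pages of a PDF.

```r
# Hypothetical helper for the vignette: return the path of a pre-rendered
# figure and fail early if it was never generated.
static_figure <- function(name, dir = "figures") {
  path <- file.path(dir, name)
  if (!file.exists(path)) {
    stop("Pre-rendered figure not found: ", path)
  }
  path
}

# One-time step, run by hand and never during R CMD build/check
# (file names and page numbers are placeholders):
#   pdftools::pdf_convert("plotsA.pdf", format = "png", pages = c(2, 5),
#                         dpi = 100,
#                         filenames = file.path("figures",
#                                               c("A_page2.png", "A_page5.png")))
#
# In the vignette:
#   a chunk with option eval = FALSE shows the plot* call without running it
#   a chunk with option echo = FALSE then calls
#     knitr::include_graphics(static_figure("A_page2.png"))
```

Rendering only the pages that are actually shown keeps the number of stored PNGs down, although the committed images still count toward the package size.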

-----Original Message-----
From: Bioc-devel <bioc-devel-bounces at r-project.org> On Behalf Of Christian Arnold
Sent: Tuesday, March 22, 2022 3:33 PM
To: bioc-devel at r-project.org
Subject: [Bioc-devel] Vignettes with many output graphics - How to fulfill the Bioc build requirements, best practises?

#
Hi Christian,

Thanks for reaching out.

From what I gather, perhaps a workflow package submission is more
appropriate for the details that you would like to submit within the
vignette. We recommend vignettes that have small and runnable examples
(possibly from simulated data) that demonstrate package functionality.

I haven't taken a look at the 'mysterious' package but perhaps consider
breaking up the functionality into separate packages, if possible. For
example, you could have one package for each of the facilities (e.g.,
stats, viz, utils, etc).

As for producing PDFs from plotting functions, this is generally
discouraged. Plotting functions should work like plot(1:10) and should
output a single plot (or grouped plots) to the graphics device. The user
should then be free to choose the file format for any plot produced.
This approach may in turn resolve the issues you describe with plots in
the vignette.
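
To illustrate the device-based pattern described above, here is a small sketch using only base R. The function plot_scores() and its data are invented for illustration; the point is that the function draws on whatever device is active and the caller decides the file format.

```r
# Hypothetical plotting function: draws on whatever graphics device is active
# and never opens or closes a file itself.
plot_scores <- function(scores, main = "Scores") {
  graphics::plot(seq_along(scores), scores, type = "b",
                 xlab = "Index", ylab = "Score", main = main)
  invisible(scores)
}

# The caller picks the output format by opening a device first.
out <- file.path(tempdir(), "scores.pdf")
grDevices::pdf(out, width = 5, height = 4)
plot_scores(c(0.2, 0.5, 0.9))
grDevices::dev.off()
```

Inside a knitr chunk the same call would simply draw into the vignette, with no intermediate PDF to convert.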

It may require some time re-designing the package(s) but I think your
users would benefit in the long run.

Best regards,

Marcel
On 3/22/22 4:05 PM, James W. MacDonald wrote:
---
Marcel Ramos
Bioconductor Core Team
Roswell Park Comprehensive Cancer Center
Dept. of Biostatistics & Bioinformatics
Elm St. & Carlton St.
Buffalo, New York 14263


#
Yes, moving the heavy vignette to a workflow package is probably a good 
idea. See this post from last month for more info about workflow packages:

https://stat.ethz.ch/pipermail/bioc-devel/2022-February/018821.html

Cheers,

H.
On 22/03/2022 13:09, Marcel Ramos wrote:
#
Another alternative to this is the following hack. If your function foo()
produces "tons of output", is it possible to add an argument like
  foo(data, plot_number = 4)
to selectively plot something? This argument would not really be intended
for end-users, and it would have a default of (say) plot_number = NULL,
meaning all pages are produced. This way you can selectively plot things
for your vignette, and this may not be too hard to add, for example if your
many plots are being produced by a for loop or similar.
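
A rough sketch of what such an argument could look like, assuming the pages are produced in a loop. The function plot_all() and the toy data are placeholders, not the package's actual code.

```r
# Hypothetical stand-in for a function that normally draws one page per item.
# plot_number = NULL (the default) draws every page; otherwise only the
# requested pages are drawn.
plot_all <- function(data, plot_number = NULL) {
  pages <- seq_along(data)
  if (!is.null(plot_number)) {
    stopifnot(all(plot_number %in% pages))
    pages <- plot_number
  }
  for (i in pages) {
    graphics::plot(data[[i]], main = paste("Page", i))
  }
  invisible(pages)
}

data <- list(a = 1:10, b = (1:10)^2, c = rev(1:10))
out <- file.path(tempdir(), "selected.pdf")
grDevices::pdf(out)
drawn <- plot_all(data, plot_number = c(1, 3))  # draws pages 1 and 3 only
grDevices::dev.off()
```

In a vignette chunk, plot_all(data, plot_number = 2) would then draw just that page, while existing calls without the argument behave as before.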

Best,
Kasper

On Tue, Mar 22, 2022 at 4:37 PM Hervé Pagès <hpages.on.github at gmail.com>
wrote: