Hi,
I wanted to reach out for some thoughts on a problem I am facing with a package I recently submitted to Bioc. In essence, I am struggling with the 15-minute time limit for R CMD check as well as the package size limit of 5 MB. The latter is more important, so let's focus on that.
It is quite a large package with many functions, a full workflow for building gene-regulatory networks, and we want to include a detailed workflow vignette where the most important output plots are shown and explained, to make it user-friendly and easy to apply.
Various plot* functions produce PDFs that have many pages (sometimes dozens or even hundreds), only some of which should be shown in the vignette (say pages 2 and 5 from PDF A, pages 1 and 2 from PDF B, etc.). Including selected pages from a PDF doesn't seem to be possible with BiocStyle (please correct me if I am wrong), so currently I automatically convert each page of each PDF to a PNG image and then include selected pages in the vignette via knitr::include_graphics. This works well, but leads to the repo being too big (currently 11 MB) when built, because both the original images and the resulting HTML files in the inst folder contain the images, pushing it above 5 MB. I could reduce the resolution of the images much further, but this feels wrong too. In total, we are talking about 40 or so images that I wanted to share across the different vignettes.
Are there any thoughts on how I can proceed here without spending a lot of time on redesigning the package logic (time I unfortunately don't have at this point) and without sacrificing the usability of the package (I could just remove the workflow vignette or host it externally, I guess)?
Thanks, your input is very much appreciated.
Best
Christian
[Bioc-devel] Vignettes with many output graphics - How to fulfill the Bioc build requirements, best practises?
5 messages · Christian Arnold, James W. MacDonald, Marcel Ramos +2 more
If the pages from the PDF are essentially static (for your vignette, that is), why not run it once, get the PNGs, save them somewhere, and use eval = FALSE in the knitr chunk headers for the plot* functions? Then you will speed things up, there won't be all this extra PDF documentation that's more or less not part of the vignette, and it should run much faster.
-----Original Message----- From: Bioc-devel <bioc-devel-bounces at r-project.org> On Behalf Of Christian Arnold Sent: Tuesday, March 22, 2022 3:33 PM To: bioc-devel at r-project.org Subject: [Bioc-devel] Vignettes with many output graphics - How to fulfill the Bioc build requirements, best practises?
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
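A minimal sketch of the "render once, then reuse" idea from the reply above, in base R. The helper name select_pages(), the file-naming scheme "<stem>_page<N>.png" and the figures directory are assumptions made for illustration; they are not part of the package under discussion. Empty placeholder files stand in for real images so the example runs anywhere.

```r
# Sketch only: pick pre-rendered pages for a vignette instead of re-running
# the expensive plot* calls. Naming scheme and helper are hypothetical.
select_pages <- function(fig_dir, stem, pages) {
  paths <- file.path(fig_dir, sprintf("%s_page%d.png", stem, as.integer(pages)))
  missing <- !file.exists(paths)
  if (any(missing)) {
    stop("Pre-rendered figure(s) not found: ",
         paste(basename(paths[missing]), collapse = ", "))
  }
  paths
}

# Self-contained demo: fake a figure directory holding three rendered pages.
fig_dir <- file.path(tempdir(), "figures")
dir.create(fig_dir, showWarnings = FALSE)
invisible(file.create(file.path(fig_dir, sprintf("pdfA_page%d.png", 1:3))))

# Only pages 1 and 3 of "PDF A" are wanted in the vignette.
selected <- select_pages(fig_dir, "pdfA", c(1, 3))

# In the vignette itself, the chunk calling the plot* function would carry
# eval = FALSE, and a separate chunk would display the stored files, e.g.
# knitr::include_graphics(selected).
```

Only the pages actually shown need to be kept in the package source, which is also what keeps the size down.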
Hi Christian,
Thanks for reaching out. From what I gather, perhaps a workflow package submission is more appropriate for the details that you would like to submit within the vignette. We recommend vignettes that have small and runnable examples (possibly from simulated data) that demonstrate package functionality. I haven't taken a look at the 'mysterious' package, but perhaps consider breaking up the functionality into separate packages, if possible. For example, you could have one package for each of the facilities (e.g., stats, viz, utils, etc.).
As for producing PDFs from plotting functions, this is generally discouraged. Plotting functions should work like plot(1:10) and should output a single plot (or grouped plots) to the graphics device. The user should then be free to choose the file format for any plot produced. This approach may in turn resolve the issues you describe with plots in the vignette. It may require some time redesigning the package(s), but I think your users would benefit in the long run.
Best regards,
Marcel
On 3/22/22 4:05 PM, James W. MacDonald wrote:
---
Marcel Ramos
Bioconductor Core Team
Roswell Park Comprehensive Cancer Center
Dept. of Biostatistics & Bioinformatics
Elm St. & Carlton St.
Buffalo, New York 14263
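To illustrate the point about plotting functions drawing to the active graphics device, here is a small base-R sketch. plot_scores() is a hypothetical stand-in for one of the package's plot* functions, not its real API; the output file name is likewise made up.

```r
# Sketch only: the function draws exactly one plot on whatever device is
# currently active and never opens or closes a file itself.
plot_scores <- function(scores, main = "Scores") {
  plot(scores, type = "h", main = main, xlab = "Index", ylab = "Score")
  invisible(scores)
}

# The caller decides the output format by opening a device first.
out_pdf <- file.path(tempdir(), "scores.pdf")
pdf(out_pdf, width = 6, height = 4)
plot_scores(c(0.2, 0.8, 0.5))
invisible(dev.off())

# The same call works unchanged inside a vignette chunk (knitr supplies the
# device) or after png(), svg(), etc., where those devices are available.
```

With this design, a vignette chunk can call the function directly and knitr embeds just that one figure, so no PDF-to-PNG conversion step is needed.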
Yes, moving the heavy vignette to a workflow package is probably a good idea. See this post from last month for more info about workflow packages: https://stat.ethz.ch/pipermail/bioc-devel/2022-February/018821.html Cheers, H.
On 22/03/2022 13:09, Marcel Ramos wrote:
Hervé Pagès Bioconductor Core Team hpages.on.github at gmail.com
Another alternative is the following hack. If your function foo() produces "tons of output", is it possible to add an argument like foo(data, plot_number = 4) to selectively plot something? This argument would not really be intended for end users, and it would have a default of (say) plot_number = NULL, meaning all pages are produced. This way you can selectively plot things for your vignette, and this may not be too hard to add, for example if your many plots are being produced by a for loop or similar.
On Tue, Mar 22, 2022 at 4:37 PM Hervé Pagès <hpages.on.github at gmail.com> wrote:
Best, Kasper
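The plot_number suggestion could look roughly like the sketch below. plot_all() and its list-of-vectors input are hypothetical and only mimic a function that produces one page per element in a for loop; the argument name and its NULL default follow the message above.

```r
# Sketch only: a multi-page plot function with an optional page selector.
# plot_number = NULL (the default) draws every page, as before.
plot_all <- function(data, plot_number = NULL) {
  pages <- seq_along(data)
  if (!is.null(plot_number)) {
    stopifnot(all(plot_number %in% pages))
    pages <- plot_number
  }
  for (i in pages) {
    plot(data[[i]], main = names(data)[i])
  }
  invisible(pages)
}

data <- list(A = 1:10, B = (1:10)^2, C = sqrt(1:10))

# Vignette use: draw only the second page.
one_page <- file.path(tempdir(), "page2.pdf")
pdf(one_page)
drawn <- plot_all(data, plot_number = 2)
invisible(dev.off())

# Default use: all pages end up in one multi-page PDF.
all_pages_file <- file.path(tempdir(), "all_pages.pdf")
pdf(all_pages_file)
drawn_all <- plot_all(data)
invisible(dev.off())
```

Because the default keeps the existing behaviour, end users are unaffected, while the vignette can request individual pages without generating the full PDF.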