
[Bioc-devel] Issues with package size

9 messages · Shepherd, Lori, Giulia Pais, Vincent Carey +2 more

#
Hello, I'm the developer of the package ISAnalytics.
I'd like to ask whether it is possible to have a file/vignette that links to other documentation outside the package (like a GitHub wiki), since we have issues with the maximum allowed package size due to vignette size. We would like to maintain as much documentation as possible, and we have already tried to reduce the data included, but it's not sufficient.
Thanks in advance
Giulia Pais
#
Hi Giulia,

I think it is ok to host the vignettes somewhere else. I have two packages
whose vignettes are hosted on GitHub Pages.

http://www.bioconductor.org/packages/devel/bioc/html/ComplexHeatmap.html
https://www.bioconductor.org/packages/devel/bioc/html/cola.html

But since the vignettes are then no longer automatically checked, you need
to make sure, every time you update your package, that the vignettes can
still be successfully generated.

Cheers,
Zuguang
On Thu, 23 Sept 2021 at 14:07, Giulia Pais <giuliapais1 at gmail.com> wrote:

            

  
  
#
We don't encourage hosting a vignette someplace else, as then the code is not automatically checked. Is the large size due to data or images? Could you provide some additional detail please?


Lori Shepherd

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263
#
In general, Bioconductor will always require at least one vignette that is automatically checked and that performs non-trivial calls to package code.


#
Hi Lori,
It is mainly data and a couple of plots, no images. Even with very minimal data, somehow we can't obtain vignettes smaller than 1 MB.


#
It seems to me the reason for the large vignette size is that you have
significant DT::datatable rendering capabilities within your vignettes, and
the tables are large. I would propose that you use subsets of the table
content for the vignettes, and direct the reader who wants to search the
entire table interactively to run a function in the package that will
produce the full table in datatable format.
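
A minimal sketch of this suggestion, not taken from ISAnalytics itself: the
function names `vignette_preview()` and `show_full_table()` and the toy data
frame are hypothetical, and `DT` is assumed to be an optional (Suggests)
dependency. The vignette renders only a few rows, while the interactive
widget is built on demand in the user's own session.

```r
# Hypothetical helper: return a small static preview for use in the vignette.
vignette_preview <- function(full_table, n = 10) {
  utils::head(full_table, n)
}

# Hypothetical helper: build the full interactive table only when a user
# asks for it, so the widget is never embedded in the built vignette.
show_full_table <- function(full_table) {
  if (!requireNamespace("DT", quietly = TRUE)) {
    stop("Install the 'DT' package to browse the full table interactively.")
  }
  DT::datatable(full_table)
}

# Toy stand-in for the large table the vignette would otherwise render.
full_table <- data.frame(id = seq_len(1000), value = seq_len(1000) / 10)

# In the vignette: show only the preview.
preview <- vignette_preview(full_table)
print(preview)
```

The vignette text can then point readers to `show_full_table()` for the
complete, searchable table.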
On Thu, Sep 23, 2021 at 8:56 AM Giulia Pais <giuliapais1 at gmail.com> wrote:

            

  
    
#
I tried completely removing all DT calls, but we're still slightly above the limit. Is there no way other than removing vignettes or files?
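
One way to see where the remaining size comes from is to list the largest
files in the package source or in the built vignette output. The following is
a rough sketch in base R; the helper name `largest_files()` and the example
directory are assumptions, not part of any Bioconductor tooling.

```r
# Hypothetical helper: list the files under `dir`, largest first,
# with sizes reported in kilobytes.
largest_files <- function(dir, n = 10) {
  paths <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes <- file.size(paths)
  ord <- order(sizes, decreasing = TRUE)
  out <- data.frame(
    path = paths[ord],
    size_kb = round(sizes[ord] / 1024, 1),
    stringsAsFactors = FALSE
  )
  utils::head(out, n)
}

# Example call, assuming the working directory is the package root:
# largest_files("vignettes")
```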

#
Hi Zuguang,
On 23/09/2021 05:45, Zuguang Gu wrote:
Unfortunately this is something we **strongly** discourage.



It's important to understand that your vignettes can break any time 
(e.g. when something they depend on changes), not just when you update 
your package. This is why Bioconductor vignettes should always be 
located in the vignettes/ folder of the package, and be "real" 
vignettes, that is, they must contain code chunks that get evaluated by 
'R CMD check'.


Best,

H.

  
    
#
Hi Hervé, Hi all,

Yes, I totally understand and agree with the standards of Bioc package
development. I always try to follow them as best as I can. But regarding
"real rerunnable vignettes", I think there are several scenarios that make
it really difficult to follow this standard:

1. Some examples need a long time to run or depend on a very large dataset,
and it is impossible to use a reduced small subset of data. E.g. in my
HilbertCurve package, there are several examples visualizing the complete
chromosome 21 (around 15 examples, which take almost 30-45 min to run).
Because this package makes a "global view of a genome", it makes no sense
to only work on a small subset of genomic regions. My solution is that on my
local machine, the code chunks that generate these plots are actually
evaluated and they also generate cached figures, while on the Bioc server,
the code chunks are not evaluated and the cached figures are used directly.
I guess packages for single-cell RNAseq analysis might also have this issue,
since the analysis makes sense only with more than thousands of cells (just
a guess, I don't have experience with scRNAseq data analysis).

2. Some vignettes generate or include many figures (static or gif), which
makes the final file size of the package very large (tens of MBs or maybe
more), especially for packages focused on data visualization. A good
vignette should contain lots of example figures that illustrate the various
usages of the package for users. E.g. my ComplexHeatmap package contains
hundreds of figures, thus I decided to host the vignette somewhere else.

I can think of some solutions for this:

1. include a small and evaluable vignette that only contains the "core
analysis" or "the most used features" in the package, while hosting more
comprehensive vignettes somewhere else.

2. add extensive tests in the package to ensure the reliability of the
package. For example, in ComplexHeatmap, although there is almost no
runnable vignette, it actually includes hundreds of tests that will be
evaluated during `R CMD check`.
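
As an illustration of point 2, here is a minimal sketch of a plain test
script placed under `tests/`, which `R CMD check` runs and reports as failed
if it raises an error. The file name and the `scale_rows()` function are
hypothetical stand-ins; a real test would load the package and call its own
exported functions.

```r
# tests/check-scaling.R (hypothetical file name)

# Stand-in for an exported package function: centre and scale each row.
scale_rows <- function(m) {
  t(scale(t(m)))
}

m <- matrix(c(1, 2, 3, 4, 5, 9), nrow = 2, byrow = TRUE)
scaled <- scale_rows(m)

# stopifnot() raises an error, and so fails the check, if any condition
# is not TRUE.
stopifnot(
  identical(dim(scaled), dim(m)),
  isTRUE(all.equal(rowMeans(scaled), c(0, 0)))
)
```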

Best regards,
Zuguang


On Thu, 23 Sept 2021 at 20:49, Hervé Pagès <hpages.on.github at gmail.com>
wrote: