[Bioc-devel] Controlling vignette compilation order

@Michael In this case, the resource produced by vignette X is a SingleCellExperiment object containing the results of various processing steps (normalization, clustering, etc.) described in that vignette. 

I can imagine a lazy evaluation model for this, but it wouldn't be pretty. If I had another vignette Y that depended on the SCE produced by vignette X, I would need Y to execute all of the steps in X if X hadn't already been run before Y. This gets us into the territory of Makefile-like dependencies, which seems even more complicated than simply specifying a compilation order.
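
For illustration, the lazy model could look roughly like the sketch below. It uses base R only and is not an existing mechanism: the helper `load_or_build()`, the file name `sce_X.rds` and the stand-in `build_x()` are all hypothetical, and a real version would have to rerun every step of vignette X and return the actual SCE.

```r
# Hypothetical helper: return the cached result if it exists on disk,
# otherwise build it, save it, and then return it.
load_or_build <- function(path, build) {
    if (!file.exists(path)) {
        result <- build()   # in practice: rerun all of vignette X's steps
        saveRDS(result, path)
    }
    readRDS(path)
}

# Stand-in for vignette X's processing (assumption: a plain list is used
# here in place of a real SingleCellExperiment).
build_x <- function() {
    list(normalized = TRUE, clusters = c(1L, 2L, 1L))
}

# Vignette Y would call this instead of assuming X has already been run.
cache_path <- file.path(tempdir(), "sce_X.rds")  # hypothetical location
sce <- load_or_build(cache_path, build_x)
```

Every downstream vignette would need a guard like this for each upstream result it uses, which is where the Makefile-like bookkeeping comes from.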

You might ask why X and Y are split into two separate vignettes. The use of different vignettes is motivated by the complexity of the workflows:

- Vignette 1 demonstrates core processing steps for one read-based single-cell RNAseq dataset. 
- Vignette 2 demonstrates (slightly different) core steps for a UMI-based dataset.
- ... and so on for a bunch of other core steps for different types of data.
- Vignette 6 demonstrates extra optional steps for the two SCEs produced by vignettes 1 & 3.
- ... and so on for a bunch of other optional steps.

Splitting the core and optional steps into separate documents is desirable. From a pedagogical perspective, I would very much like to get the reader through all the core steps before even considering the extra steps, which would just be confusing if presented so early on. Previously, everything was in a single document, which was difficult to read (for users) and to debug (for me), especially because I had to use contrived variable names to avoid clashes between different sections of the workflow that did similar things.

@Martin I've been using BiocFileCache for all of the online resources that are used in the workflow. However, this is only for my (and the reader's) convenience. I use a local cache rather than the system default, to ensure that the downloaded files are removed after package build. This is intentional as it forces the package builder to try to re-download resources when compiling the vignette, thus ensuring the validity of the URLs. For a similar reason, I would prefer not to cache the result objects for use in different R sessions. I could imagine caching the result objects for use by a different vignette in the same build session, but this gets back to the problem of ensuring that the result object is generated by one vignette before it is needed by another vignette.

-A