[Bioc-devel] Controlling vignette compilation order - Bioc-devel

Aaron Lun

Tue, Dec 18, 2018 5:22 AM

In a number of my workflow packages (e.g., simpleSingleCell), I rely on a specific compilation order for my vignettes. This is because some vignettes set up resources or objects that are to be used by later vignettes.

Michael Lawrence

Tue, Dec 18, 2018 5:34 AM

I would recommend against dependencies across vignettes. Ideally someone
can pick up a vignette and execute the code independently of any other
documentation. Perhaps you could move the code generating those shared
resources to the package. They could behave lazily, only generating the
resource if necessary, otherwise reusing it. That would also make it easy
for people to write their own documents using those resources.
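
A minimal sketch of the lazy approach described above, assuming the shared resource can be produced by a single function call. The names `.cache`, `get_resource` and `build_counts` are hypothetical and do not come from any existing package:

```r
# Package-level cache; in a real package this would live under R/.
.cache <- new.env(parent = emptyenv())

# Hypothetical getter: `key` names the resource, `build` is a zero-argument
# function that performs the expensive processing.
get_resource <- function(key, build) {
    if (!exists(key, envir = .cache, inherits = FALSE)) {
        # Only reached the first time the resource is requested.
        assign(key, build(), envir = .cache)
    }
    get(key, envir = .cache, inherits = FALSE)
}

# Stand-in for an expensive step; counts how often it actually runs.
n_builds <- 0L
build_counts <- function() {
    n_builds <<- n_builds + 1L
    matrix(seq_len(20), nrow = 4)
}

first <- get_resource("counts", build_counts)   # built here
second <- get_resource("counts", build_counts)  # reused from the cache
```

Any vignette can then call the getter without knowing whether another document has already triggered the build.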

Michael

On Tue, Dec 18, 2018 at 5:22 AM Aaron Lun <infinite.monkeys.with.keyboards at gmail.com> wrote:

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Martin Morgan

Tue, Dec 18, 2018 6:14 AM

Also perhaps using BiocFileCache so that the result object is only generated once, then cached for future (different session) use.
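
The idea of generating a result once and reusing it across sessions can be sketched in base R. This is a stand-in that uses saveRDS()/readRDS(), not the BiocFileCache API; `cached_result`, the cache directory and the resource name are hypothetical:

```r
# Hypothetical helper: return the cached object for `name` if its file exists
# in `cache_dir`, otherwise build it, save it, and return it.
cached_result <- function(name, build, cache_dir) {
    dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
    path <- file.path(cache_dir, paste0(name, ".rds"))
    if (file.exists(path)) {
        return(readRDS(path))
    }
    obj <- build()
    saveRDS(obj, path)
    obj
}

# A cache under tempdir() disappears with the R session; pointing `cache_dir`
# at a persistent location would keep results between sessions.
cache_dir <- file.path(tempdir(), "workflow-cache")
a <- cached_result("sce-reads", function() list(ncells = 100L), cache_dir)
b <- cached_result("sce-reads", function() stop("should not rebuild"), cache_dir)
```

The second call never runs its `build` function because the file written by the first call is found on disk.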

On Tue, Dec 18, 2018 at 5:22 AM Aaron Lun <infinite.monkeys.with.keyboards at gmail.com> wrote:

> In a number of my workflow packages (e.g., simpleSingleCell), I rely on a
> specific compilation order for my vignettes. This is because some vignettes
> set up resources or objects that are to be used by later vignettes.
>
> From what I understand, vignettes are compiled in alphanumeric ordering of
> their file names. As such, I give my vignettes fairly structured names,
> e.g., "work-1-reads.Rmd", "work-2-umi.Rmd" and so on.
>
> However, it becomes rather annoying when I want to add a new vignette in
> the middle somewhere. This results in some unnatural numberings, e.g.,
> "work-0", "3b", which are ugly and unintuitive. This is relevant as
> BiocStyle::Biocpkg() links between vignettes require you to use the
> destination vignette's file name; so difficult names complicate linking,
> especially if the names continually change to reflect new orderings.
>
> Is there an easier way to control vignette compilation order? WRE provides
> no (obvious) guidance, so I would like to know what non-standard hacks are
> known to work on the build machines. I can imagine something dirty whereby
> one "reference" vignette contains code to "rmarkdown::render" all other
> vignettes in the specified order... ugh.
>
> -A
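
One way to make file-name ordering less brittle is to leave gaps in the numeric prefixes, so that a new vignette can be slotted in without renaming the others. A minimal sketch: the file names and the spacing of 10 are hypothetical, and it assumes the build orders file names much as a C-locale sort does.

```r
# Hypothetical vignette stems, listed in the intended compilation order.
stems <- c("intro", "reads", "umi", "batch", "extra")

# Zero-padded prefixes spaced by 10 leave room for later insertions.
files <- sprintf("work-%03d-%s.Rmd", seq_along(stems) * 10L, stems)

# A radix sort uses the C locale for character vectors, so the result
# does not depend on the collation settings of the machine.
stopifnot(identical(sort(files, method = "radix"), files))

# A new vignette between "reads" (020) and "umi" (030) needs no renaming.
files2 <- sort(c(files, "work-025-spikeins.Rmd"), method = "radix")
print(files2)
```

Existing names stay stable under this scheme, so links that refer to a vignette by file name would not need to change when a document is added.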

Aaron Lun

Tue, Dec 18, 2018 6:58 AM

@Michael In this case, the resource produced by vignette X is a SingleCellExperiment object containing the results of various processing steps (normalization, clustering, etc.) described in that vignette.

I can imagine a lazy evaluation model for this, but it wouldn't be pretty. If I had another vignette Y that depended on the SCE produced by vignette X, I would need Y to execute all of the steps in X if X hadn't already been run before Y. This gets us into the territory of Makefile-like dependencies, which seems even more complicated than simply specifying a compilation order.

You might ask why X and Y are split into two separate vignettes. The use of different vignettes is motivated by the complexity of the workflows:

- Vignette 1 demonstrates core processing steps for one read-based single-cell RNAseq dataset.
- Vignette 2 demonstrates (slightly different) core steps for a UMI-based dataset.
- ... and so on for a bunch of other core steps for different types of data.
- Vignette 6 demonstrates extra optional steps for the two SCEs produced by vignettes 1 & 3.
- ... and so on for a bunch of other optional steps.
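
The dependency structure in the list above can be written down explicitly and turned into a build order. A minimal base-R sketch, assuming each vignette only needs to know which other vignettes must run first; the names `deps` and `build_order` are hypothetical, and only the dependency of vignette 6 on vignettes 1 and 3 is taken from the list:

```r
# Each element lists the vignettes that must be compiled beforehand.
deps <- list(
    vignette6 = c("vignette1", "vignette3"),
    vignette1 = character(0),
    vignette2 = character(0),
    vignette3 = character(0)
)

# Repeatedly pick every vignette whose dependencies are all done
# (a simple topological sort).
build_order <- function(deps) {
    done <- character(0)
    pending <- names(deps)
    while (length(pending) > 0L) {
        is_ready <- vapply(pending, function(v) all(deps[[v]] %in% done), logical(1))
        ready <- pending[is_ready]
        if (length(ready) == 0L) {
            stop("unresolvable dependencies among: ", paste(pending, collapse = ", "))
        }
        done <- c(done, ready)
        pending <- setdiff(pending, ready)
    }
    done
}

ord <- build_order(deps)
print(ord)
```

Vignette 6 is listed first in `deps` but ends up last in the order, after the two vignettes it depends on.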

The separation between core and optional steps into separate documents is desirable. From a pedagogical perspective, I would very much like to get the reader through all the core steps before even considering the extra steps, which would just be confusing if presented so early on. Previously, everything was in a single document, which was difficult to read (for users) and to debug (for me), especially because I had to use contrived variable names to avoid clashes between different sections of the workflow that did similar things.

@Martin I've been using BiocFileCache for all of the online resources that are used in the workflow. However, this is only for my (and the reader's) convenience. I use a local cache rather than the system default, to ensure that the downloaded files are removed after package build. This is intentional as it forces the package builder to try to re-download resources when compiling the vignette, thus ensuring the validity of the URLs. For a similar reason, I would prefer not to cache the result objects for use in different R sessions. I could imagine caching the result objects for use by a different vignette in the same build session, but this gets back to the problem of ensuring that the result object is generated by one vignette before it is needed by another vignette.

-A

Hervé Pagès

Tue, Dec 18, 2018 8:51 AM

Hi Aaron,

Right now 'R CMD build' evaluates all vignettes in the same R session.
Personally I see this as an undesirable feature and hope that it will
change in the future. One problem with this is that when a vignette hits
the max DLL limit, breaking it down into smaller vignettes doesn't help.
Another problem is that sometimes using 'R CMD Stangle && source()' does
not reproduce a bug triggered by 'R CMD build'. I can spend a lot of
time scratching my head on this until I finally realize that I first
have to evaluate one of the other vignettes in order to reproduce the bug.
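
The effect described above can be mimicked in plain R. This is only a sketch: two throwaway scripts stand in for the code extracted from two vignettes, and source() on temporary files stands in for the build itself.

```r
# Two throwaway scripts standing in for code extracted from two vignettes.
first <- tempfile(fileext = ".R")
second <- tempfile(fileext = ".R")
writeLines("shared <- 42", first)
writeLines("result <- shared + 1", second)

# One environment for both scripts, as when vignettes share an R session:
# the second script silently picks up an object left behind by the first.
session <- new.env()
source(first, local = session)
source(second, local = session)
session$result

# A clean environment, as when the second script is sourced on its own:
# the same code now fails because `shared` was never created.
alone <- tryCatch({
    source(second, local = new.env(parent = baseenv()))
    "ran"
}, error = function(e) "failed")
alone
```

Any behavior that depends on state left by an earlier document, including a bug, will only show up when the earlier document has been evaluated first.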

On this note I wish 'R CMD build' would show progress by printing the
name of the vignette it's currently evaluating (like 'R CMD check' does
during the 'checking running R code from vignettes' step). It should be
an easy improvement and it would already help a lot.

That being said I'm also sympathetic to your use case where sometimes a 
big monolithic vignette needs to be broken down into smaller units. I 
don't know of any way to control the order of evaluation other than 
using a Makefile for that though.

H.

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Michael Lawrence

Tue, Dec 18, 2018 9:41 AM

Sounds like a use case for drake...

Aaron Lun

Fri, Dec 21, 2018 11:26 PM

I gave it a shot:

https://github.com/LTLA/DrakeTest

This uses a single "controller" Rmd file to trigger drake::make(). Running this file will instruct drake to compile all of the other vignettes following the desired dependency structure.
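
For comparison, the simpler controller idea from the start of the thread (rendering the other documents in a fixed order) might be sketched as follows. This is not the drake-based setup in the linked repository: it is a plain ordered loop, `render_in_order` and the file names in the comment are hypothetical, and it does nothing about indexing the resulting HTML files as vignettes.

```r
# Render each file in the order given. `render` defaults to rmarkdown::render
# but can be swapped for any function accepting the same arguments.
render_in_order <- function(files, output_dir, render = rmarkdown::render) {
    # One shared environment, so objects created by earlier documents
    # remain visible to later ones.
    shared <- new.env(parent = globalenv())
    vapply(files, function(f) {
        render(f, output_dir = output_dir, envir = shared, quiet = TRUE)
    }, character(1))
}

# Intended use from a chunk in the controller document (not run here):
# render_in_order(c("reads.Rmd", "umi.Rmd", "extra.Rmd"), output_dir = "out")
```

The order is whatever the caller passes in, so file names no longer need to encode it.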

The current sticking point is that I need to move the drake-controlled Rmd files out of "vignettes/", otherwise they'll just be compiled as usual without consideration of their dependencies. This causes problems as R CMD build only recognizes the controller Rmd file as the sole vignette, and doesn't retain or index the HTML files produced from the other Rmd files as side-effects of running the controller.

Are there any better ways to subvert the vignette building procedure to get the desired effect of running drake::make() and recognition of the resulting HTMLs as vignettes?

-A

Michael Lawrence

Sat, Dec 22, 2018 10:25 AM

Anything that eventually lands in inst/doc is a vignette, I think, so
there might be a hack around that.

On Fri, Dec 21, 2018 at 11:26 PM Aaron Lun

<infinite.monkeys.with.keyboards at gmail.com> wrote:

I gave it a shot:

https://github.com/LTLA/DrakeTest <https://github.com/LTLA/DrakeTest>

This uses a single ?controller? Rmd file to trigger Drake::make. Running this file will instruct Drake to compile all of the other vignettes following the desired dependency structure.

The current sticking point is that I need to move the Drake-controlled Rmd files out of ?vignettes/?, otherwise they?ll just be compiled as usual without consideration of their dependencies. This causes problems as R CMD BUILD only recognizes the controller Rmd file as the sole vignette, and doesn?t retain or index the HTML files produced from the other Rmd files as side-effects of running the controller.

Are there any better ways to subvert the vignette building procedure to get the desired effect of running drake::make() and recognition of the resulting HTMLs as vignettes?

-A

On 18 Dec 2018, at 17:41, Michael Lawrence <lawrence.michael at gene.com> wrote:

Sounds like a use case for drake...

On Tue, Dec 18, 2018 at 6:58 AM Aaron Lun <infinite.monkeys.with.keyboards at gmail.com <mailto:infinite.monkeys.with.keyboards at gmail.com>> wrote:
@Michael In this case, the resource produced by vignette X is a SingleCellExperiment object containing the results of various processing steps (normalization, clustering, etc.) described in that vignette.

I can imagine a lazy evaluation model for this, but it wouldn?t be pretty. If I had another vignette Y that depended on the SCE produced by vignette X, I would need Y to execute all of the steps in X if X hadn?t already been run before Y. This gets us into the territory of Makefile-like dependencies, which seems even more complicated than simply specifying a compilation order.

You might ask why X and Y are split into two separate vignettes. The use of different vignettes is motivated by the complexity of the workflows:

- Vignette 1 demonstrates core processing steps for one read-based single-cell RNAseq dataset.
- Vignette 2 demonstrates (slightly different) core steps for a UMI-based dataset.
- ? so on for a bunch of other core steps for different types of data.
- Vignette 6 demonstrates extra optional steps for the two SCEs produced by vignettes 1 & 3.
- ? and so on for a bunch of other optional steps.

The separation between core and optional steps into separate documents is desirable. From a pedagogical perspective, I would very much like to get the reader through all the core steps before even considering the extra steps, which would just be confusing if presented so early on. Previously, everything was in a single document, which was difficult to read (for users) and to debug (for me), especially because I had to use contrived variable names to avoid clashes between different sections of the workflow that did similar things.

@Martin I?ve been using BiocFileCache for all of the online resources that are used in the workflow. However, this is only for my (and the reader?s) convenience. I use a local cache rather than the system default, to ensure that the downloaded files are removed after package build. This is intentional as it forces the package builder to try to re-download resources when compiling the vignette, thus ensuring the validity of the URLs. For a similar reason, I would prefer not to cache the result objects for use in different R sessions. I could imagine caching the result objects for use by a different vignette in the same build session, but this gets back to the problem of ensuring that the result object is generated by one vignette before it is needed by another vignette.

-A


Martin Morgan

Sat, Dec 22, 2018 11:00 AM #

...but in the end isn't it just simpler to name your vignettes in collation order? Who other than you will be able to parse what you've done?

Martin

On 12/22/18, 1:56 PM, "Bioc-devel on behalf of Michael Lawrence" <bioc-devel-bounces at r-project.org on behalf of lawrence.michael at gene.com> wrote:

    Anything that eventually lands in inst/doc is a vignette, I think, so
    there might be a hack around that.
    

Henrik Bengtsson

Sat, Dec 22, 2018 11:22 AM #

On Sat, Dec 22, 2018 at 10:56 AM Michael Lawrence
<lawrence.michael at gene.com> wrote:

> Anything that eventually lands in inst/doc is a vignette, I think, so
> there might be a hack around that.

Just so this is not misread - it's *not* possible to just hack your
vignette "product" files (PDF or HTML) into inst/doc and think
you're good.  R keeps track of package vignettes in a "vignette
index", e.g.

    readRDS(system.file(package = "utils", "Meta", "vignette.rds"))

            File              Title        PDF        R Depends Keywords
    1 Sweave.Rnw Sweave User Manual Sweave.pdf Sweave.R   tools

which is created during 'R CMD build' by parsing and compiling the
vignettes (https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/tools/R/build.R#L283-L393).
This vignette index is used to find package vignettes (e.g.
utils::vignette()) and build the HTML vignette index.

Also, one vignette source (e.g. Rnw, Rmd, ...) can only produce one
vignette product (PDF or HTML) in the vignette index.  You can output
other files (e.g. image files) in a relative folder that the vignette
references, which is why for instance non-self-contained HTML files
work.  Thus, one ad-hoc, not-so-nice hack that OP could do is to have
a single main vignette that produces and links to all child vignettes.
However, personally, I'd aim for using memoization/caching (to file)
such that each vignette can be compiled independently of the others
(and in any order), while still reusing intermediate
results/calculations produced by earlier vignettes.
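
To make the memoization/caching-to-file idea concrete, here is a minimal base-R sketch. The helper name cached(), the cache location and the stand-in result object are all hypothetical; a real workflow would store something like a SingleCellExperiment and might prefer an existing caching package over a hand-rolled helper.

```r
# Hypothetical helper: return the object stored at 'path' if it exists,
# otherwise compute it, save it to 'path', and return it.
cached <- function(path, compute) {
    if (file.exists(path)) {
        return(readRDS(path))
    }
    value <- compute()
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    saveRDS(value, path)
    value
}

# Assumed cache location for this demonstration (a fresh temporary directory).
cache_dir <- tempfile("workflow-cache")
result_file <- file.path(cache_dir, "reads.rds")

n_runs <- 0L
make_upstream <- function() {
    n_runs <<- n_runs + 1L          # count how often the expensive step runs
    list(clusters = c(1L, 1L, 2L))  # stand-in for a real result object
}

# The "upstream" vignette creates the object; a "downstream" vignette asks
# for the same file and reuses it, whichever of the two is compiled first.
first  <- cached(result_file, make_upstream)
second <- cached(result_file, make_upstream)
```

With this pattern either vignette can be compiled on its own: whichever runs first pays for the computation and the other reads the saved file.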

/Henrik


Aaron Lun

Sat, Dec 22, 2018 4:20 PM #

Yes, that is the simplest solution, and it's what I'm doing now.

It's not overly confusing for a reader, but it's awkward to add new vignettes in the middle of the compilation order, as I then have to rename the others (or give the new vignette a weird name, e.g., "xtra-3b-de.Rmd" to get it to fall behind "xtra-3-var.Rmd").
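
As a small illustration of why the numbering gets awkward, the sketch below sorts a few file names byte-wise (C locale). Two of the names are hypothetical additions, and byte-wise ordering is an assumption about how the build orders files; other locales may collate punctuation differently.

```r
# "xtra-3-var.Rmd" and "xtra-3b-de.Rmd" are from the message above;
# the other two names are hypothetical.
vignette_files <- c("xtra-3b-de.Rmd", "xtra-10-misc.Rmd",
                    "xtra-3-var.Rmd", "xtra-1-intro.Rmd")

# method = "radix" sorts character vectors in C-locale (byte) order,
# independent of the collation settings of the current session.
build_order <- sort(vignette_files, method = "radix")
print(build_order)
```

In this ordering "3b" does fall behind "3", but "10" sorts ahead of "3", so purely numeric prefixes would need zero-padding once there are ten or more vignettes.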

Aaron Lun

Sat, Dec 22, 2018 5:24 PM #

Yes, I had noticed the vignette.rds as well, and I figured that would be a problem.

I just tried setting cache=TRUE in my vignettes, implemented such that building each downstream vignette will also run all upstream vignettes on which it depends (that haven't already been compiled). If an upstream vignette is run in this manner, it caches the results of each code chunk to avoid repeated work when it gets compiled "for real" by R CMD build.

This seems to work on initial inspection (the caches are produced for the upstream vignettes upon running one downstream vignette). I'll have to check whether this plays nice with R CMD build. I will probably have to write a function to isolate the scope of the execution of each upstream vignette, to avoid polluting the namespace and cache of each downstream vignette.
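
A rough sketch of what such a scope-isolating function could look like, using a plain R script as a stand-in for the upstream vignette. The name run_isolated() and the toy script are hypothetical; for an actual Rmd file, the envir argument of rmarkdown::render() could play a similar role, though this sketch does not address the knitr cache.

```r
# Hypothetical helper: evaluate a script in a fresh environment so that the
# objects it creates do not leak into the caller's workspace.
run_isolated <- function(script) {
    env <- new.env(parent = globalenv())
    sys.source(script, envir = env)
    env
}

# Stand-in for an upstream vignette: a script that creates an object 'sce'.
upstream <- tempfile(fileext = ".R")
writeLines("sce <- list(clusters = c(1, 1, 2))", upstream)

res <- run_isolated(upstream)

# The object is reachable through the returned environment only.
upstream_sce <- get("sce", envir = res)
```

The downstream code can then pick out just the objects it needs from the returned environment and ignore the rest.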

-A


Aaron Lun

Mon, Dec 24, 2018 12:02 AM #

A working example of knitr caching across workflows is now available at https://github.com/LTLA/BiocWorkCache.

It uses "~/chipseq.log" as a log to demonstrate that the code in the most-upstream workflow ("test1.Rmd") is indeed only executed once during R CMD build.

Note that the compilation of upstream vignettes involves a system call out to a separate R session. This avoids some difficult issues with caching when an Rmd file is compiled from within another Rmd file: trying to use rmarkdown::render() on the upstream vignette within a downstream vignette does not generate a cache that is recognized when R CMD build goes on to compile the upstream vignette.
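
The "separate R session" step can be sketched roughly as follows. This is not the code from the BiocWorkCache repository; both helper names are hypothetical, and compile_upstream() is only defined here, not called, since it needs rmarkdown and a real Rmd file.

```r
# Hypothetical helper: evaluate a string of R code in a fresh R process, so
# that nothing is shared with the session compiling the current vignette.
run_in_fresh_session <- function(code) {
    rscript <- file.path(R.home("bin"), "Rscript")
    status <- system2(rscript, c("-e", shQuote(code)))
    if (status != 0L) {
        stop("child R session exited with status ", status)
    }
    invisible(status)
}

# Hypothetical wrapper: compile an upstream Rmd file in its own session.
compile_upstream <- function(rmd) {
    run_in_fresh_session(sprintf("rmarkdown::render(%s)", deparse(rmd)))
}

# Demonstration with plain R code: the child writes its process ID to a file.
out <- tempfile()
run_in_fresh_session(sprintf("cat(Sys.getpid(), file = %s)", deparse(out)))
child_pid <- as.integer(readLines(out, warn = FALSE))
```

Because the child is a separate process, its process ID differs from that of the calling session, and nothing it defines ends up in the calling session's workspace.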

-A

On 23 Dec 2018, at 01:24, Aaron Lun <infinite.monkeys.with.keyboards at gmail.com> wrote:

Yes, I had noticed the vignettes.rds as well, and I figured that would be a problem.

I just tried setting set cache=TRUE in my vignettes, implemented such that BUILDing each downstream vignette will also run all upstream vignettes on which it depends (that haven?t already been compiled). If an upstream vignette is run in this manner, it caches the results of each code chunk to avoid repeated work when it gets compiled ?for real? by R CMD BUILD.

This seems to work on initial inspection (the caches are produced for the upstream vignettes upon running one downstream vignette). I?ll have to check whether this plays nice with R CMD BUILD. I will probably have to write a function to isolate the scope of the execution of each upstream vignette, to avoid polluting the namespace and cache of each downstream vignette.

-A

On 22 Dec 2018, at 19:22, Henrik Bengtsson <henrik.bengtsson at gmail.com <mailto:henrik.bengtsson at gmail.com>> wrote:

On Sat, Dec 22, 2018 at 10:56 AM Michael Lawrence
<lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>> wrote:

Anything that eventually lands in inst/doc is a vignette, I think, so
there might be a hack around that.

Just so this is not misread - it's *not* possible to just hack your
vignette "product" files (PDF or HTML) into inst/doc and thinking
you're good.  R keeps track of package vignettes in a "vignette
index", e.g.

readRDS(system.file(package = "utils", "Meta", "vignette.rds"))

       File              Title        PDF        R Depends Keywords
1 Sweave.Rnw Sweave User Manual Sweave.pdf Sweave.R   tools

which is created during 'R CMD build' by parsing and compiling the
vignettes (https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/tools/R/build.R#L283-L393 <https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/tools/R/build.R#L283-L393>).
This vignette index is used to find package vignettes (e.g.
utils::vignette()) and build the HTML vignette index.

Also, one vignette source (e.g. Rnw, Rmd, ...) can only produce one
vignette product (PDF or HTML) in the vignette index.  You can output
other files (e.g. image files) in a relative folder that the vignette
references, which is why for instance non-self-contained HTML files
work.  Thus, one ad-hoc, not-so-nice hack that OP could do is to have
a single main vignette that produces and links to all child vignettes.
However, personally, I'd aim for using memoization/caching (to file)
such that each vignette can be compiled independently of the others
(and in any order), while still reusing intermediate
results/calculations produced by earlier vignettes.
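
A minimal base-R sketch of the file-backed caching idea above, under stated assumptions: cached_result(), the cache directory and the stand-in computation are all hypothetical names invented for illustration, not part of any package discussed in this thread.

```r
# Hypothetical helper: return the object stored under 'name' if it is already
# on disk, otherwise compute it, store it, and return it.
cached_result <- function(name, compute, cache_dir) {
    dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
    path <- file.path(cache_dir, paste0(name, ".rds"))
    if (file.exists(path)) {
        return(readRDS(path))
    }
    value <- compute()
    saveRDS(value, path)
    value
}

# Stand-in for an expensive upstream step; the counter records how often
# the computation is actually run.
n_calls <- 0
expensive_step <- function() {
    n_calls <<- n_calls + 1
    data.frame(cell = paste0("cell", 1:3), cluster = c(1, 2, 1))
}

# A throwaway directory is used here only so that the example cleans up
# after itself; a real setup would need a location shared by all vignettes.
demo_dir <- tempfile("vignette-cache-")
first <- cached_result("demo-result", expensive_step, demo_dir)
second <- cached_result("demo-result", expensive_step, demo_dir)
```

With something like this, whichever vignette runs first pays the cost of the computation and any later vignette reads the stored file, so the compilation order stops mattering.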

/Henrik

On Fri, Dec 21, 2018 at 11:26 PM Aaron Lun
<infinite.monkeys.with.keyboards at gmail.com> wrote:

I gave it a shot:

https://github.com/LTLA/DrakeTest

This uses a single "controller" Rmd file to trigger drake::make(). Running this file will instruct drake to compile all of the other vignettes following the desired dependency structure.

The current sticking point is that I need to move the drake-controlled Rmd files out of "vignettes/", otherwise they'll just be compiled as usual without consideration of their dependencies. This causes problems as R CMD build only recognizes the controller Rmd file as the sole vignette, and doesn't retain or index the HTML files produced from the other Rmd files as side-effects of running the controller.

Are there any better ways to subvert the vignette building procedure to get the desired effect of running drake::make() and recognition of the resulting HTMLs as vignettes?

-A

On 18 Dec 2018, at 17:41, Michael Lawrence <lawrence.michael at gene.com> wrote:

Sounds like a use case for drake...

On Tue, Dec 18, 2018 at 6:58 AM Aaron Lun <infinite.monkeys.with.keyboards at gmail.com> wrote:

@Michael In this case, the resource produced by vignette X is a SingleCellExperiment object containing the results of various processing steps (normalization, clustering, etc.) described in that vignette.

I can imagine a lazy evaluation model for this, but it wouldn't be pretty. If I had another vignette Y that depended on the SCE produced by vignette X, I would need Y to execute all of the steps in X if X hadn't already been run before Y. This gets us into the territory of Makefile-like dependencies, which seems even more complicated than simply specifying a compilation order.
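
To make the Makefile-like idea concrete, here is a minimal base-R sketch of resolving a build order from declared dependencies. The vignette names and the resolve_order() helper are hypothetical, and this only computes the order; it does not build anything.

```r
# Hypothetical dependency declaration: each vignette lists the vignettes
# whose results it needs. The names are made up for illustration.
deps <- list(
    extra = c("reads", "other"),
    reads = character(0),
    umi = character(0),
    other = character(0)
)

# Depth-first traversal that returns the vignettes in an order where every
# upstream vignette appears before anything that depends on it.
resolve_order <- function(deps) {
    done <- character(0)
    visit <- function(name, stack = character(0)) {
        if (name %in% done) return(invisible(NULL))
        if (name %in% stack) stop("circular dependency involving ", name)
        for (up in deps[[name]]) visit(up, c(stack, name))
        done <<- c(done, name)
        invisible(NULL)
    }
    for (name in names(deps)) visit(name)
    done
}

build_order <- resolve_order(deps)
```

Here "extra" is listed first in the declaration but is placed after "reads" and "other" in build_order, which is the property that file-name ordering otherwise has to encode by hand.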

You might ask why X and Y are split into two separate vignettes. The use of different vignettes is motivated by the complexity of the workflows:

- Vignette 1 demonstrates core processing steps for one read-based single-cell RNAseq dataset.
- Vignette 2 demonstrates (slightly different) core steps for a UMI-based dataset.
- ... and so on for a bunch of other core steps for different types of data.
- Vignette 6 demonstrates extra optional steps for the two SCEs produced by vignettes 1 & 3.
- ... and so on for a bunch of other optional steps.

The separation between core and optional steps into separate documents is desirable. From a pedagogical perspective, I would very much like to get the reader through all the core steps before even considering the extra steps, which would just be confusing if presented so early on. Previously, everything was in a single document, which was difficult to read (for users) and to debug (for me), especially because I had to use contrived variable names to avoid clashes between different sections of the workflow that did similar things.

@Martin I've been using BiocFileCache for all of the online resources that are used in the workflow. However, this is only for my (and the reader's) convenience. I use a local cache rather than the system default, to ensure that the downloaded files are removed after package build. This is intentional as it forces the package builder to try to re-download resources when compiling the vignette, thus ensuring the validity of the URLs. For a similar reason, I would prefer not to cache the result objects for use in different R sessions. I could imagine caching the result objects for use by a different vignette in the same build session, but this gets back to the problem of ensuring that the result object is generated by one vignette before it is needed by another vignette.

-A

On 18 Dec 2018, at 14:14, Martin Morgan <mtmorgan.bioc at gmail.com> wrote:

Also perhaps using BiocFileCache so that the result object is only generated once, then cached for future (different session) use.

On 12/18/18, 8:35 AM, "Bioc-devel on behalf of Michael Lawrence" <bioc-devel-bounces at r-project.org on behalf of lawrence.michael at gene.com> wrote:

  I would recommend against dependencies across vignettes. Ideally someone
  can pick up a vignette and execute the code independently of any other
  documentation. Perhaps you could move the code generating those shared
  resources to the package. They could behave lazily, only generating the
  resource if necessary, otherwise reusing it. That would also make it easy
  for people to write their own documents using those resources.

  Michael

  On Tue, Dec 18, 2018 at 5:22 AM Aaron Lun <infinite.monkeys.with.keyboards at gmail.com> wrote:

In a number of my workflow packages (e.g., simpleSingleCell), I rely on a
specific compilation order for my vignettes. This is because some vignettes
set up resources or objects that are to be used by later vignettes.

From what I understand, vignettes are compiled in alphanumeric ordering of
their file names. As such, I give my vignettes fairly structured names,
e.g., "work-1-reads.Rmd", "work-2-umi.Rmd" and so on.

However, it becomes rather annoying when I want to add a new vignette in
the middle somewhere. This results in some unnatural numberings, e.g.,
"work-0", "3b", which are ugly and unintuitive. This is relevant as
BiocStyle::Biocpkg() links between vignettes require you to use the
destination vignette's file name; so difficult names complicate linking,
especially if the names continually change to reflect new orderings.

Is there an easier way to control vignette compilation order? WRE provides
no (obvious) guidance, so I would like to know what non-standard hacks are
known to work on the build machines. I can imagine something dirty whereby
one "reference" vignette contains code to "rmarkdown::render" all other
vignettes in the specified order... ugh.

-A

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
