[Bioc-devel] splitting simpleSingleCell into self-contained vignettes

9 messages · Obenchain, Valerie, Aaron Lun, Andrzej Oleś

#
Following up on our earlier discussion:

https://stat.ethz.ch/pipermail/bioc-devel/2017-October/011949.html

I have split the simpleSingleCell workflow into three (four, if you 
include the introductory overview) self-contained Rmarkdown files. I am 
preparing them for submission to BioC's workflow builder, and I would 
like to check what is the best way to do this:

i) Each workflow file goes into its own package.

ii) All workflow files go into a single package.

Option (i) is logistically easier but probably a bit odd conceptually, 
especially if users need to download "simpleSingleCell1", 
"simpleSingleCell2", "simpleSingleCell3", etc.

Option (ii) is nicer but requires more coordination, as the BioC webpage 
builder needs to know that multiple HTMLs have been generated. It's 
also unclear to me whether this will run into problems with the DLL 
limit - does R restart when compiling each vignette?

Any thoughts would be appreciated. I'm also happy to be a guinea pig for 
any SVN->Git transition for the workflow packages, if that's on the radar.

Cheers,

Aaron
#
Hi,
On 12/11/2017 08:49 AM, Aaron Lun wrote:
You could do either, but I'd say option (ii) is easier from a maintenance standpoint and probably for the user. Maybe you've seen this, but an example is the annotation workflow package, which houses two workflows:

~/repos/svn/workflows >ls annotation/vignettes/
Annotating_Genomic_Ranges.Rmd  Annotation_Resources.Rmd  databaseTypes.png  display.png

Each has an informative name and is presented on the website as an individual workflow:

https://bioconductor.org/help/workflows/

I don't think more coordination is involved - you just have multiple files in vignettes/. And, as you mentioned, it's a bonus that when a user downloads the annotation package they get all related workflows.

A fresh R session is started for each package but not for each vignette in the package.
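
Since the builder reuses one R session for all vignettes in a package, a user who wants a fresh session per vignette on their own machine could compile each file with a separate `Rscript` process. The following is only a sketch under assumptions: the helper name `render_commands` is hypothetical, the `vignettes` directory mirrors the layout listed above, and `rmarkdown::render()` is assumed to be an acceptable way to compile the files locally (this is not necessarily what the BioC builder does). File paths containing quotes or spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print one render command per vignette, so that each
# vignette is compiled by its own Rscript process (i.e. a fresh R session,
# with no DLLs carried over from the previous vignette).
render_commands() {
    dir="$1"
    for rmd in "$dir"/*.Rmd; do
        [ -e "$rmd" ] || continue   # glob matched nothing: skip
        printf "Rscript -e 'rmarkdown::render(\"%s\")'\n" "$rmd"
    done
}

# Print the commands for review. Piping the output to `sh` would run them,
# which requires R and the rmarkdown package to be installed.
render_commands vignettes
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without R, and makes it easy to check what would be compiled before committing to a long build.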




Nitesh has created git repos for the workflow packages and Andrzej is adapting the BBS code to incorporate them into the builds. We guesstimate this will be done by the end of the year. You shouldn't have to do anything on your end - once we're ready to switch over we'll let you know and send the new location of the workflow in git.


Val








#
Thanks Val:
Obenchain, Valerie wrote:
I didn't know that, thanks.
Ah. That's a shame, I was hoping to reduce the sensitivity to the DLL limit.

But now that I think about it: maybe that's not actually a problem,
provided the BioC workflow builders have a high DLL limit. The main
issue was that *users* were running into the DLL limit; by splitting the
workflow up, users should not be tempted to run everything at once, thus
avoiding the limit on their machines. Of course, Bioconductor can
control its own build machines, so as long as they set the MAX_DLLs
high, it should still build and show up on the website.
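
For reference, a minimal sketch of raising the limit for R sessions launched from a shell. It assumes the "MAX_DLLs" mentioned above refers to the `R_MAX_NUM_DLLS` environment variable, which R (3.4.0 and later) reads at startup. The value 150 is an arbitrary example, not a figure from this thread or a setting known to be used on the build machines.

```shell
#!/bin/sh
# Raise the DLL limit for any R session started from this shell.
# R_MAX_NUM_DLLS must be set before R starts; 150 is an arbitrary
# illustrative value (an assumption, not a recommendation).
R_MAX_NUM_DLLS=150
export R_MAX_NUM_DLLS

# If R is installed, report how many DLLs a fresh session has loaded.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'cat(length(getLoadedDLLs()), "DLLs loaded\n")'
fi
```

The same variable can also be set in an `Renviron` file so that it applies to every session, which is probably the more convenient route on a dedicated build machine.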
Cool, looking forward to it.

-A
#
The split-up workflows seem to have built successfully:

http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/

Is there something I have to do to get a blurb specific to each 
vignette, as observed for "Annotation_Resources" vs 
"Annotating_Genomic_Ranges"?

The various vignettes are ordered pedagogically, so the order in which 
they are presented in the workflow page might require some manual 
specification. It would also be nice if the multiple simpleSingleCell 
workflows are grouped together, to avoid being intermingled with other 
workflows on the page.

Finally, could we get a separate "single-cell workflows" section? The 
current "Basic/Advanced" partition is pretty crude, and I can see 
opportunities for more detailed stratification, e.g., by ChIP-seq, 
RNA-seq, single-cell RNA-seq, proteomics (including mass cytometry).

Cheers,

Aaron
On 11/12/17 20:24, Aaron Lun wrote:
#
The split-up workflows seem to have built successfully:

http://docbuilder.bioconductor.org:8080/job/simpleSingleCell/

The various vignettes are ordered pedagogically, so the order in which 
they are presented in the workflow page might require some manual 
specification. It would also be nice if the multiple simpleSingleCell 
workflows are grouped together, to avoid being intermingled with other 
workflows on the page.

Is there something I have to do to get a blurb specific to each 
vignette, as observed for "Annotation_Resources" vs 
"Annotating_Genomic_Ranges"? I'm happy to only have a blurb for the 
first workflow, given that I'd be just repeating myself for the others; 
but this depends on how it's organized on the webpage.

Finally, could we get a separate "single-cell workflows" section? The 
current "Basic/Advanced" partition is pretty crude, and I can see 
opportunities for more detailed stratification, e.g., by ChIP-seq, 
RNA-seq, single-cell RNA-seq, proteomics (including mass cytometry).

Cheers,

Aaron
On 11/12/17 20:24, Aaron Lun wrote:
#
Hi Aaron,

Thank you. I've edited the workflow index page by introducing a separate
"Single-cell Workflows" section, and by replacing the previous link to
your workflow with links to the individual parts.

As discussed during EuroBioc, I'm happy to restructure the index page by
grouping workflows by topic. It would be really helpful if authors would
chime in to suggest the most relevant sections for their workflows.

Cheers,
Andrzej
On Tue, Dec 12, 2017 at 7:19 PM, Aaron Lun <Aaron.Lun at cruk.cam.ac.uk> wrote:
#
Thanks Andrzej.
Great, I'm looking forward to seeing it. Do you know how frequently the
index page (I assume we're talking about
https://bioconductor.org/help/workflows/) updates? I assume your edits
haven't propagated through the system yet.
I can chip in with two that I'm involved in:

"Differential Binding from ChIP-seq data"
<https://bioconductor.org/help/workflows/chipseqDB/> => ChIP-seq workflows

"Gene-level RNA-seq differential expression and pathway analysis"
<https://bioconductor.org/help/workflows/RnaSeqGeneEdgeRQL/> => RNA-seq workflows

Of course, it depends on how granular you want the topics to be. For
example, I only see one ChIP-seq workflow, so that particular section
might be a bit lonely for a while (I am planning to split that into two
workflows later).

Cheers,

Aaron
#
Thanks for your feedback, Aaron!
On Tue, Dec 12, 2017 at 9:49 PM, Aaron Lun <alun at wehi.edu.au> wrote:

            
Not sure, should be online by now
https://github.com/Bioconductor/bioconductor.org/commit/a60c46f0942d9825f9a643321890ba5987de109b
Right, we should probably avoid hair-splitting. We can start with a few,
say 6, and split up further according to demand as new ones are introduced.

Best,
Andrzej
#
While we wait for the changes to come online: would you be open to PRs 
to workflow.md? I was thinking of making a nested list for the 
Introduction/part 1/part 2/part 3, which is a bit nicer to read.

-A
On 12/12/17 22:11, Andrzej Oleś wrote: