[Bioc-devel] Methods to speed up R CMD Check

...
Finally I'll point out there's a testthat::skip_on_bioc() function that
will allow you to skip a test on the Bioc builder, but still run that test
locally/on GitHub etc.
What?!

 > testthat::skip_on_bioc
function ()
{
     if (identical(Sys.getenv("BBS_HOME"), "")) {
         return(invisible(TRUE))
     }
     skip("On Bioconductor")
}
<bytecode: 0x564820802278>
<environment: namespace:testthat>
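The printed source shows that skip_on_bioc() decides it is running on a Bioconductor builder purely from a non-empty BBS_HOME environment variable. A minimal base-R sketch of the same check follows. The helper name on_bioc_builder() is hypothetical and is not part of testthat.

```r
# Hypothetical helper mirroring the condition used by testthat::skip_on_bioc():
# the Bioconductor build system is detected by a non-empty BBS_HOME variable.
on_bioc_builder <- function() {
  !identical(Sys.getenv("BBS_HOME"), "")
}

# Example: only run an expensive step when not on a Bioconductor builder.
if (!on_bioc_builder()) {
  message("Not on a Bioconductor builder: running the long step")
}
```

As the thread notes, this only hides slow tests from the builders; it does not make them faster.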

No way! I need to rename that BBS_HOME env variable ;-)
However, I think we'd all agree it'd be better to
get all the tests running universally, rather than take that route.
You bet.

Or move the long tests to the longtests/ folder and subscribe to the 
Long Tests builds:

   https://bioconductor.org/developers/how-to/long-tests/

You'll only be able to do so once your package is accepted, though, so it
doesn't really help in the context of the package review.

H.
Mike

On Tue, 23 Mar 2021 at 12:11, Murphy, Alan E <a.murphy at imperial.ac.uk>
wrote:

Hi,

Thank you very much Martin and Hervé for your suggestions. I have reverted
my zzz.R on-load function to the one advised by ExperimentHub and have used
the ID lookup (system.time(tt_alzh <- eh[["EH5373"]])) in internal functions
and unit tests. However, the check is still taking ~18 minutes, so I need to
do a bit more work. Even with my new on-load function, calling datasets by
name still takes substantially longer; see below for the example Hervé gave,
run on my new code:

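library(ExperimentHub)  # also attaches AnnotationHub, which provides query()

a <- function() {
   eh <- query(ExperimentHub(), "ewceData")
   tt_alzh <- eh[["EH5373"]]
}
# Note: `a` without parentheses only evaluates the function object;
# pass a() to time the ExperimentHub lookup itself.
microbenchmark::microbenchmark(a,
                               tt_alzh <- ewceData::tt_alzh(),
                               times = 20L, unit = "s")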
Unit: seconds
                          expr        min          lq         mean      median          uq         max neval
                             a 0.00000003 0.000000031 0.0000002995 0.000000045 0.000000684 0.000001064    20
tt_alzh <- ewceData::tt_alzh() 2.71135788 2.755388420 2.9922968274 2.993737666 3.144241330 3.842422679    20
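In the benchmark above, `a` is passed to microbenchmark without parentheses, so each iteration only evaluates the function object, which would explain the sub-microsecond timings. The base-R sketch below illustrates the difference. slow_load() is a hypothetical stand-in for the ExperimentHub lookup, with Sys.sleep() simulating the delay; it is not a measurement of ewceData itself.

```r
# Hypothetical stand-in for a slow ExperimentHub lookup.
slow_load <- function() {
  Sys.sleep(0.2)  # simulate the lookup delay
  seq_len(10)
}

# Evaluating the bare symbol only retrieves the function object.
t_symbol <- system.time(for (i in 1:5) slow_load)[["elapsed"]]

# Calling the function runs its body, including the delay, every time.
t_call <- system.time(for (i in 1:5) slow_load())[["elapsed"]]

c(symbol = t_symbol, call = t_call)
```

A fair comparison of the two loading routes would therefore pass a() and ewceData::tt_alzh() as the expressions to benchmark.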

My question is: would it be acceptable to change the data-loading calls in
my examples and vignette to reduce the runtime, or is this against best
practice, meaning I should look for improvements elsewhere? I ask because I
feel I'm running out of easy options for reducing the overall runtime.

Kind regards,
Alan.

________________________________
From: Martin Morgan <mtmorgan.bioc at gmail.com>
Sent: 22 March 2021 18:17
To: Kern, Lori <Lori.Shepherd at RoswellPark.org>; Murphy, Alan E <
a.murphy at imperial.ac.uk>; bioc-devel at r-project.org <
bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Methods to speed up R CMD Check

(sticking bioc-devel back in the recipient list so others can learn /
improve / disagree with this suggestion.)

My suggestion was to memoise the function in your package, not in the
example. Examples are not run independently, but collated into a single
file (EWCE-Ex.R in the EWCE.Rcheck directory, after running R CMD check)
and sourced. And the suggestion was not meant to solve the problem of
examples running slowly, but to avoid repeatedly calculating the same value.
For instance, from Hervé's email, ewceData::tt_alzh could be memoised in the
package. The first call would take several seconds, but subsequent calls
would be instantaneous. But as Hervé says, that function should be cleaned
up anyway, so 'tricks' like memoisation might not be necessary.
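Martin's point about collation can be sketched in base R. This assumes a hypothetical package-level cache environment (.cache) and a hypothetical cached_tt_alzh() wrapper standing in for ewceData::tt_alzh(); the toy data frame is not the real dataset. Two "examples" are written to one file and sourced together, roughly as R CMD check does with the collated -Ex.R file, so the slow load runs only once.

```r
# Hypothetical package-level cache. In a real package this would be a
# top-level assignment in one of the package's R/ files.
.cache <- new.env(parent = emptyenv())
.cache$n_loads <- 0L

# Hypothetical stand-in for ewceData::tt_alzh(): the first call does the
# slow work and stores the result; later calls reuse the stored value.
cached_tt_alzh <- function() {
  if (is.null(.cache$tt_alzh)) {
    .cache$n_loads <- .cache$n_loads + 1L
    Sys.sleep(0.1)                          # simulate the slow hub lookup
    .cache$tt_alzh <- data.frame(id = 1:3)  # toy data, not the real dataset
  }
  .cache$tt_alzh
}

# Mimic R CMD check: two "examples" collated into one file and sourced
# in a single R session.
ex_file <- tempfile(fileext = ".R")
writeLines(c("x1 <- cached_tt_alzh()  # example 1",
             "x2 <- cached_tt_alzh()  # example 2"),
           ex_file)
source(ex_file, local = TRUE)

.cache$n_loads
```

Because the cache lives with the function rather than in each example, nothing has to be redefined per example, which addresses the concern raised in the next message.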

From: "Murphy, Alan E" <a.murphy at imperial.ac.uk>
Date: Monday, March 22, 2021 at 12:37 PM
To: Martin Morgan <mtmorgan.bioc at gmail.com>
Subject: Re: [Bioc-devel] Methods to speed up R CMD Check

Hey Martin,

Thanks for the suggestion but how would I go about using this, let's say,
for the examples? If I redefine the memoise function in each example (as it
won't otherwise exist) would this not take the same amount of time?

Kind regards,
Alan.

From: Martin Morgan <mtmorgan.bioc at gmail.com>
Sent: 22 March 2021 13:34
To: Kern, Lori <Lori.Shepherd at RoswellPark.org>; Murphy, Alan E <
a.murphy at imperial.ac.uk>; bioc-devel at r-project.org <
bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Methods to speed up R CMD Check

If your examples repeatedly calculate the same thing, and this is also
typical of how users use your package, it might make sense to 'memoise' key
functions in your package: https://cran.r-project.org/package=memoise
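The memoise package wraps a function so that repeated calls with the same arguments return a stored result instead of recomputing it. The sketch below is a simplified base-R stand-in for that idea, not memoise's implementation or API: memoise_simple() and slow_square() are hypothetical names, and cache keys are built by deparsing the argument list, which is only adequate for small, simple arguments.

```r
# Simplified base-R memoiser (a stand-in for memoise::memoise(), not its
# implementation): results are stored in a private environment keyed by
# the deparsed argument list.
memoise_simple <- function(f) {
  force(f)
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Hypothetical slow function standing in for an expensive computation.
n_calls <- 0L
slow_square <- function(x) {
  n_calls <<- n_calls + 1L
  Sys.sleep(0.1)
  x^2
}

fast_square <- memoise_simple(slow_square)
fast_square(3)  # runs slow_square()
fast_square(3)  # served from the cache
fast_square(4)  # new argument, runs slow_square() again
```

For package code, the memoise package itself is the more complete option than this deparse-based key.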

Martin

On 3/22/21, 7:41 AM, "Bioc-devel on behalf of Kern, Lori" <
bioc-devel-bounces at r-project.org on behalf of
Lori.Shepherd at RoswellPark.org> wrote:

     If your data is using ExperimentHub, it should already be caching the
downloaded data. Once it has been downloaded, subsequent calls to the hub
should use the cached copy. We will investigate to ensure that the caching
mechanism is functioning properly on all of our Bioconductor builders.

     Lori Shepherd

     Bioconductor Core Team

     Roswell Park Comprehensive Cancer Center

     Department of Biostatistics & Bioinformatics

     Elm & Carlton Streets

     Buffalo, New York 14263

     ________________________________
     From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
Murphy, Alan E <a.murphy at imperial.ac.uk>
     Sent: Monday, March 22, 2021 5:38 AM
     To: bioc-devel at r-project.org <bioc-devel at r-project.org>
     Subject: [Bioc-devel] Methods to speed up R CMD Check

     Hi all,

     I am working on the development of [EWCE](https://github.com/NathanSkene/EWCE)
but have hit an issue with R CMD check's runtime. I have been informed that
the check needs to complete within 15 minutes, but mine currently takes ~24
minutes, and I am looking for ways to speed it up. The main culprits for the
runtime issue are:

     checking examples (5m 49.8s)
     Running ‘testthat.R’ [308s/469s] (7m 49.1s)
     checking for unstated dependencies in vignettes (7m 49.4s)
     checking re-building of vignette outputs (5m 12s)
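Before trimming examples, it can help to see which ones dominate the "checking examples" step. R CMD check --timings records per-example timings in the .Rcheck directory; the sketch below assumes that file (for EWCE, presumably EWCE.Rcheck/EWCE-Ex.timings) is tab-separated with a header row containing name and elapsed columns. slowest_examples() is a hypothetical helper, not part of R.

```r
# Hypothetical helper: rank examples by elapsed time, assuming a
# tab-separated timings file with "name" and "elapsed" columns.
slowest_examples <- function(path, n = 5L) {
  timings <- read.table(path, header = TRUE, sep = "\t",
                        stringsAsFactors = FALSE)
  ranked <- timings[order(timings$elapsed, decreasing = TRUE),
                    c("name", "elapsed")]
  head(ranked, n)
}

# Usage after `R CMD check --timings` (path is an assumption):
# slowest_examples("EWCE.Rcheck/EWCE-Ex.timings")
```

The slowest few examples are usually where wrapping code in a cached loader or switching to a smaller dataset pays off most.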

     Apart from using smaller datasets, which I will consider myself, are
there known ways of speeding these up? EWCE derives data from an
ExperimentHub package, [ewceData](https://github.com/neurogenomics/ewceData),
for its examples, tests and vignette. This is run repeatedly, and I have
noticed that loading a dataset takes a significant amount of time. Is there
any way of caching the datasets for all the checks, or more generally of
speeding this up?

     I have heard of the use of [long tests](
http://secure-web.cisco.com/1yfwFXFFfUKBuFTwUeuS8XGYbh53YduG9ZGKMVmVU9Yrgxg4DbKA0_prEIOCNcgc8uANWYzUw115x_8njawa33mjqM5ZBEvTPTJhmXRzttl1eaRVu3Pa0FTA-d-wPRK3Xxa4miiXob79k_exN0isifYlHPTK7WRxh9_LbFye17PwVVOGsfxjEFKi8WF27D6LWJynf8k-L7iEqB2MSDkf_1zWmfA2qJByna147_Jkaa-nLx9FFl4VhsosBoNDE_qnC939XrCLLCT7RgV0jPukrVdahccxXfT6bgtGBR8ZKfj25BoCeE1_hTJXFgGP0CGmegMYqqmsbd3pGTbo63vTW-A/http://bioconductor.org/developers/how-to/long-tests/)
which aren't run daily by Bioconductor but are these still checked in R CMD
Check? Is there any other way to exclude my tests from the R CMD Check
given they aren't a necessity from Bioconductor?

     Does checking for unstated dependencies in vignettes take longer the
more package dependencies there are? If I import only specific functions
from packages, will this check take less time?

     Lastly, is there any way to get an exception to the 15-minute maximum?
I may be ill-informed, but isn't the maximum time for packages on
Bioconductor's daily builds 40 minutes? My code in its current state would
finish within that.

     Kind regards,
     Alan.

     _______________________________________________
     Bioc-devel at r-project.org mailing list
     https://stat.ethz.ch/mailman/listinfo/bioc-devel

Hervé Pagès

Bioconductor Core Team
hpages.on.github at gmail.com
