Message-ID: <aed62565-261a-777c-7e7b-838eba2a48f0@gmail.com>
Date: 2021-03-23T19:08:12Z
From: Hervé Pagès
Subject: [Bioc-devel] Methods to speed up R CMD Check
In-Reply-To: <DB9PR06MB7369529EE2A31ACCBFDED54AA5649@DB9PR06MB7369.eurprd06.prod.outlook.com>
Doesn't really matter where you put them but
.my_internal_global_variables, .get_eh(), .set_eh(), and toto() don't
need to be defined inside the .onLoad() hook, and it's actually
cleaner/easier to not define them there. You can define there anywhere
in your ewceData/R/* files.
Here is a slightly improved version of the code:
.my_internal_global_variables <- new.env(parent=emptyenv())
.get_eh <- function() get("eh", envir=.my_internal_global_variables)
.set_eh <- function(value) assign("eh", value,
envir=.my_internal_global_variables)
get_ExperimentHub <- function()
{
eh <- try(.get_eh(), silent=TRUE)
if (inherits(eh, "try-error")) {
eh <- ExperimentHub::ExperimentHub()
.set_eh(eh)
}
eh
}
In my packages I like to put this kind of low-level stuff in R/utils.R.
None of this is exported.
Then anywhere in your package when you need an ExperimentHub instance, do:
## Exported.
tt_alzh <- function()
{
eh <- get_ExperimentHub()
eh[["EH5373"]]
}
H.
On 3/23/21 10:53 AM, Murphy, Alan E wrote:
> HeyHerv?,
>
> Thanks for this it is very helpful and I'm very sorry but I have one
> more question, where to put option 3? I thought making an onload r
> script for it:
>
> .onLoad <- function(libname, pkgname) {
> ? .my_internal_global_variables <- new.env(parent=emptyenv())
> ? .get_eh <- function() get("eh", envir=.my_internal_global_variables)
> ? .set_eh <- function(value) assign("eh", value,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? envir=.my_internal_global_variables)
> ? toto <- function()
> ? {
> ? ? eh <- try(.get_eh(), silent=TRUE)
> ? ? if (inherits(eh, "try-error")) {
> ? ? ? eh <- ExperimentHub()
> ? ? ? .set_eh(eh)
> ? ? }
> ? ? eh
> ? }
> ? toto()
> }
>
> This seems to work in that the script runs (I can tell based on the
> output with devtools::check()) but I still get an error that eh doesn't
> exist in my test functions.
>
> Kind regards,
> Alan.
> ------------------------------------------------------------------------
> *From:* Herv? Pag?s <hpages.on.github at gmail.com>
> *Sent:* 23 March 2021 17:31
> *To:* Murphy, Alan E <a.murphy at imperial.ac.uk>; Martin Morgan
> <mtmorgan.bioc at gmail.com>; Kern, Lori <Lori.Shepherd at RoswellPark.org>;
> bioc-devel at r-project.org <bioc-devel at r-project.org>
> *Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check
> 3 ways to do this, one that doesn't work, and two that work ;-)
>
>
> 1. Simple way that doesn't work:
>
> ?? ## Just a place holder. Will be initialized at run-time the first
> ?? ## time it's needed.
> ?? .some_internal_global_variable <- NULL
>
> ?? toto <- function()
> ?? {
> ???? if (is.null(.some_global_variable))
> ???????? .some_internal_global_variable <<- 55L
> ?? }
>
> ?? However, if you put this in your package, you'll get the following
> ?? error the first time toto() is called:
>
> ???? cannot change value of locked binding for
> '.some_internal_global_variable'
>
> 2. Simple way that works: initialize the global variable in the
> ??? .onLoad() hook. This works because the .onLoad() hook is executed
> ??? right before the package namespace gets locked.
>
> ?? ## Just a place holder. Will be initialized at load-time.
> ?? .some_internal_global_variable <- NULL
>
> ?? .onLoad <- function(libname, pkgname)
> ?? {
> ???? .some_internal_global_variable <<- 55L
> ?? }
>
> ?? However, I don't really like using this approach when initialization
> ?? of the global variable requires access to the internet. It means that
> ?? in case of connectivity issue your users won't be able to load the
> ?? package and troubleshooting can become really tricky when you can't
> ?? even load a package. So in that case I prefer the solution below.
>
> 3. Define the internal global variable in an environment:
>
> ?? .my_internal_global_variables <- new.env(parent=emptyenv())
>
> ?? .get_eh <- function() get("eh", envir=.my_internal_global_variables)
> ?? .set_eh <- function(value) assign("eh", value,
> envir=.my_internal_global_variables)
>
> ?? toto <- function()
> ?? {
> ???? eh <- try(.get_eh(), silent=TRUE)
> ???? if (inherits(eh, "try-error")) {
> ?????? eh <- ExperimentHub()
> ?????? .set_eh(eh)
> ???? }
> ???? eh
> ?? }
>
> Hope this helps,
>
> H.
>
>
>
>
> On 3/23/21 10:05 AM, Murphy, Alan E wrote:
>> Hey Herv?,
>>
>> I get the idea now thanks for clarifying. Where do I place the call to
>> ExperimentHub in the package?:
>>
>> eh <- ExperimentHub()? # the only call to ExperimentHub() in
>>? ????????????????????????? # the entire R session
>>
>> The package contains calls to the datasets in internal functions,
>> examples, tests and the vignette so eh it would need to be available to
>> all. Sorry I don't have much experience using experiment datasets.
>>
>> Kind regards,
>> Alan.
>>
>> ------------------------------------------------------------------------
>> *From:* Herv? Pag?s <hpages.on.github at gmail.com>
>> *Sent:* 23 March 2021 16:46
>> *To:* Murphy, Alan E <a.murphy at imperial.ac.uk>; Martin Morgan
>> <mtmorgan.bioc at gmail.com>; Kern, Lori <Lori.Shepherd at RoswellPark.org>;
>> bioc-devel at r-project.org <bioc-devel at r-project.org>
>> *Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check
>>
>> *******************
>> This email originates from outside Imperial. Do not click on links and
>> attachments unless you recognise the sender.
>> If you trust the sender, add them to your safe senders list
>> https://spam.ic.ac.uk/SpamConsole/Senders.aspx
> <https://spam.ic.ac.uk/SpamConsole/Senders.aspx>
>> <https://spam.ic.ac.uk/SpamConsole/Senders.aspx
> <https://spam.ic.ac.uk/SpamConsole/Senders.aspx>> to disable email
>> stamping for this address.
>> *******************
>> On 3/23/21 4:11 AM, Murphy, Alan E wrote:
>>> Hi,
>>>
>>> Thank you very much Martin and Herv? for your suggestions. I have reverted my zzz.R on load function to that advised by ExperimentHub and had used the ID look up (system.time(tt_alzh <- eh[["EH5373"]])) on internal functions and unit tests. However, the check? is still taking ~18 minutes so I need to do a bit more work.
> Even with
>> my new on load function, calling datasets by name still takes
>> substantially longer, see below for the example Herv? gave on my new code:
>>>
>>> a<-function(){
>>>??? eh <- query(ExperimentHub(), "ewceData")
>>
>> The above line is not needed. Creating an ExperimentHub instance can be
>> quite expensive and with the current approach 'R CMD check' will do it
>> dozens of times. My suggestion was to create an ExperimentHub instance
>> once for all the first time you need it, and reuse it in all your data
>> access functions:
>>
>>? ?? eh <- ExperimentHub()? # the only call to ExperimentHub() in
>>? ????????????????????????? # the entire R session
>>
>> Also there's no need to query(). Just use the EHb ID directly on the
>> ExperimentHub instance to load your data:
>>
>>? ?? eh[["EH5373"]]
>>
>> This should reduce 'R CMD check' by a few more minutes.
>>
>> H.
>>
>> --
>> Herv? Pag?s
>>
>> Bioconductor Core Team
>> hpages.on.github at gmail.com
>
> --
> Herv? Pag?s
>
> Bioconductor Core Team
> hpages.on.github at gmail.com
--
Herv? Pag?s
Bioconductor Core Team
hpages.on.github at gmail.com