[R-pkg-devel] active bindings in package namespace

9 messages · Gábor Csárdi, Kirill Müller, Jack Wasey +2 more

#
Dear all,

I am developing a package that is a front end for various online data 
sources (icd.data https://github.com/jackwasey/icd.data/ ). The current CRAN 
version just has lazy-loaded data, but the package now encompasses far 
more current and historic ICD codes from different countries, and these 
can't be included in the CRAN package even with maximal compression.

Other authors have solved this using functions to get the data, with or 
without a local cache of the retrieved data. After extensive searching, I 
have found no CRAN or other package that uses R's attractive active 
binding feature.

The goal is simple: the user refers to the data by its symbol, 
e.g., 'icd10fr2019' or 'icd.data::icd10fr2019', and it is 
downloaded and parsed transparently (if the user has already granted 
permission, or after a prompt if they haven't).

The bindings are set using commands alongside the function definitions 
in R/*.R, e.g.

makeActiveBinding("icd10cm_latest", .icd10cm_latest_binding, environment())
lockBinding("icd10cm_latest", environment())
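A minimal sketch of what a binding function such as `.icd10cm_latest_binding` could look like, including the consent rule described in the next paragraph. The option `icd.data.consent`, the cache environment `.icd_cache` and the downloader `.download_icd10cm_latest()` are hypothetical placeholders, not the actual icd.data code. A scratch environment replaces the namespace so the sketch runs on its own.

```r
# Hypothetical stand-in for the real download-and-parse step.
.download_icd10cm_latest <- function() {
  data.frame(code = c("A00", "A01"), stringsAsFactors = FALSE)
}

# Hypothetical per-session cache so the data is fetched at most once.
.icd_cache <- new.env(parent = emptyenv())

# Function handed to makeActiveBinding(): R calls it with no argument when
# the symbol is read, and with the new value if someone assigns to it.
.icd10cm_latest_binding <- function(value) {
  if (!missing(value)) stop("icd10cm_latest is read-only", call. = FALSE)
  if (!is.null(.icd_cache$icd10cm_latest)) return(.icd_cache$icd10cm_latest)
  # Consent comes from a hypothetical option, or from an explicit "yes" at an
  # interactive prompt. Under Rscript, CI and R CMD check, interactive() is FALSE.
  consent <- isTRUE(getOption("icd.data.consent")) ||
    (interactive() && isTRUE(utils::askYesNo("Download ICD-10-CM data now?")))
  if (!consent) {
    stop("icd10cm_latest needs downloaded data, and no consent was given.",
         call. = FALSE)
  }
  .icd_cache$icd10cm_latest <- .download_icd10cm_latest()
  .icd_cache$icd10cm_latest
}

# In a package, environment() at the top level of R/*.R is the namespace;
# here a scratch environment stands in for it.
demo_env <- new.env()
makeActiveBinding("icd10cm_latest", .icd10cm_latest_binding, demo_env)
lockBinding("icd10cm_latest", demo_env)
```

Reading `demo_env$icd10cm_latest` without consent raises an error, which is exactly the behaviour that later surfaces during R CMD check.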

For non-interactive use, including CI and CRAN tests, no data should be 
downloaded, and no cache directory set up without user consent. For 
interactive use, I ask permission to create a local data cache before 
downloading data.

This works fine... until R CMD check. The following steps seem to 'get' 
or 'source' everything from the package namespace, which triggers the 
active bindings; this fails when I cannot get consent to download data, 
because I want to 'stop' on that error condition:
  - checking dependencies in R code
  - checking S3 generic/method consistency
  - checking foreign function calls
  - checking R code for possible problems
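The mechanism can be reproduced outside R CMD check. In this sketch a scratch environment stands in for the package namespace; it does not show which calls the tools package actually makes, only that fetching every value with `mget()` evaluates active bindings, while `ls()` and `bindingIsActive()` do not.

```r
# A scratch environment standing in for a package namespace.
ns <- new.env()
counter <- new.env()
counter$n <- 0

# An active binding that records each time it is evaluated.
makeActiveBinding("expensive", function() {
  counter$n <- counter$n + 1
  "downloaded data"
}, ns)
assign("ordinary", 42, envir = ns)

# Listing names does not evaluate anything ...
nms <- ls(ns)
# ... but fetching every value, as a code checker or autocompleter might,
# runs the binding function.
vals <- mget(nms, envir = ns)

# A tool can test bindingIsActive() first and skip such symbols.
passive <- Filter(function(n) !bindingIsActive(n, ns), nms)
vals_passive <- mget(passive, envir = ns)

counter$n  # 1: only the full mget() evaluated the binding
```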

Debugging CI-specific binding bugs is a nightmare because these occur in 
different R sessions initiated by R CMD check.

There may be legitimate reasons to evaluate everything in the namespace, 
but I've no idea what they are. Incidentally, RStudio also does 'mget' 
on the whole package namespace and triggers bindings during 
autocomplete. https://github.com/rstudio/rstudio/issues/4414

Is this something I should raise as an issue with R? Or does anyone have 
any idea of a sensible approach to this? Currently I have a set of 
workarounds, but these complicate the code and have taken an awful lot 
of time. Does anyone know of any CRAN package that has active bindings 
in the package namespace?

Any ideas appreciated.

Jack Wasey
#
Hi, yet another workaround is to create the active binding in the
.onLoad() function. Here is an example from the cli package:
https://github.com/r-lib/cli/blob/d4756c483f69c2382c27b0b983d0ce7cc7e63763/R/onload.R#L8
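A minimal sketch of the .onLoad() approach, assuming a hypothetical helper `register_icd_bindings()` and a placeholder binding body; it is not the cli code itself. `.onLoad()` runs before the namespace is sealed, so bindings can still be added to it; the helper takes the target environment as an argument so it can be exercised outside a package.

```r
# Hypothetical helper: add the active binding to the given environment.
register_icd_bindings <- function(ns) {
  makeActiveBinding("icd10cm_latest", function() {
    # Placeholder body; a real binding would read cached or downloaded data.
    "icd10cm data"
  }, ns)
  invisible(ns)
}

# In a package, .onLoad() would pass the namespace that is being loaded.
.onLoad <- function(libname, pkgname) {
  register_icd_bindings(asNamespace(pkgname))
}
```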

Gabor
On Sat, Mar 23, 2019 at 3:50 PM Jack O. Wasey <jack at jackwasey.com> wrote:
#
Thanks for that thought. I do also do this in that package, and the same problem exists, since the code in those particular steps of R CMD check also calls .onLoad. These steps might even attach the package: the R CMD check code in tools is too convoluted for me to unravel, but the result is the same: active bindings defined directly in the namespace, or inserted into it from .onLoad, get run. .onAttach isn't possible because the namespace is sealed already.

In the example you offer from cli, the binding is inserted into the "dummy" function's immediately enclosing environment, which is AFAIK not the same as the package namespace.

Jack
On 3/23/19 2:14 PM, Gábor Csárdi wrote:
#
On Sat, Mar 23, 2019 at 9:46 PM Jack Wasey <jack at jackwasey.com> wrote:
cli is on CRAN and checks OK with R CMD check.
It seems to me that it is the same, no?

> environment(cli:::dummy)
<environment: namespace:cli>

> "symbol" %in% ls(environment(cli:::dummy))
[1] TRUE

G.
#
Dear Jack


This doesn't answer your question, but I would advise against this design.

- Users do not expect side effects (such as network access) from 
accessing a symbol.

- A function gives you much more flexibility to change the interface 
later on. (Arguments for fetching the data, tokens for API access, ...)

- You already encountered a few quirks that make this an "interesting" 
problem.

A function call only needs a pair of parentheses.
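For comparison, a minimal sketch of the function-based design, assuming a hypothetical getter `get_icd10cm()` with a session-level memo cache and a placeholder in place of the real download. The arguments illustrate the extra room to change the interface later; nothing is fetched unless `download` is TRUE, which defaults to `interactive()`.

```r
# Hypothetical session-level memo cache so repeated calls do not re-download.
.icd_memo <- new.env(parent = emptyenv())

# Hypothetical getter replacing the active binding.
get_icd10cm <- function(year = "latest", refresh = FALSE,
                        download = interactive()) {
  key <- paste0("icd10cm_", year)
  if (!refresh && exists(key, envir = .icd_memo, inherits = FALSE)) {
    return(get(key, envir = .icd_memo, inherits = FALSE))
  }
  if (!download) {
    stop("ICD-10-CM data for '", year, "' is not cached; ",
         "call get_icd10cm(download = TRUE) to fetch it.", call. = FALSE)
  }
  # Placeholder for the real download-and-parse step.
  data <- data.frame(code = c("A00", "A01"), year = year,
                     stringsAsFactors = FALSE)
  assign(key, data, envir = .icd_memo)
  data
}
```

Once `get_icd10cm(download = TRUE)` has run, a plain `get_icd10cm()` returns the cached copy, and nothing in the namespace is evaluated merely by listing it.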


Best regards

Kirill
On 23.03.19 16:50, Jack O. Wasey wrote:
#
Thanks both, this is helpful advice.
On 3/23/19 5:14 PM, Kirill Müller wrote:
#
Don't want to turn this into a pile-on, but I also think this isn't a very good idea. As I understand it, accessing the symbol "foo" will pull the latest version of foo from the remote site. This has consequences for reproducibility, because now your code could be exactly the same, and your local environment exactly the same, and yet running the code at different times can yield different results because the remote data has been updated.


-----Original Message-----
From: R-package-devel <r-package-devel-bounces at r-project.org> On Behalf Of Jack Wasey
Sent: Sunday, 24 March 2019 9:57 AM
To: Kirill Müller <krlmlr+ml at mailbox.org>; R Development <r-package-devel at r-project.org>
Subject: Re: [R-pkg-devel] active bindings in package namespace

______________________________________________
R-package-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
#
This is a good point. I would prefer to include all the data in the 
package, but CRAN has strict limitations on package and subdirectory 
size, which the potential data would easily exceed. Whether it is an 
active binding or a get function, dynamically downloaded data will 
always suffer this problem. Also, there are potential copyright issues 
which may prevent including all the relevant data in a package, no 
matter how the package is distributed.

For this particular package of ICD data, the biggest risk is not the 
data changing, but the data not being made available in the future, or 
not being provided in a useful format.

I do allow the user to set the cache directory, which eventually 
includes all the raw and processed data, and this could be archived by 
the user for reproducibility. In addition, the test suite covers 
potential changes to the source data.
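A minimal sketch of a user-controlled cache location, assuming a hypothetical option `icd.data.resource` and environment variable `ICD_DATA_RESOURCE`. The resolver returns NULL when nothing is configured and never creates a directory itself, so no cache appears without the user's consent, and the configured directory is the one the user would archive.

```r
# Hypothetical resolver: the cache directory comes only from an explicit
# user setting (option first, then environment variable).
icd_cache_dir <- function() {
  dir <- getOption("icd.data.resource",
                   default = Sys.getenv("ICD_DATA_RESOURCE"))
  if (is.null(dir) || !nzchar(dir)) return(NULL)  # not configured: no cache
  if (!dir.exists(dir)) {
    stop("Cache directory '", dir, "' does not exist; create it first.",
         call. = FALSE)
  }
  normalizePath(dir)
}
```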
On 3/24/19 11:21 AM, Hong Ooi wrote:
#
On 24.03.19 18:27, Jack O. Wasey wrote:
You could use a separate data package distributed via drat, cf.
https://journal.r-project.org/archive/2017/RJ-2017-026/RJ-2017-026.pdf.
The original 'hurricaneexposure' package has meanwhile been archived.
However, I have successfully used this method in
https://cran.r-project.org/package=swephR.
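A minimal sketch of the main package's side of that approach, roughly following the linked article: the data package is listed under `Suggests` and its drat repository under `Additional_repositories` in DESCRIPTION, and code checks for it with `requireNamespace()`. The package name `icddatabig`, the repository URL and both helper functions are hypothetical placeholders.

```r
# Assumed DESCRIPTION entries (placeholders):
#   Suggests: icddatabig
#   Additional_repositories: https://example.github.io/drat

# Is the optional data package installed?
has_data_package <- function(pkg = "icddatabig") {
  requireNamespace(pkg, quietly = TRUE)
}

# Tell the user how to install it if it is missing.
data_package_hint <- function(pkg = "icddatabig",
                              repo = "https://example.github.io/drat") {
  if (has_data_package(pkg)) return(invisible(TRUE))
  message("Optional data package '", pkg, "' is not installed. Install it with\n",
          "  install.packages(\"", pkg, "\", repos = \"", repo, "\")")
  invisible(FALSE)
}
```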

Greetings
Ralf