Dear all,

I am developing a package which is a front for various online data (icd.data, https://github.com/jackwasey/icd.data/ ). The current CRAN version just has lazy-loaded data, but the package now encompasses far more current and historic ICD codes from different countries, and these can't be included in the CRAN package even with maximal compression. Other authors have solved this using functions to get the data, with or without a local cache of the retrieved data. No CRAN or other packages I have found after extensive searching use the attractive active binding feature of R.

The goal is simple: for the user to refer to the data by its symbol, e.g., 'icd10fr2019' or 'icd.data::icd10fr2019', and it will be downloaded and parsed transparently (if the user has already granted permission, or after a prompt if they haven't). The bindings are set using commands alongside the function definitions in R/*.R, e.g.:

    makeActiveBinding("icd10cm_latest", .icd10cm_latest_binding, environment())
    lockBinding("icd10cm_latest", environment())

For non-interactive use, CI and CRAN tests, no data should be downloaded, and no cache directory set up without user consent. For interactive use, I ask permission to create a local data cache before downloading data.

This works fine... until R CMD check. The following steps seem to 'get' or 'source' everything from the package namespace, which triggers the active bindings. This fails if I am unable to get consent to download data and want to 'stop' on this error condition:

- checking dependencies in R code
- checking S3 generic/method consistency
- checking foreign function calls
- checking R code for possible problems

Debugging CI-specific binding bugs is a nightmare because these occur in different R sessions initiated by R CMD check. There may be legitimate reasons to evaluate everything in the namespace, but I've no idea what they are.

Incidentally, RStudio also does 'mget' on the whole package namespace and triggers bindings during autocomplete: https://github.com/rstudio/rstudio/issues/4414

Is this something I should raise as an issue with R? Or does anyone have any idea of a sensible approach to this? Currently I have a set of workarounds, but this complicates the code and has taken an awful lot of time. Does anyone know of any CRAN package which has active bindings in the package namespace?

Any ideas appreciated.

Jack Wasey
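For readers who want to reproduce the effect described above outside a package, here is a minimal sketch. It assumes a plain environment as a stand-in for the package namespace, and the names `ns`, `icd10_demo` and `helper` are made up for illustration. It shows that listing names is harmless, that fetching every object with mget() runs the binding function, and that a caller could in principle skip active bindings using bindingIsActive().

```r
# A plain environment standing in for a package namespace (assumption).
ns <- new.env()
calls <- 0L

# The binding function runs every time the symbol is read.
makeActiveBinding("icd10_demo", function() {
  calls <<- calls + 1L
  data.frame(code = "A00", stringsAsFactors = FALSE)
}, ns)
assign("helper", function(x) x, envir = ns)

# Listing names does not evaluate anything.
nms <- ls(ns)
calls_after_ls <- calls

# Fetching every object, as a checker or IDE might, fires the binding.
all_objs <- mget(nms, envir = ns)
calls_after_mget <- calls

# bindingIsActive() lets a caller leave active bindings alone.
passive <- nms[!vapply(nms, bindingIsActive, logical(1), env = ns)]
safe_objs <- mget(passive, envir = ns)
calls_after_safe <- calls
```

Whether R CMD check or RStudio could filter this way is a separate question; the sketch only shows that base R offers the means to tell the two kinds of binding apart.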
[R-pkg-devel] active bindings in package namespace
9 messages · Gábor Csárdi, Kirill Müller, Jack Wasey +2 more
Hi, yet another workaround is to create the active binding in the .onLoad() function. Here is an example from the cli package: https://github.com/r-lib/cli/blob/d4756c483f69c2382c27b0b983d0ce7cc7e63763/R/onload.R#L8 Gabor
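A rough sketch of the .onLoad() approach follows. It is not the cli code: the binding name `icd_demo`, the function `.demo_binding` and the extra `ns` argument are all invented so that the example can run outside a package, with a plain environment standing in for the namespace.

```r
# Hypothetical binding function; a real one would download or read a cache.
.demo_binding <- function() {
  data.frame(code = c("A00", "A01"), stringsAsFactors = FALSE)
}

# In a package, R calls .onLoad(libname, pkgname) while the namespace is
# still unlocked. The 'ns' argument exists only for this demonstration.
.onLoad <- function(libname, pkgname, ns = asNamespace(pkgname)) {
  makeActiveBinding("icd_demo", .demo_binding, ns)
  invisible(NULL)
}

# Simulate loading, with a plain environment standing in for the namespace.
fake_ns <- new.env()
.onLoad(libname = "", pkgname = "demo", ns = fake_ns)

is_active <- bindingIsActive("icd_demo", fake_ns)
first_code <- fake_ns$icd_demo$code[1]  # reading the symbol runs the function
```

Note that this only changes when the binding is created. Anything that later reads every symbol in the namespace will still run the binding function.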
On Sat, Mar 23, 2019 at 3:50 PM Jack O. Wasey <jack at jackwasey.com> wrote:
______________________________________________ R-package-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Thanks for that thought. I do also do this in that package, and the same problem exists, since the code in those particular steps of R CMD check also calls .onLoad. These steps might even attach the package: the R CMD check code in tools is too convoluted for me to unravel, but the result is the same: active bindings defined directly in the namespace, or inserted into it from .onLoad, get run. .onAttach isn't possible because the namespace is sealed already.

In the example you offer from CLI, the binding is inserted into the "dummy" function's immediately enclosing environment, which is AFAIK not the same as the package namespace.

Jack
On Sat, Mar 23, 2019 at 9:46 PM Jack Wasey <jack at jackwasey.com> wrote:
> Thanks for that thought. I do also do this in that package, and the same problem exists, since the code in those particular steps of R CMD check also calls .onLoad. These steps might even attach the package: the R CMD check code in tools is too convoluted for me to unravel, but the result is the same: active bindings defined directly in the namespace, or inserted into it from .onLoad, get run. .onAttach isn't possible because the namespace is sealed already.

cli is on CRAN and checks OK with R CMD check.

> In the example you offer from CLI, the binding is inserted into the "dummy" function's immediately enclosing environment, which is AFAIK not the same as the package namespace.

It seems to me that it is the same, no?

    > environment(cli:::dummy)
    <environment: namespace:cli>
    > "symbol" %in% ls(environment(cli:::dummy))
    [1] TRUE

G.
Dear Jack,

This doesn't answer your question, but I would advise against this design.

- Users do not expect side effects (such as network access) from accessing a symbol.
- A function gives you much more flexibility to change the interface later on. (Arguments for fetching the data, tokens for API access, ...)
- You already encountered a few quirks that make this an "interesting" problem.

A function call only needs a pair of parentheses.

Best regards
Kirill
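To make the function-based alternative concrete, here is a small sketch. All names (`get_icd10_demo`, `.fetch_icd10_demo`, `.cache`) are hypothetical, and the "fetch" step just builds a tiny data frame rather than touching the network. The point is only that nothing happens until the user writes the parentheses, and that arguments such as `refresh` can be added later without changing how the data is referred to.

```r
# In-memory cache; a real package might also persist to a user-approved
# directory.
.cache <- new.env(parent = emptyenv())

# Hypothetical loader standing in for the download-and-parse step.
.fetch_icd10_demo <- function() {
  data.frame(code = c("A00", "A01"), stringsAsFactors = FALSE)
}

# Accessor: the fetch only runs on an explicit call, and only once unless
# the caller asks for a refresh.
get_icd10_demo <- function(refresh = FALSE, fetch = .fetch_icd10_demo) {
  if (refresh || is.null(.cache$icd10_demo)) {
    .cache$icd10_demo <- fetch()
  }
  .cache$icd10_demo
}
```

A user would then write `get_icd10_demo()` where the active-binding design has `icd10_demo`. Tools that read every object in the namespace only see a function and never trigger a download.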
Thanks both, this is helpful advice.
Don't want to turn this into a pile-on, but I also think this isn't a very good idea. As I understand it, accessing the symbol "foo" will pull the latest version of foo from the remote site. This has consequences for reproducibility, because now your code could be exactly the same, and your local environment exactly the same, and yet running the code at different times can yield different results because the remote data has been updated.

-----Original Message-----
From: R-package-devel <r-package-devel-bounces at r-project.org> On Behalf Of Jack Wasey
Sent: Sunday, 24 March 2019 9:57 AM
To: Kirill Müller <krlmlr+ml at mailbox.org>; R Development <r-package-devel at r-project.org>
Subject: Re: [R-pkg-devel] active bindings in package namespace

Thanks both, this is helpful advice.
This is a good point. I would prefer to include all the data in the package, but CRAN has strict limitations on package and subdirectory size, which the potential data would easily exceed. Whether it is an active binding or a get function, dynamically downloaded data will always suffer this problem. Also, there are potential copyright issues which may prevent including all the relevant data in a package, no matter how the package is distributed.

For this particular package of ICD data, the biggest risk is not the data changing, but the data not being made available in the future, or not being provided in a useful format. I do allow the user to set the cache directory, which eventually includes all the raw and processed data, and this could be archived by the user for reproducibility. In addition, the test suite covers potential changes to the source data.
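As an illustration of the consent and cache-directory logic discussed in this thread, a sketch might look like the following. It is not the icd.data implementation: the option name `icd.demo.cache_dir`, the function name and the temporary-directory location are all assumptions made for the example. A directory already set by the user is respected, nothing is prompted for or created in non-interactive sessions, and the user is asked before anything is written otherwise.

```r
# Returns the cache directory, or NULL when there is no consent.
# 'icd.demo.cache_dir' is a made-up option name for this sketch.
cache_dir_or_null <- function(ask = interactive()) {
  dir <- getOption("icd.demo.cache_dir")
  if (!is.null(dir)) {
    return(dir)                     # user already chose a location
  }
  if (!ask) {
    return(NULL)                    # never prompt in scripts, CI or checks
  }
  ans <- readline("Create a local data cache? [y/N] ")
  if (!tolower(ans) %in% c("y", "yes")) {
    return(NULL)
  }
  dir <- file.path(tempdir(), "icd-demo-cache")  # placeholder location
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  options(icd.demo.cache_dir = dir)
  dir
}
```

A user who cares about reproducibility could point the option at a project directory and archive that directory alongside the analysis.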
On 3/24/19 11:21 AM, Hong Ooi wrote:
On 24.03.19 18:27, Jack O. Wasey wrote:
> This is a good point. I would prefer to include all the data in the package, but CRAN has strict limitations on package and subdirectory size, which the potential data would easily exceed.

You could use a separate data package distributed via drat, cf. https://journal.r-project.org/archive/2017/RJ-2017-026/RJ-2017-026.pdf. The original 'hurricaneexposure' package has meanwhile been archived. However, I have successfully used this method in https://cran.r-project.org/package=swephR.

Greetings
Ralf
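On the code side, using an optional data package usually comes down to checking whether it is installed before touching it. The sketch below assumes a hypothetical data package called `icddemodata` and a placeholder repository URL, neither of which exists. It only uses base R calls (requireNamespace() and getExportedValue()) and does not reproduce the exact setup from the article.

```r
# Placeholder names: neither the data package nor the repository URL is real.
data_pkg  <- "icddemodata"
drat_repo <- "https://example.github.io/drat"

# TRUE if the optional data package can be loaded, FALSE otherwise.
has_data_pkg <- function(pkg = data_pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Fetch an exported object from the data package, or fail with an
# installation hint instead of downloading anything implicitly.
get_from_data_pkg <- function(name, pkg = data_pkg, repo = drat_repo) {
  if (!has_data_pkg(pkg)) {
    stop(sprintf(
      "Package '%s' is needed. Try install.packages('%s', repos = '%s')",
      pkg, pkg, repo
    ), call. = FALSE)
  }
  getExportedValue(pkg, name)
}
```

With this shape the main package stays small enough for CRAN, and the user makes an explicit decision to install the data.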
Ralf Stubner
Senior Software Engineer / Trainer
daqana GmbH
Dortustraße 48
14467 Potsdam
T: +49 331 23 61 93 11
F: +49 331 23 61 93 90
M: +49 162 20 91 196
Mail: ralf.stubner at daqana.com
Sitz: Potsdam
Register: AG Potsdam HRB 27966
Ust.-IdNr.: DE300072622
Geschäftsführer: Dr.-Ing. Stefan Knirsch, Prof. Dr. Dr. Karl-Kuno Kunze