[R-pkg-devel] Package builds, installs, and runs but does not pass devtools::check()
Very nice discussion. Thanks, Mark.

On Thu, Jul 19, 2018 at 3:20 AM, Mark van der Loo <mark.vanderloo at gmail.com> wrote:
Dear Mike, et al,

My remarks are not necessarily related to tidyverse packages. The main point is that there are various purposes and business cases for writing code, and they may imply different trade-offs. Let me illustrate with some examples. I will focus on non-standard evaluation and dependencies.

TL;DR version (and this is my opinion, nobody has to agree):

1/ Interactive use: user-level NSE ok (as in the not-a-pipe operator, dplyr verbs), use any package you want.
2/ Applications & local packages: avoid NSE within functions, package an application with dependencies you need, write code with maintainers in mind.
3/ Published R-packages: avoid NSE within functions, minimize dependencies to what you cannot avoid.

Do Read version:

1/ One-off data analyses or exploratory data analyses. There are cases where you don't need to guarantee that your code will run a few years from now: you are the only user and once your task is done, you quickly need to move on to the next. Especially in EDA, I write a lot of code that is nice to keep in a structured project folder but most probably: 1) I will be its only user and 2) I will use it only for this one small project so maintenance is not an issue. Although I'm writing code in scripts, it is very close to interactive work on the command-line. In such cases I use whatever gets the job done, including dplyr, tidyr, ggplot2, data.table, you name it. Here I basically don't care about dependencies and if I write functions there are usually not many of them.

2/ Writing applications or packages for internal use. When you write an application you are usually committing to a longer maintenance horizon and more than one user. Good chance that you're not the user and also good chance you're not the only developer. There are many implications to this but since you need to maintain things for a longer term, dependencies can become a liability.
Fortunately, there are techniques to contain dependencies, for example using packrat or by manually setting up a library containing the packages your application depends on. You can even use a docker instance. I have worked with custom libraries on several occasions.

Since you (or someone else) will maintain the application, it is worthwhile to sit down and think about the best way to set up code so it remains maintainable. This includes questions like: can I easily understand what happens when reading it? What expertise does the maintainer need to understand it? Non-standard evaluation is generally much harder to reason about than standard evaluated code. This makes debugging and extending code harder in general.

Now some people will argue that something like filter(data, x>1) is easier to understand than data[data$x > 1,,drop=FALSE]. I agree that on a very shallow level, filter(data, x>1) is easy to follow, in the sense of "oh the author probably wants to filter something here". But when you are debugging, you need to understand in much greater detail what happens: you need to know that 'x>1' is an expression that will be evaluated in the context of 'data'. You need to know about environments and parent environments and so on. All this knowledge can be avoided with data[data$x > 1,,drop=FALSE]. The latter also requires knowledge, but the concepts are much simpler, I think. Hence, I tend to avoid NSE when writing applications, although there may still be good reasons to use it. Dependencies can be contained in various ways, so they are not such a big problem.

3/ Writing packages for CRAN. Now you are committing to long-term maintenance, and usage by interactive users, application builders, and possibly other package builders. Now a dependency becomes a direct liability in the sense that the author of your dependency can change interfaces and ask you to comply with the new version.
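As an aside on the NSE point made under 2/ above, the difference between the two styles can be sketched in base R. This is only an illustration: nse_filter() is a hypothetical helper written for this example and is not how dplyr's filter() is implemented; the toy data frame is likewise made up. It shows what "an expression that will be evaluated in the context of 'data'" means in practice.

```r
# Toy data frame; the column name x matches the example in the text.
data <- data.frame(x = c(0, 2, 3), y = c("a", "b", "c"))

# Standard evaluation: every name is looked up by the ordinary scoping rules.
se_result <- data[data$x > 1, , drop = FALSE]

# Hypothetical NSE helper (an assumption for illustration, not dplyr's code):
# capture the unevaluated condition, then evaluate it with the data frame's
# columns in scope and the caller's environment as the enclosure.
nse_filter <- function(df, condition) {
  expr <- substitute(condition)            # the expression x > 1, not its value
  keep <- eval(expr, df, parent.frame())   # columns first, then caller's variables
  df[keep, , drop = FALSE]
}

nse_result <- nse_filter(data, x > 1)

# Both approaches select the same rows here.
stopifnot(identical(se_result, nse_result))

# The scoping subtlety: a variable in the calling environment is silently
# shadowed by a column of the same name.
x <- 100
shadowed <- nse_filter(data, x > 1)        # uses data$x, not the global x
stopifnot(identical(shadowed, se_result))
```

To predict the result of the NSE call, a reader has to know where the expression is evaluated and which environment is searched next; the bracket form needs neither.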
Also, and especially because of recursive dependencies, importing a package may give you a whole tail of dependencies. This increases load time but also install-time, especially on systems where you need to install from source. Light-weight packages therefore have real advantages in applications that run many times (like a standalone script that is fired by users of a web-application or scripts that are scheduled to run in high frequency). It is also worth mentioning that an Imports or Depends puts a burden on the maintainer of the package you depend on: before submitting to CRAN, a pkg developer needs to check against all reverse dependencies (preferably recursively).

So now, it is even more worthwhile to sit down and think about the best way to set up your code. Well thought out code can be a pleasure to maintain. Code that is hastily put together is a nightmare.

My philosophy is as follows: I depend on other packages only when they offer something that I cannot fairly trivially do myself. This may have to do with a statistical or numerical method I do not want or cannot implement, or it can have something to do with performance, for example. This does indeed exclude much of the tidyverse almost automatically. Many tools in tidyverse make already existing functionality easier for (interactive) use. But since much of the functionality is already present in base R, and because I find NSE hard to reason about in a programming context, I have until now not used any tidyverse packages as an Imports or Depends.

Hope this helps,
Best,
Mark

On Tue, Jul 17, 2018 at 23:10, Michael Hannon <jmhannon.ucdavis at gmail.com> wrote:
Thanks, Mark. Your points are well-taken, but I wouldn't refer to
this as a "small side-track". You don't say so, but this could be
interpreted as a recommendation to avoid some or all of the
"tidyverse" in developing packages. I'm actually quite comfortable
doing the base-R-style programming you recommend. I've lately been
trying to make a point of using the "tidy" stuff, as that's what I'm
seeing almost exclusively from folks in my neighborhood these days.
("Resistance is few-tile...")
Also, it would seem to be a corollary that if the ultimate goal is to
make a package, then one shouldn't be using the convenience stuff
(pipes, dplyr, etc., etc.), even during the development stages. Can
you comment? Thanks.
-- Mike
On Tue, Jul 17, 2018 at 2:53 AM, Mark van der Loo
<mark.vanderloo at gmail.com> wrote:
Michael,

Just a small side-track here. I would avoid using the not-a-pipe operator within functions or packages in general. It is great for interactive use, but it does make debugging and hence long-term maintenance of functions harder. There are two reasons for this. First, it hides intermediate results, and second, it adds several layers to the call stack, making the output of functions like traceback() harder to interpret. I have documented a simple example here: https://github.com/chriscardillo/norris/issues/1 (scroll down a bit).

Regarding learning about quosures and so on. If the literal names of data frames are known, you could consider replacing

    some_var <- next_data_frame %>% dplyr::select(-amount,...

with something simpler like

    some_var <- next_data_frame[ !(names(next_data_frame) %in% c("amount", ... )) ]

which might also save you some dependencies. (Note %in% rather than != here: != compares element-wise with recycling, so it does not test membership in a set of names.)

Hope this helps,
Best,
Mark

On Tue, Jul 17, 2018 at 11:28, Michael Hannon <jmhannon.ucdavis at gmail.com> wrote:
Thanks to John and Zhian for their recent and informative comments.

Regarding check() and NSE: the moral seems to be that a little learning is a dangerous thing. I'm off to try to bring quosure to this issue.

-- Mike

On Mon, Jul 16, 2018 at 2:38 PM, Zhian Kamvar <zkamvar at gmail.com> wrote:
Using dplyr like that is for exploratory data analysis. You'll want to refer to dplyr's "Programming with dplyr" vignette for using dplyr in a package:
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

Hope that helps.

On Jul 16, 2018, at 22:13, Michael Hannon <jmhannon.ucdavis at gmail.com> wrote:

Thanks, Georgi. I've changed my approach and now do what I gather is recommended practice: put all external package names into the "Imports" section of the DESCRIPTION file and then use the fully-qualified names for functions from those packages, as:

    dplyr::select()

The "check" operation is still not entirely "happy" with me, but it doesn't flag any errors, and the package builds and runs.

BTW, one source of "complaints" from "check()" is evidently the use of NSE in the tidyverse functions. For instance, the line:

    next_data_frame %>% dplyr::select(-amount,

generates the message:

    standardize_format: no visible binding for global variable ‘amount’

where, of course, "amount" is one of the column headings in "next_data_frame". There seems to be no harm done by this, and I plan to ignore such messages, but if there's some additional wisdom that applies here, I'd be happy to receive it.

-- Mike

On Sun, Jul 15, 2018 at 12:05 AM, Georgi Boshnakov <georgi.boshnakov at manchester.ac.uk> wrote:

It seems that the R session used by 'check' doesn't look in the library used by your interactive session. This discrepancy may happen since the check tools do not load the same Renviron files as interactive sessions. This may result in different libraries in interactive and 'check' sessions. See ?Startup, especially section Note.

It is difficult to give more specific advice without details of your setup.

Hope this helps,
Georgi Boshnakov
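One way to look for the library-path discrepancy described above is to compare the paths seen by the current session with those seen by a fresh R process. The sketch below is a rough diagnostic under stated assumptions: it starts the child process with --vanilla, which skips user and site startup files, and this only approximates the environment that check() sets up rather than reproducing it. The variable names are illustrative.

```r
# Library paths visible to the current (interactive) session.
interactive_paths <- .libPaths()

# Library paths visible to a fresh, non-interactive R process started with
# --vanilla. Assumption: this approximates, but does not replicate, the
# session used by the check tools.
rscript <- file.path(R.home("bin"), "Rscript")
vanilla_paths <- system2(
  rscript,
  c("--vanilla", "-e", shQuote("writeLines(.libPaths())")),
  stdout = TRUE
)

# Libraries the interactive session sees but the fresh process does not.
missing_in_vanilla <- setdiff(interactive_paths, vanilla_paths)
print(missing_in_vanilla)

# Where (if anywhere) dplyr is installed, as seen from this session.
# find.package() returns character(0) when the package is not found.
dplyr_location <- find.package("dplyr", quiet = TRUE)
print(dplyr_location)
```

If dplyr lives in a library that appears only in the first list, that library is probably being added by a startup file (for example an Renviron setting) that the other session does not read.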
________________________________________
From: R-package-devel [r-package-devel-bounces at r-project.org] on behalf of Michael Hannon [jmhannon.ucdavis at gmail.com]
Sent: 15 July 2018 02:13
To: r-package-devel at r-project.org
Subject: [R-pkg-devel] Package builds, installs, and runs but does not pass devtools::check()
Greetings. I'm working on a small package, and I'm using the devtools
functions to create, build, etc., the package.
As indicated in the subject line, I get no errors when I do:
build()
install()
When I run a separate R session and load the package, i.e.,
library(my_pkg)
the package loads without error, and the two exported functions appear
to work as advertised.
OTOH, if I include devtools::check() in the construction of the
package, I consistently get an error:
* installing *source* package ‘my_pkg’ ...
** R
** preparing package for lazy loading
Error in loadNamespace(from, lib.loc = .library) :
  there is no package called ‘dplyr’
Error : unable to load R code in package 'my_pkg'
Clearly there *is* a package called "dplyr" on my system (see the
session info below, for instance). And, as I've mentioned, the code
*does* run, and I can watch it successfully reading CSV files.
Here's the relevant part of my DESCRIPTION file:
Depends: R (>= 3.4.4)
Imports: readr,
dplyr,
ggplot2,
purrr,
magrittr
I suspect the problem may be that I'm misunderstanding something about
the `import::from()` function, which I'm using for the first time to
load required functions into my code. In each of the three files that
use dplyr I have the line:
import::from(dplyr, mutate, filter, rename, select, setdiff, slice, "%>%")
I've tried:
(1) putting that line in just one of the files (the lexically first one)
(2) including different subsets of dplyr functions, as needed, in
the various files
Needless to say, I haven't seen any improvement with any of the above
(or any of the other thrashing I've done).
If you can point me in the right direction, I'd appreciate it.
Thanks.
-- Mike
session_info()
Session info
------------------------------------------------------------------
setting value
version R version 3.4.4 (2018-03-15)
system x86_64, linux-gnu
ui X11
language en_US
collate en_US.UTF-8
tz America/Los_Angeles
date 2018-07-14
Packages
----------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 CRAN (R 3.3.3)
base * 3.4.4 2018-03-16 local
bindr 0.1.1 2018-03-13 CRAN (R 3.4.3)
bindrcpp 0.2.2 2018-03-29 CRAN (R 3.4.4)
compiler 3.4.4 2018-03-16 local
crayon 1.3.4 2017-09-16 CRAN (R 3.4.1)
datasets * 3.4.4 2018-03-16 local
devtools * 1.13.6 2018-06-27 CRAN (R 3.4.4)
digest 0.6.15 2018-01-28 CRAN (R 3.4.3)
dplyr * 0.7.6 2018-06-29 CRAN (R 3.4.4)
glue 1.2.0 2017-10-29 CRAN (R 3.4.2)
graphics * 3.4.4 2018-03-16 local
grDevices * 3.4.4 2018-03-16 local
magrittr 1.5 2014-11-22 CRAN (R 3.2.2)
memoise 1.1.0 2017-04-21 CRAN (R 3.3.3)
methods * 3.4.4 2018-03-16 local
pillar 1.3.0 2018-07-14 CRAN (R 3.4.4)
pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.0)
purrr 0.2.5 2018-05-29 CRAN (R 3.4.4)
R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
Rcpp 0.12.17 2018-05-18 CRAN (R 3.4.4)
rlang 0.2.1 2018-05-30 CRAN (R 3.4.4)
stats * 3.4.4 2018-03-16 local
tibble 1.4.2 2018-01-22 CRAN (R 3.4.3)
tidyselect 0.2.4 2018-02-26 CRAN (R 3.4.3)
utils * 3.4.4 2018-03-16 local
withr 2.1.2 2018-03-15 CRAN (R 3.4.3)
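For comparison with the import::from() line in the message above, the arrangement adopted later in the thread (packages listed under Imports in DESCRIPTION, functions called with fully-qualified names) might look roughly like the sketch below. The function name drop_amount and the toy data are hypothetical, introduced only for this example. Passing the column name as a string through one_of() is one way, assumed here, to keep a bare column symbol out of the function body; newer tidyselect releases offer other helpers for the same purpose.

```r
# Hypothetical helper; the name and the toy data are illustrative only.
# With dplyr listed under Imports and every call fully qualified, the
# package code itself needs no library() or import::from() call.
drop_amount <- function(next_data_frame) {
  # one_of() takes column names as strings, so the body contains no bare
  # column symbol for the check tools to report as an undefined global.
  dplyr::select(next_data_frame, -dplyr::one_of("amount"))
}

# Try it only if dplyr can be found in this session's library paths.
if (requireNamespace("dplyr", quietly = TRUE)) {
  toy <- data.frame(id = 1:3, amount = c(10, 20, 30))
  print(names(drop_amount(toy)))
}
```

Whether this also resolves the original installation error depends on the library-path question discussed in the replies; the sketch only addresses how functions from another package are referenced.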
______________________________________________
R-package-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel