conflicted: an alternative conflict resolution strategy

10 messages · Duncan Murdoch, Jari Oksanen, Joris Meys +2 more

#
Hi all,

I'd love to get your feedback on the conflicted package, which provides an
alternative strategy for resolving ambiguous function names (i.e. when
multiple packages provide identically named functions). conflicted 0.1.0
is already on CRAN, but I'm currently preparing a revision
(<https://github.com/r-lib/conflicted>), and looking for feedback.

As you are no doubt aware, R's default approach means that the most
recently loaded package "wins" any conflicts. You do get a message about
conflicts on load, but I see a lot of newer R users experiencing problems
caused by function conflicts. I think there are three primary reasons:

-   People don't read messages about conflicts. Even if you are
    conscientious and do read the messages, it's hard to notice a single
    new conflict caused by a package upgrade.

-   The warning and the problem may be quite far apart. If you load all
    your packages at the top of the script, it may potentially be 100s
    of lines before you encounter a conflict.

-   The error messages caused by conflicts are cryptic because you end
    up calling a function with utterly unexpected arguments.
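
A minimal sketch of the default "last attached wins" behaviour, using toy lists attached to the search path rather than real packages (the names `toy:pkg_a` and `toy:pkg_b` are made up for illustration):

```r
# Two toy "packages" that both export a function called `select`.
pkg_a <- list(select = function(...) "pkg_a")
pkg_b <- list(select = function(...) "pkg_b")

# Attach a, then b: b sits earlier on the search path, so it wins.
attach(pkg_a, name = "toy:pkg_a", warn.conflicts = FALSE)
attach(pkg_b, name = "toy:pkg_b", warn.conflicts = FALSE)
winner_ab <- select()
detach("toy:pkg_b", character.only = TRUE)
detach("toy:pkg_a", character.only = TRUE)

# Reverse the attach order and the same call resolves differently.
attach(pkg_b, name = "toy:pkg_b", warn.conflicts = FALSE)
attach(pkg_a, name = "toy:pkg_a", warn.conflicts = FALSE)
winner_ba <- select()
detach("toy:pkg_a", character.only = TRUE)
detach("toy:pkg_b", character.only = TRUE)

c(winner_ab, winner_ba)
```

The only thing that changes between the two runs is the attach order, which is exactly the kind of dependence that is easy to miss in a long script.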

For these reasons, conflicted takes an alternative approach, forcing the
user to explicitly disambiguate any conflicts:

    library(conflicted)
    library(dplyr)
    library(MASS)

    select
    #> Error: [conflicted] `select` found in 2 packages.
    #> Either pick the one you want with `::`
    #> * MASS::select
    #> * dplyr::select
    #> Or declare a preference with `conflicted_prefer()`
    #> * conflict_prefer("select", "MASS")
    #> * conflict_prefer("select", "dplyr")

conflicted works by attaching a new "conflicted" environment just after
the global environment. This environment contains an active binding for
any ambiguous bindings. The conflicted environment also contains
bindings for `library()` and `require()` that rebuild the conflicted
environment and suppress default reporting (but are otherwise thin wrappers
around the base equivalents).
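
The mechanism can be sketched with base R alone. This is an illustration of the idea under stated assumptions, not conflicted's actual code: the environment name `toy:conflicts` and the error text are invented here.

```r
# Create an empty environment on the search path, directly after the
# global environment; attach() returns that environment invisibly.
toy_env <- attach(NULL, name = "toy:conflicts")

# An active binding runs its function every time the name is looked up,
# so merely evaluating `select` raises the error.
makeActiveBinding(
  "select",
  function() {
    stop("`select` found in 2 packages; pick one with `::`", call. = FALSE)
  },
  toy_env
)

in_search <- "toy:conflicts" %in% search()
msg <- tryCatch(select, error = function(e) conditionMessage(e))

# Clean up the search path again.
detach("toy:conflicts", character.only = TRUE)
```

Because the environment sits ahead of every attached package, the lookup never reaches either of the conflicting functions unless the user qualifies the name.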

conflicted also provides a `conflict_scout()` helper which you can use
to see what's going on:

    conflict_scout(c("dplyr", "MASS"))
    #> 1 conflict:
    #> * `select`: dplyr, MASS

conflicted applies a few heuristics to minimise false positives (at the
cost of introducing a few false negatives). The overarching goal is to
ensure that code behaves identically regardless of the order in which
packages are attached.

-   A number of packages provide a function that appears to conflict
    with a function in a base package, but they follow the superset
    principle (i.e. they only extend the API, as explained to me by
    Hervé Pagès).

    conflicted assumes that packages adhere to the superset principle,
    which appears to be true in most of the cases that I've seen. For
    example, the lubridate package provides `as.difftime()` and `date()`
    which extend the behaviour of base functions, and provides S4
    generics for the set operators.

        conflict_scout(c("lubridate", "base"))
        #> 5 conflicts:
        #> * `as.difftime`: [lubridate]
        #> * `date`       : [lubridate]
        #> * `intersect`  : [lubridate]
        #> * `setdiff`    : [lubridate]
        #> * `union`      : [lubridate]

    There are two popular functions that don't adhere to this principle:
    `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
    special cases so they correctly generate conflicts. (I sure wish I'd
    known about the superset principle when creating dplyr!)

        conflict_scout(c("dplyr", "stats"))
        #> 2 conflicts:
        #> * `filter`: dplyr, stats
        #> * `lag`   : dplyr, stats

-   Deprecated functions should never win a conflict, so conflicted
    checks for use of `.Deprecated()`. This rule is very useful when
    moving functions from one package to another. For example, many
    devtools functions were moved to usethis, and conflicted ensures
    that you always get the non-deprecated version, regardless of package
    attach order:

        head(conflict_scout(c("devtools", "usethis")))
        #> 26 conflicts:
        #> * `use_appveyor`       : [usethis]
        #> * `use_build_ignore`   : [usethis]
        #> * `use_code_of_conduct`: [usethis]
        #> * `use_coverage`       : [usethis]
        #> * `use_cran_badge`     : [usethis]
        #> * `use_cran_comments`  : [usethis]
        #> ...
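
A rough sketch of how such a check could be written, assuming it is enough to look for the name `.Deprecated` anywhere in a function's body. The helper `calls_deprecated()` and the toy functions are hypothetical and are not taken from conflicted:

```r
# Hypothetical helper: TRUE if the body of a closure mentions `.Deprecated`.
calls_deprecated <- function(fun) {
  is.function(fun) && !is.primitive(fun) &&
    ".Deprecated" %in% all.names(body(fun))
}

# Toy stand-ins for the same function living in an old and a new package.
old_version <- function(...) {
  .Deprecated("use_thing", package = "newpkg")
  invisible(NULL)
}
new_version <- function(...) invisible(NULL)

# Keep only the candidates that are not deprecated.
candidates <- list(oldpkg = old_version, newpkg = new_version)
is_deprecated <- vapply(candidates, calls_deprecated, logical(1))
winners <- names(candidates)[!is_deprecated]
```

A purely syntactic check like this never calls the functions, so it has no side effects, but it cannot tell what the deprecation actually points to.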

Finally, as mentioned above, the user can declare preferences:

    conflict_prefer("select", "MASS")
    #> [conflicted] Will prefer MASS::select over any other package
    conflict_scout(c("dplyr", "MASS"))
    #> 1 conflict:
    #> * `select`: [MASS]

I'd love to hear what people think about the general idea, and if there
are any obviously missing pieces.

Thanks!

Hadley
#
First, some general comments:

This sounds like a useful package.

I would guess it has very little impact on runtime efficiency except 
when attaching a new package; have you checked that?

I am not so sure about your heuristics.  Can they be disabled, so the 
user is always forced to make the choice?  Even when a function is 
intended to adhere to the superset principle, they don't always get it 
right, so a really careful user should always do explicit disambiguation.

And of course, if users wrote most of their long scripts as packages 
instead of as long scripts, the ambiguity issue would arise far less 
often, because namespaces in packages are intended to solve the same 
problem as your package does.

One more comment inline about a typo, possibly in an error message.

Duncan Murdoch
On 23/08/2018 2:31 PM, Hadley Wickham wrote:
I don't know if this is a typo in your r-devel message or a typo in the 
error message, but you say `conflicted_prefer()` in one place and 
conflict_prefer() in the other.
#
If you have to load two packages which both export the same name in their namespaces, namespaces do not help in resolving which synonymous function to use. Neither does it help to have a package instead of a script as long as you end up loading two namespaces with name conflicts. The order of importing namespaces can also be difficult to control, because you may end up loading a namespace already when you start your R with a saved workspace. Moving a function to another package may be a transitional issue which disappears when both packages are at their final stages, but if you use the recommended deprecation stage, the same names can live together for a long time. So this package is a good idea, and preferably base R should be able to handle the issue of choosing between exported synonymous functions.

This has bitten me several times in package development, and with growing CRAN it is a growing problem. Package authors often have poor control of the issue, as they do not know what packages users use. Now we can only have a FAQ that tells that a certain error message does not come from a function in our package, but from some other package having a synonymous function that was used instead.

cheers, Jari Oksanen
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
#
Dear Hadley,

There's been some mails from you lately about packages on R-devel. I would
argue that the appropriate list for that is R-pkg-devel, as I've been told
myself not too long ago. People might get confused and think this is about
a change to R itself, which it obviously is not.

Kind regards
Joris
#
On Thu, Aug 23, 2018 at 3:46 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
It adds one extra element to the search path, so the impact on speed
should be equivalent to loading one additional package (i.e.
negligible)

I've also done some benchmarking to see the impact on calls to
library(). These are now a little outdated (I've since added more
heuristics, so I should redo them), but previously conflicted added about
100 ms overhead to a library() call when I had ~170 packages loaded
(the most I could load without running out of DLLs).

That is a good question - my intuition is always to start with less
user control as it makes it easier to get the core ideas right, and
it's easy to add more control later (whereas if you later take it
away, people get unhappy). Maybe it's natural to have a function that
does the opposite of conflict_prefer(), and declare that something
that doesn't appear to be a conflict actually is?

I don't think that an option to suppress the superset principle
altogether will work - my sense is that it will generate too many
false positives, to the point where you'll get frustrated and stop
using conflicted.

Agreed.

Thanks for spotting; fixed in devel now.

Hadley
#
On Fri, Aug 24, 2018 at 4:28 AM Joris Meys <jorismeys at gmail.com> wrote:
The description for R-pkg-devel states:
The description for R-devel states:
My questions are not about how to develop a package, R CMD check, or
how to get it on CRAN, but instead about the semantics of the packages
I am working on. My opinion is supported by the fact that a number of
members of the R core team have responded (both on list and off) and
have not expressed concern about my choice of venue.

That said, I am happy to change venues (or simply not email at all) if
there is widespread concern that my emails are inappropriate.

Hadley
#
On Fri, Aug 24, 2018 at 2:27 PM Hadley Wickham <h.wickham at gmail.com> wrote:

            
If those moderating the lists are fine with it, all good.

Cheers
Joris

  
    
#
On 24/08/2018 3:12 AM, Jari Oksanen wrote:
You can't import the same name from two packages without getting an 
error message (at least when checking --as-cran, I'm not sure about 
vanilla checks), so this is already handled.

If you really only want one of the imports, then importing individual 
functions is the solution.  Don't import everything from the package. 
This is a good idea in any case.

If you want both of the imports, then there's the undocumented (?) 
ability to rename a function on import, as well as the documented 
possibility of using :: for one of them instead of importing it.
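
A minimal sketch of selective importing in a package's NAMESPACE file, assuming a hypothetical package that only needs `select` from dplyr:

```r
# NAMESPACE (hypothetical package)
# Import just the one function that is needed, instead of import(dplyr).
importFrom(dplyr, select)
```

Inside the package code, the other function can still be reached explicitly as `MASS::select`, which is the documented `::` route mentioned above.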

That doesn't make sense in the context of a package.  Packages import 
what they ask to import. The user's workspace is irrelevant to code 
within the package if it does its imports properly.  You can reference 
functions that are not imported, but you get a message when you run 
checks to tell you not to do that.

Duncan Murdoch

#
Hadley,

Overall this seems like a cool and potentially really useful idea. I do have
some thoughts/feedback, which I've put in-line below.

On Thu, Aug 23, 2018 at 11:31 AM, Hadley Wickham <h.wickham at gmail.com>
wrote:

It seems that you may be able to strengthen this heuristic from a blanket
assumption to something more narrowly targeted by looking for one or more
of the following to confirm likely-superset adherence:

   1. matching or purely extending formals (i.e. all the named arguments of
   base::fun match, including order, and there are new arguments in pkg::fun
   only if base::fun takes ...)
   2. explicit call to base::fun in the body of pkg::fun
   3. UseMethod(funname) and at least one provided S3 method calls base::fun
   4. S4 generic creation using fun or base::fun as the seeding/default
   method body or called from at least one method
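
The first check could be sketched roughly as follows. This is one strict reading of "matching or purely extending formals": the package function must start with exactly the base function's arguments, in order, and may add more only if the base function has `...`. The helper name `extends_formals()` and the toy functions are hypothetical.

```r
# Hypothetical helper: does pkg_fun's argument list purely extend base_fun's?
extends_formals <- function(base_fun, pkg_fun) {
  # as.character() turns the NULL returned for argument-less functions
  # into character(0), so the comparisons below stay well defined.
  base_args <- as.character(names(formals(base_fun)))
  pkg_args <- as.character(names(formals(pkg_fun)))
  n <- length(base_args)

  # The base arguments must appear first, unchanged and in the same order.
  if (length(pkg_args) < n) return(FALSE)
  if (!identical(pkg_args[seq_len(n)], base_args)) return(FALSE)

  # Extra arguments are only acceptable if the base function takes `...`.
  extra <- pkg_args[seq_along(pkg_args) > n]
  length(extra) == 0 || "..." %in% base_args
}

base_like <- function(x, ...) NULL
extended <- function(x, ..., quiet = FALSE) NULL
unrelated <- function(.data, ...) NULL

results <- c(
  extended = extends_formals(base_like, extended),
  unrelated = extends_formals(base_like, unrelated)
)
```

This only looks at closures; primitives have no formals, so a real implementation would need to handle them separately.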

I would completely believe this rule is useful for refactoring as you
describe, but that is the "same function" case. For an end-user in the
"different function same symbol" case it's not at all clear to me that the
deprecated function should always lose.

People sometimes use deprecated functions. It's not great, and eventually
they'll need to fix that for any given case, but imagine if you deprecated
the filter verb in dplyr (I know this will never happen, but I think it's
illustrative none the less).

Consider a piece of code someone wrote before this hypothetical deprecation
of filter. The fact that it's now deprecated certainly doesn't mean that
they secretly wanted stats::filter all along, right? Conflicted acting as
if it does will lead to them getting the exact kind of error you're looking
to protect them from, and with even less ability to understand why because
they are already doing "The right thing" to protect themselves by using
conflicted in the first place...

I deeply worry about people putting this kind of thing, or even just
library(conflicted), in their .Rprofile and thus making their scripts
*substantially* less reproducible. Is that a consequence you have thought
about to this kind of functionality?

Best,
~G
5 days later
#
Oooh, nice idea, I'll definitely try it out.

Ah yes, good point. I'll add some heuristic to check that the function
name appears in the first argument of the .Deprecated call (assuming
that the call looks something like `.Deprecated("pkg::foo")`).

Yes, and I've already recommended against it in two places :)  I'm not
sure if there's any more I can do - people already put (e.g.)
`library(ggplot2)` in their .Rprofile, which is just as bad from a
reproducibility standpoint.

Thanks for the thoughtful feedback!

Hadley