Skip to content

[R-pkg-devel] Handling Not-Always-Needed Dependencies?

34 messages · Thomas J. Leeper, Joshua Ulrich, Kevin Ushey +5 more

Messages 1–25 of 34

#
I have a fairly open-ended question about the handling of package
dependencies that is inspired by the precise distinction between
"Depends", "Imports", and "Suggests" listed in DESCRIPTION.

Some background, as I understand it: Prior to requiring package
namespaces, we listed package dependencies in "Depends" because the
required package had to be (1) installed and (2) on the search path in
order to access its functions. After introducing and requiring
namespaces, there are very few circumstances in which "Depends" is
needed because functions from the required package can either be
imported (via "Imports" and NAMESPACE) and/or referred to using the
`pkg::fun()` style. "Suggests" packages are maybe needed for examples,
vignettes, etc. but are not installed by default via
`install.packages()`, so are not guaranteed to be available.

Some observations:

1. "Depends" seems to be less useful than before, except in rare cases
where a package needs to be (a) installed, (b) loaded, and (c) on the
search path. Imports covers most package dependency use cases.

2. There is a gap in functionality between "Imports" and "Suggests".
Sometimes there is a need for functions that should be available
(i.e., installed) but do not need to be loaded or imported because
they are rarely used (e.g., graphing functions). Importing the
functions adds bloat but only putting their package in "Suggests" does
not guarantee availability when, for example, calling
`requireNamespace()` or `require()`.

Suggestion:

Might it be useful to have a category between "Imports" and "Suggests"
that covers packages that should be installed but not imported?

This could be done by changing `install.packages()` to cover
"Suggests" by default, changing the meaning of "Depends" and
"Imports", or adding a new category to DESCRIPTION.

I am interested in hearing your thoughts on this issue.

Best,
-Thomas

Thomas J. Leeper
http://www.thomasleeper.com
#
See eg https://www.debian.org/doc/debian-policy/ch-relationships.html

We have other big fish to fry.  Eg many packages have LinkingTo: BH (as Boost
headers are indeed awesome) but this 120 mb (!!) payload is needed _once_
during package compilation / installation and never again afterwards.  So
Build-Depends: would be a nice addition too.

Dirk
#
On Tue, Aug 2, 2016 at 11:26 AM, Thomas J. Leeper <thosjleeper at gmail.com> wrote:
Maybe I'm missing something, but isn't that the point of calling
requireNamespace()?  For example:
if (requireNamespace("suggestedPackage"))
  stop("suggestedPackage required but not installed")

That doesn't seem like a heavy burden for package writers when writing
infrequently used functions in their package.  You could even add the
install.packages command to the error message to instruct users to
install the required package.
I personally would not want install.packages() to install packages I'm
unlikely to actually use.

It's also not clear to me how importing rarely used functions causes
bloat, but installing all packages for all rarely-used functions does
not cause bloat.

  
    
#
Dirk,

That is useful. I think the "Recommends" category from that link is
pretty much what I am suggesting - something that should be installed
if it's available in the package repository but is not loaded/imported
by default.

Some of those other relationships sound like they would also be
useful, for example in the case you mention.

-Thomas

Thomas J. Leeper
http://www.thomasleeper.com
On Tue, Aug 2, 2016 at 5:34 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
#
On 2 August 2016 at 11:36, Joshua Ulrich wrote:
| Maybe I'm missing something, but isn't that the point of calling
| requireNamespace()?  For example:
| if (requireNamespace("suggestedPackage"))
|   stop("suggestedPackage required but not installed")
| 
| That doesn't seem like a heavy burden for package writers when writing
| infrequently used functions in their package.  You could even add the
| install.packages command to the error message to instruct users to
| install the required package.

[...]

| I personally would not want install.packages() to install packages I'm
| unlikely to actually use.
| 
| It's also not clear to me how importing rarely used functions causes
| bloat, but installing all packages for all rarely-used functions does
| not cause bloat.

Sadly, some people whose advocacy is taken as religous gospel in some
circles, particularly beginners, advocate pretty much that: treat Suggests:
as Depends: and install it anyway because, hell, why would one tests.

I regularly run (large) reverse depends checks against some of my more widely
used packages and run into this all the time.

We (as a community, including CRAN whose gatekeepers I have failed to
convince about this on on multiple attempts) are doing this wrong.
"Suggests:" really means optionally, rather than unconditionally.  But it
would appear that you and I are about to only ones desiring that behaviour.

Dirk
#
On 02/08/2016 1:01 PM, Dirk Eddelbuettel wrote:
I thought I understood Joshua's point (and agreed with it), but you also 
seem to be agreeing with him and I don't understand at all what you're 
saying.

What is "this" in your last paragraph, that you have failed to convince 
CRAN gatekeepers about?

Duncan Murdoch
#
On 2 August 2016 at 13:12, Duncan Murdoch wrote:
| On 02/08/2016 1:01 PM, Dirk Eddelbuettel wrote:
| >
| > On 2 August 2016 at 11:36, Joshua Ulrich wrote:
| > | Maybe I'm missing something, but isn't that the point of calling
| > | requireNamespace()?  For example:
| > | if (requireNamespace("suggestedPackage"))
| > |   stop("suggestedPackage required but not installed")
| > |
| > | That doesn't seem like a heavy burden for package writers when writing
| > | infrequently used functions in their package.  You could even add the
| > | install.packages command to the error message to instruct users to
| > | install the required package.
| >
| > [...]
| >
| > | I personally would not want install.packages() to install packages I'm
| > | unlikely to actually use.
| > |
| > | It's also not clear to me how importing rarely used functions causes
| > | bloat, but installing all packages for all rarely-used functions does
| > | not cause bloat.
| >
| > Sadly, some people whose advocacy is taken as religous gospel in some
| > circles, particularly beginners, advocate pretty much that: treat Suggests:
| > as Depends: and install it anyway because, hell, why would one tests.
| >
| > I regularly run (large) reverse depends checks against some of my more widely
| > used packages and run into this all the time.
| >
| > We (as a community, including CRAN whose gatekeepers I have failed to
| > convince about this on on multiple attempts) are doing this wrong.
| > "Suggests:" really means optionally, rather than unconditionally.  But it
| > would appear that you and I are about to only ones desiring that behaviour.
| 
| I thought I understood Joshua's point (and agreed with it), but you also 
| seem to be agreeing with him and I don't understand at all what you're 
| saying.
| 
| What is "this" in your last paragraph, that you have failed to convince 
| CRAN gatekeepers about?

It is really simple:  Having _both_ Suggests: foo _and_ an unconditonal call
to foo::bar() in the code.

Whereas Josh and I argue that it needs to be conditional on requireNames()
coming back TRUE.

Dirk
#
You could abuse the LinkingTo: field of the DESCRIPTION file, to
specify 'packages you want installed alongside your package that you
won't necessarily use at runtime'. (It's possible that 'R CMD check'
might warn about strange usages of LinkingTo:, but I'm not sure)

You could also just place these packages in Suggests, but guard their
usages with a function that attempts to download, install and load
those packages. E.g. (pseudo-ish code)

    ensure_package_loaded <- function(package) {
      if (requireNamespace(package))
        return(TRUE)
      if (interactive())
       # prompt user to download + install package, then try to load again
      stop("routine requires package but is not installed")
    }

and use it as:

    uses_foo <- function() {
      ensure_package_loaded("foo")
      foo::bar()
    }

This would at least minimize the amount of time spent 'annoying' the
user re: installation of package dependencies (it would only happen
the first time they try to use those functions)

Overall, I think the best resolution here is through clever use of
Suggests + on-demand installation of required packages, rather than
adding another field to the DESCRIPTION file.

Cheers,
Kevin
On Tue, Aug 2, 2016 at 9:26 AM, Thomas J. Leeper <thosjleeper at gmail.com> wrote:
#
On 2 August 2016 at 11:00, Kevin Ushey wrote:
| You could abuse the LinkingTo: field of the DESCRIPTION file, to
| specify 'packages you want installed alongside your package that you
| won't necessarily use at runtime'. (It's possible that 'R CMD check'
| might warn about strange usages of LinkingTo:, but I'm not sure)

I'd rather not abuse LinkingTo: any further. I fear its feelings are
sufficiently hurt being mostly used for 'IncludeFrom:' and no linking.

Kidding aside I think we have room for new tags in DESCRIPTION. We may as
well do it right.
 
| You could also just place these packages in Suggests, but guard their
| usages with a function that attempts to download, install and load
| those packages. E.g. (pseudo-ish code)
| 
|     ensure_package_loaded <- function(package) {
|       if (requireNamespace(package))
|         return(TRUE)

Nice.  [ And erratum in my previus email where I meant requireNamespace() ]

We could even go one further and store this in a per-package environment,
being filled through .onLoad() or .onAttach().

|       if (interactive())
|        # prompt user to download + install package, then try to load again
|       stop("routine requires package but is not installed")
|     }

I don't see why this should work only when interactive(). I think it should
work all the time.

I also don't think it needs to stop. It is after all just some auxiliary
functionality that Suggests provides.  Eg extra regression tests, say with or
without a certain backend -- and those tests then need the yes/no toggle and
skip if unavailable.
 
| and use it as:
| 
|     uses_foo <- function() {
|       ensure_package_loaded("foo")
|       foo::bar()
|     }

Sure.
 
| This would at least minimize the amount of time spent 'annoying' the
| user re: installation of package dependencies (it would only happen
| the first time they try to use those functions)

I don't really want other packages going off installing autonomously.
We pretty much make this explicit in DESCRIPTION so that the user can make an
informed choice.
 
| Overall, I think the best resolution here is through clever use of
| Suggests + on-demand installation of required packages, rather than
| adding another field to the DESCRIPTION file.

Wholly agreed on some part, less so on the remainders.

Dirk
#
My concerns with installing on the fly are that it (1) requires internet
access during run time, and either (2) involves assumptions about
repository and library locations, or (3) requires exposing all arguments to
install.packages() via higher level functions, or (4) needs interactivity.

-Thomas
On Aug 2, 2016 8:39 PM, "Dirk Eddelbuettel" <edd at debian.org> wrote:

            

  
  
#
On 02/08/2016 1:41 PM, Dirk Eddelbuettel wrote:
I am feeling particularly dense today.  What about "Having _both_ 
Suggests: foo _and_ an unconditonal call to foo::bar() in the code." did 
you fail to convince CRAN about?  That it is a good thing, or a bad thing?

Duncan Murdoch
#
On 2 August 2016 at 16:07, Duncan Murdoch wrote:
| On 02/08/2016 1:41 PM, Dirk Eddelbuettel wrote:
| >
| > On 2 August 2016 at 13:12, Duncan Murdoch wrote:
| > | On 02/08/2016 1:01 PM, Dirk Eddelbuettel wrote:
| > | >
| > | > On 2 August 2016 at 11:36, Joshua Ulrich wrote:
| > | > | Maybe I'm missing something, but isn't that the point of calling
| > | > | requireNamespace()?  For example:
| > | > | if (requireNamespace("suggestedPackage"))
| > | > |   stop("suggestedPackage required but not installed")
| > | > |
| > | > | That doesn't seem like a heavy burden for package writers when writing
| > | > | infrequently used functions in their package.  You could even add the
| > | > | install.packages command to the error message to instruct users to
| > | > | install the required package.
| > | >
| > | > [...]
| > | >
| > | > | I personally would not want install.packages() to install packages I'm
| > | > | unlikely to actually use.
| > | > |
| > | > | It's also not clear to me how importing rarely used functions causes
| > | > | bloat, but installing all packages for all rarely-used functions does
| > | > | not cause bloat.
| > | >
| > | > Sadly, some people whose advocacy is taken as religous gospel in some
| > | > circles, particularly beginners, advocate pretty much that: treat Suggests:
| > | > as Depends: and install it anyway because, hell, why would one tests.
| > | >
| > | > I regularly run (large) reverse depends checks against some of my more widely
| > | > used packages and run into this all the time.
| > | >
| > | > We (as a community, including CRAN whose gatekeepers I have failed to
| > | > convince about this on on multiple attempts) are doing this wrong.
| > | > "Suggests:" really means optionally, rather than unconditionally.  But it
| > | > would appear that you and I are about to only ones desiring that behaviour.
| > |
| > | I thought I understood Joshua's point (and agreed with it), but you also
| > | seem to be agreeing with him and I don't understand at all what you're
| > | saying.
| > |
| > | What is "this" in your last paragraph, that you have failed to convince
| > | CRAN gatekeepers about?
| >
| > It is really simple:  Having _both_ Suggests: foo _and_ an unconditonal call
| > to foo::bar() in the code.
| >
| > Whereas Josh and I argue that it needs to be conditional on requireNames()
| > coming back TRUE.
| 
| I am feeling particularly dense today.  What about "Having _both_ 
| Suggests: foo _and_ an unconditonal call to foo::bar() in the code." did 
| you fail to convince CRAN about?  That it is a good thing, or a bad thing?

It is a Really Bad Thing )TM) because I read Writing R Extensions as meaning
that these ought to be tested for presence (as Suggests != Depends, so we
cannot [rather: should not] assume them) yet package authors are sloppy (and
CRAN let's them, being charged as facilitator to a crime here) and BAM I get
a failing test as some bad, bad code unconditionally calls code that should
only be called conditional on a suggested package actuallt being there.

Hth,  Dirk
#
The key question is: should R CMD check work if you the suggested
packages are installed? This generally affects tests, examples, and
vignettes.

If the answer is yes, R CMD check should work, then it would be
helpful for CRAN to enforce this, so that those of us who regularly
run revdep checks on large numbers of packages don't have to install
suggested packages.

Personally, I just install all the suggested dependencies of reverse
dependencies - it's not a huge burden for me since I can install
binaries from CRAN.

Hadley
#
On 02/08/2016 4:20 PM, Dirk Eddelbuettel wrote:
Okay, now I think I understand, but I agree with CRAN.  It is not 
feasible to tell if the test happened somewhere in the code unless we 
enforce a particular way of writing the test.

I would object if I had to write if (requireNamespace("foo")) multiple 
times just to satisfy CRAN's test, when any sane human could tell that 
the first test was sufficient.

For example, if my package Suggests: foo, I should be able to write

if (!requireNamespace("foo"))
   stop("Package foo is needed for this example")

and then merrily call foo::bar() as many times as I like.

Or am I still misunderstanding you?  What particular thing should CRAN 
change?

Duncan Murdoch
#
On 2 August 2016 at 18:13, Duncan Murdoch wrote:
| Okay, now I think I understand, but I agree with CRAN.  It is not 
| feasible to tell if the test happened somewhere in the code unless we 
| enforce a particular way of writing the test.

Debian has well over 20k packages, and they are tested this way.  You just
need to show the will of testing in an _empty_ environment to ensure
_everything_ that is needed is loaded.
 
| I would object if I had to write if (requireNamespace("foo")) multiple 
| times just to satisfy CRAN's test, when any sane human could tell that 
| the first test was sufficient.
| 
| For example, if my package Suggests: foo, I should be able to write
| 
| if (!requireNamespace("foo"))
|    stop("Package foo is needed for this example")
| 
| and then merrily call foo::bar() as many times as I like.
| 
| Or am I still misunderstanding you?  What particular thing should CRAN 
| change?

You seem to misunderstand that both you and I want 

  if (!requireNamespace("foo"))
     stop("Package foo is needed for this example")

(or alternative per-call tests) and that CRAN does not enforce either.

CRAN, like Hadley, just closes its eyes, swallows hard, and then simply loads
everything treating Suggests as if it were Depends.

But it ain't. Suggests != Depends.

Now clearer?

Dirk, now bored of this and it is clearly not going anywhere soon
#
On 02/08/2016 6:34 PM, Dirk Eddelbuettel wrote:
So really what you're suggesting is that CRAN should run tests with the 
suggested packages absent.  Presumably tests should also be run with 
them present.

But if they did that, the code that I want to write would call stop() 
and fail.  So we'd need some way to say "Let the user know they need 
'foo' to run this, but don't fail."  And we'd need to phase this in 
really gradually, because tons of packages are using code like my example.

You volunteered to help CRAN package checking.  Why not put together 
code to implement your idea, and see how big the problem would be to 
phase it in, by seeing how many packages fail under it?

Duncan Murdoch
#
After reading the link in Dirk's initial reply, how about adding
fields 'Recommends'
and 'Build-Depends' to DESCRIPTION as in Debian?

Recommends: only gets installed, can be used via if(requireNamespace())
from the package and in pkg tests[1]. [Debian: The Recommends field should
list packages that would be found together with this one in all but unusual
installations.]
Build-Depends: gets installed before build, removed after.
Suggests: only installed when requested at install.packages() and only used
in examples/vignettes.

If 'tons of packages' are using if(requireNamespace) in their package code
there seems to be a need for something like this. Compliance to the above
can be checked automatically and  a gradual implementation via
NOTE->WARNING->ERROR in R CMD check seems possible.

Perhaps more controversially a 'Breaks' field could be considered. There
are a few packages out there that have many, many, dependencies.
Implementing breaking updates currently depends on the willingness of many
authors to update their package or convincing the CRAN maintainers to allow
for (temporary) breakage.

The suggestion to have functions auto-install things is very inconvenient
for the good reasons pointed out by Thomas. Additionally, it is often based
on the wrong assumptions. Example: the RGtk2 package has this habit of
trying to install when libgtk2 is not on the path. But in my case that is
often exactly the case: it is just not on the path (libgtk2 is on the
network, the VM just doesn't know yet). So I'd rather have a proper and
accurate error message (which is good practice anyway).


Best,
Mark

[1] actually, once we know a pkg is Recommended, the 'if(requireNamespace)'
could even be absorbed in the :: operator.




Op wo 3 aug. 2016 om 01:46 schreef Duncan Murdoch <murdoch.duncan at gmail.com

  
  
#
On 03/08/2016 5:32 AM, Mark van der Loo wrote:
I think the distinction between Recommends and Suggests is too subtle 
here.  I already think it's a bad thing that we are using these words in 
ways that don't really correspond to English usage.  I'd much rather 
have a way of declaring explicitly the different aspects of dependence 
on a package rather than bundling them up into cute labels, but it's too 
late for that now.  However, we don't need to make things worse.
I don't follow the argument here.  What problem are you solving?
This isn't controversial, it's just a bad idea.  Don't encourage people 
to break things.
I don't see how :: would be any different than it is now.  If you don't 
have foo available, and you try to use foo::bar(), what would happen 
other than an error?

Duncan Murdoch
#
On 2 August 2016 at 19:45, Duncan Murdoch wrote:
| Why not put together code to implement your idea, and see how big the
| problem would be to phase it in, by seeing how many packages fail under it?

Ahh, the old 'put-up-or-shut-up' gambit, very nice. Big fan of that myself.

I have been sitting on the need to build better infrastructure for my reverse
depends checks anyway, and may get there over the fall.  Or not as I have
been saying that for a long time...  It will likely be Docker-based which may
or may not be suitable for CRAN, but should be for r-hub which may get this
all by itself too.  Not sure of this can be done in R alone.

Then again, users of TravisCI know that just toggling

_R_CHECK_FORCE_SUGGESTS_=FALSE

does it too so in that sense it is already there, but not used?

Dirk
#
[snip]
Agreed
worse.

Disagreed. We could follow the well-established practices of Debian (and
CRAN already does that, partially).
Basically I'm trying to address the idea suggested by Thomas, who started
this conversation, and make it a bit more explicit. I felt that the
discussion went a little off-track there.

Right now, when package code (not examples) uses a suggested package, part
of that package will by default not work - at least that's how people use
it now. I would like it to work by default. For examples/vignettes you
could be more forgiving since running an example is not core functionality
of a package.
Your reaction just proved my point about it being controversial. More
seriously, real progress is hardly ever possible without breaking things,
so I think at least people could have a serious discussion about it before
dismissing it simply as a bad idea. The Debian community obviously once
thought it was a good idea, so why not discuss it for R/CRAN? (discussions
are also an important way to progress even if no line of code is changed).
At the moment, I'm inclined against the idea, but I for one like to see me
proven wrong.
I think you're right there. <resets brain>.

Best,
Mark


Op wo 3 aug. 2016 om 13:41 schreef Duncan Murdoch <murdoch.duncan at gmail.com

  
  
#
The issue seems to boil down to the fact that Suggests is covering two
very different use cases: (a) conditional package code and (b)
example/test/vignette code.

Consider a package (say "foo") that is only used in tests. Users do
not need "foo" since package code never calls it. If our package
instead calls "foo" conditionally (requireNamespace(); foo::bar(),
etc.), then users may very well want "foo" and need it much more than
they would if "foo" were only used in tests. Yet DESCRIPTION does not
allow a distinction two be made between these two scenarios.

I think the length and complexity of the discussion in WRE[1] makes
clear that Suggests is being used two cover these two very different
use cases.

-Thomas

[1] https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Package-Dependencies

Thomas J. Leeper
http://www.thomasleeper.com


On Wed, Aug 3, 2016 at 2:04 PM, Mark van der Loo
<mark.vanderloo at gmail.com> wrote:
#
On 03.08.2016 15:04, Mark van der Loo wrote:
I don't see a point for the discussion, because there is even a CRAN 
policy that explain the procedure if there is really need (which should 
be carefully decided) to break things.


Best,
Uwe Ligges
#
On 03.08.2016 15:20, Thomas J. Leeper wrote:
I am more worried about yet another dependency level that confuses more 
than it helps for the majority of developers and users.
As an example, I nneed to explain myself again and again what "Enhances" 
means (and perhaps I am still doing it wrong).

Best,
Uwe Ligges
#
On 03.08.2016 14:24, Dirk Eddelbuettel wrote:
I was travelling, hence a delayed response:

Why users of TravisCI? It is documented in the manual. Setting it to 
true gives an error in the check if the suggested package is not 
available for the check. And typically you want to have it available to 
be able to actually check all parts of your package.
So this is not what you are looking for.

What you suggest means that we would have to run the checks twice, as 
Duncan explained already, once with and once without the suggetsed 
package installed.

Best,
Uwe
#
On 3 August 2016 at 16:00, Uwe Ligges wrote:
| On 03.08.2016 14:24, Dirk Eddelbuettel wrote:
| > Then again, users of TravisCI know that just toggling
| >
| > _R_CHECK_FORCE_SUGGESTS_=FALSE
| 
| I was travelling, hence a delayed response:
| 
| Why users of TravisCI? It is documented in the manual. Setting it to 

Because Travis breaks your check as it works in a cleanroom with only the
specified packages installed. As it (and that is the gist of my argument)
should ...

| true gives an error in the check if the suggested package is not 
| available for the check. And typically you want to have it available to 
| be able to actually check all parts of your package.
| So this is not what you are looking for.
| 
| What you suggest means that we would have to run the checks twice, as 
| Duncan explained already, once with and once without the suggetsed 
| package installed.

No, CRAN could just flip that toggle and run once.

What Duncan suggested was (as I understand it an empirically-driven
assessment of what it would take to get from here (set to FALSE, tolerate bad
Policy) to there (set to TRUE, IMHO better Policy adherence).

Dirk