Skip to content

R CMD check and missing imports from base packages

15 messages · Winston Chang, Gabriel Becker, Duncan Murdoch +6 more

#
When a symbol in a package is resolved, R looks into the package's
environment, and then into the package's imports environment. Then, if the
symbol is still not resolved, it looks into the base package. So far so
good.

If still not found, it follows the 'search()' path, starting with the
global environment and then all attached packages, finishing with base and
recommended packages.

This can be a problem if a package uses a function from a base package, but
it does not formally import it via the NAMESPACE file. If another package
on the search path also defines a function with the same name, then this
second function will be called.

E.g. if package 'ggplot2' uses 'stats::density()', and package 'igraph'
also defines 'density()', and 'igraph' is on the search path, then
'ggplot2' will call 'igraph::density()' instead of 'stats::density()'.

I think that for a better solution, either
1) the search path should not be used at all to resolve symbols in
packages, or
2) only base packages should be searched.

I realize that this is something that is not easy to change, especially 1)
would break a lot of packages. But maybe at least 'R CMD check' could
report these cases. Currently it reports missing imports for non-base
packages only. Is it reasonable to have a NOTE for missing imports from
base packages as well?

[As usual, please fix me if I am missing or misunderstood something.]

Thank you, Best,
Gabor
#
On Tue, Apr 28, 2015 at 3:04 PM, G?bor Cs?rdi <csardi.gabor at gmail.com> wrote:
Just to be clear: you mean that this happens when ggplot2 contains a
call like 'density()', not 'stats::density()' (but the intention is to
call stats::density()), right?

-Winston
#
On Wed, Apr 29, 2015 at 12:53 PM, Winston Chang <winstonchang1 at gmail.com>
wrote:
Yes, exactly as you say, I am sorry for the confusion. This is actually a
real example:
https://github.com/hadley/ggplot2/issues/1041

Gabor

  
  
#
Gabor,

To play devil's advocate a bit,  why not just have the package formally
import the functions it wants to use (or the whole package if that is
easier)? Also, if your package Depends on another package, instead of
importing it (e.g. because the end user will need functions in it in order
to meaningfully use your functions), I imagine you *want* to hit symbols in
that package before base, right? Otherwise the Depends mechanism becomes
somewhat crippled because you'd need to import symbols from packages you
Depend on, at least in certain cases.

You don't want to require people to import things like the assignment
operator, or "if", or a bunch of other things in the base package (and
probably not the stuff in grDevices either, though from your description
they would in principle need to do that now).

But why should stats not require an import if you want to guarantee that
you get the density function from stats and not from somewhere else? Isn't
that what ImportFrom is for? Is the reason that it is loaded automatically?

Best,
~G

On Wed, Apr 29, 2015 at 10:00 AM, G?bor Cs?rdi <csardi.gabor at gmail.com>
wrote:

  
    
#
On Wed, Apr 29, 2015 at 1:45 PM, Gabriel Becker <gmbecker at ucdavis.edu>
wrote:
This is exactly my goal. And to facilitate this, R CMD check could remind
you if you forgot to do the formal import.

You don't want to require people to import things like the assignment
I am not talking about the 'base' package, only the other base packages.
(The base package is special, it is searched before the attached packages,
so it is probably fine.)

Yes, I suggest people import stuff from grDevices explicitly. Because if
they don't, their package might break if used together with other packages.
And this is not a theoretical issue, in the past couple of weeks I have
seen this happening three times.

But why should stats not require an import if you want to guarantee that
That's exactly what I am saying, sorry if it was not clear. I want to
"require" format imports from base packages (except from _the_ base
package, maybe).

Yes, people can do this already. But why not help them with a NOTE if they
don't know that this is good practice, or they just simply forget?

Gabor

  
  
#
On 29/04/2015 1:57 PM, G?bor Cs?rdi wrote:
I suspect the reason for this is historical:  at the time that the 
current warning was added, it would have flagged too many packages as 
problematic.  People do complain when base R makes changes that force 
them to change their packages.  Perhaps that decision should be 
reconsidered now.

Duncan Murdoch
#
All,

Here are two ideas on this:

1. have R CMD check show how every external function reference gets 
resolved.
2. have R CMD check warn anytime there is a potential name conflict, e.g. 
density( ) coming from either igraph or stats, and show how it was 
resolved.

Either could be an option.  I guess there are potential problems with 
conditional loading of libraries where a function could be  resolved 
differently depending on some decision made at run time, but 1. or 2. 
might be useful for the standard case.

John 
 ..............................................................
 John P. Nolan
 Math/Stat Department
 227 Gray Hall,   American University
 4400 Massachusetts Avenue, NW
 Washington, DC 20016-8050

 jpnolan at american.edu       voice: 202.885.3140 
 web: academic2.american.edu/~jpnolan
 ..............................................................



From:   G?bor Cs?rdi <csardi.gabor at gmail.com>
To:     Gabriel Becker <gmbecker at ucdavis.edu>, 
Cc:     "r-devel at r-project.org" <r-devel at r-project.org>
Date:   04/29/2015 01:57 PM
Subject:        Re: [Rd] R CMD check and missing imports from base 
packages
Sent by:        "R-devel" <r-devel-bounces at r-project.org>



On Wed, Apr 29, 2015 at 1:45 PM, Gabriel Becker <gmbecker at ucdavis.edu>
wrote:
This is exactly my goal. And to facilitate this, R CMD check could remind
you if you forgot to do the formal import.

You don't want to require people to import things like the assignment
I am not talking about the 'base' package, only the other base packages.
(The base package is special, it is searched before the attached packages,
so it is probably fine.)

Yes, I suggest people import stuff from grDevices explicitly. Because if
they don't, their package might break if used together with other 
packages.
And this is not a theoretical issue, in the past couple of weeks I have
seen this happening three times.

But why should stats not require an import if you want to guarantee that
Isn't
automatically?
That's exactly what I am saying, sorry if it was not clear. I want to
"require" format imports from base packages (except from _the_ base
package, maybe).

Yes, people can do this already. But why not help them with a NOTE if they
don't know that this is good practice, or they just simply forget?

Gabor
<winstonchang1 at gmail.com>
<csardi.gabor at gmail.com>
'stats::density()'.
to
a
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
#
On Wed, Apr 29, 2015 at 3:28 PM, John Nolan <jpnolan at american.edu> wrote:
[...]
That's not possible, because it depends on the currently attached packages,
and even on the order of their attachment.
Also not possible in practice, because it would need to check too many
packages. But it would not be a good solution in my opinion, anyway,
because you probably don't want people having to deal with these issues. I
surely don't. For the developer of a single package, it should be
completely irrelevant how many other packages use the same function names.

Gabor

[...]
#
On Wed, Apr 29, 2015 at 3:24 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:
[...]
Thanks for considering this. I definitely think that with the number of
packages increasing, sooner or later this will be a (more) serious issue.
In some cases it is also hard(er) to work around with detaching and
re-attaching packages.

I think a NOTE would be probably enough, because probably a lot of packages
have these issues.

Also, people would not need to change their package _now_. Only if and when
they submit new versions. But that's CRAN's decision, not mine.

Gabor

[...]
#
On 04/28/2015 01:04 PM, G?bor Cs?rdi wrote:
stats::density() is an S3 generic, so igraph would define an S3 method, right? 
And in general a developer would avoid masking a function in a base package, so 
as not to require the user to distinguish between stats::density() and 
igraph::density(). Maybe the example is not meant literally.

Being able to easily flag non-imported, non-base symbols would definitely 
improve the robustness of package code, even if not helping the end user 
disambiguate duplicate symbols.

Martin Morgan

  
    
#
On Wed, Apr 29, 2015 at 4:24 PM, Martin Morgan <mtmorgan at fredhutch.org>
wrote:
[...]
Yes, maybe this is not the best example. I would not go into the details
here, because they are not really important for the current discussion.
In 'non-base' I assume you mean the single base package.

Just to address the second part, I think the end-user does not have to deal
with this. (Unless it is buggy like it is now.) The fact that ggplot2 calls
density() is not the user's concern.

Gabor

[...]
#
The 'filter' function in the popular 'dplyr' package masks the one
that has been in the stats package forever, and they have nothing
in common, so that may give you an example.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Apr 29, 2015 at 1:24 PM, Martin Morgan <mtmorgan at fredhutch.org>
wrote:

  
  
#
On 04/29/2015 05:38 PM, William Dunlap wrote:
As I recall, several packages mask the simulate generic in stats, if you 
are looking for examples.

Paul Gilbert
#
On Wed, Apr 29, 2015 at 8:12 PM, Paul Gilbert <pgilbert902 at gmail.com> wrote:
FWIW, here is a list of base* functions masked** by CRAN packages:
https://github.com/gaborcsardi/rfunctions/blob/master/rfunctions.md
Look at the long table in the end. simulate indeed comes up 16 times. (But
read ** below.)

* This only includes functions in base packages that are attached by
default.
** The CRAN function names were taken from the manual page aliases, so a
couple of them are not really function names, and some of them are probably
not masking anything, but defining an S3 generic. This was a lot simpler
than going over the namespaces of all CRAN packages.

It seems that several base functions are masked, by several packages. I
would need to get a better CRAN function list, though.

Gabor

[...]

  
  
#

        
> On Wed, Apr 29, 2015 at 8:12 PM, Paul Gilbert <pgilbert902 at gmail.com> wrote:
>> 
    >> As I recall, several packages mask the simulate generic in stats, if you
    >> are looking for examples.
    >> 

    > FWIW, here is a list of base* functions masked** by CRAN packages:
    > https://github.com/gaborcsardi/rfunctions/blob/master/rfunctions.md
    > Look at the long table in the end. simulate indeed comes up 16 times. (But
    > read ** below.)

    > * This only includes functions in base packages that are attached by
    > default.
    > ** The CRAN function names were taken from the manual page aliases, so a
    > couple of them are not really function names, and some of them are probably
    > not masking anything, but defining an S3 generic. This was a lot simpler
    > than going over the namespaces of all CRAN packages.

    > It seems that several base functions are masked, by several packages. I
    > would need to get a better CRAN function list, though.

Thank you Gabor.

As Martin Morgan observed, many of these will be methods, S3 and
S3 and there is often no conflict and no masking at all for all
these, so I am claiming that your "statistics" are quite a bit
biased, and probably most of the more common names are proper
methods.

There is another issue, I've wanted to raise,
exemplified by my Rmpfr package [interface to the MPFR C library
for high-accuracy floating point computations]:

As it defines "mpfr"-ified versions of standard R functions 
many of which are *not* generic, it must mask them.

E.g, currently, the following "are masked" messages (when
attaching Rmpfr) are unavoidable

  Attaching package: ?Rmpfr?

  The following objects are masked from ?package:stats?:

      dbinom, dnorm, dpois, pnorm

  The following objects are masked from ?package:base?:

      pmax, pmin

(Currently, there are also 'cbind' and 'rbind' being masked in
 order to work with  mpfr-number matrices ... but that should in
 principle be avoidable)

I think these "... are masked" messages are still appropriate:  After all,
pnorm() will really be taken from Rmpfr instead of from stats,
and the user should know that she is trusting the author of
Rmpfr (me) to do the right thing, when she calls pnorm(), namely
to call the equivalent stats::pnorm() for standard numbers, and
to call the MPFR-library code for mpfr-numbers.

- - - - - - - - - - - - - - - - - - - - - - - - 

Now the above may have almost nothing to do with the original
subject.  If I have understood your main point correctly, you are
suggesting that  'R CMD check' should start putting out a NOTE
when package code calls a function from one of a set of
"standard packages" (*) and the package author failed to use an  
	importFrom(<standard pkg>, <function>)
in his/her NAMESPACE.

I agree that this would be useful. 
Actually, I think we have something like this in place already...
but maybe not strictly enough (?)

(*) IIUC, you suggested to use 
   "standard packages" := packages which are attached by default
   in R, apart from package 'base' because that does come
   immediately after the imports anyway (and because you cannot
   explicity import from base).


Martin Maechler
ETH Zurich