When a symbol in a package is resolved, R looks into the package's environment, and then into the package's imports environment. Then, if the symbol is still not resolved, it looks into the base package. So far so good. If still not found, it follows the 'search()' path, starting with the global environment and then all attached packages, finishing with base and recommended packages. This can be a problem if a package uses a function from a base package, but it does not formally import it via the NAMESPACE file. If another package on the search path also defines a function with the same name, then this second function will be called. E.g. if package 'ggplot2' uses 'stats::density()', and package 'igraph' also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of 'stats::density()'. I think that for a better solution, either 1) the search path should not be used at all to resolve symbols in packages, or 2) only base packages should be searched. I realize that this is something that is not easy to change, especially 1) would break a lot of packages. But maybe at least 'R CMD check' could report these cases. Currently it reports missing imports for non-base packages only. Is it reasonable to have a NOTE for missing imports from base packages as well? [As usual, please fix me if I am missing or misunderstood something.] Thank you, Best, Gabor
R CMD check and missing imports from base packages
15 messages · Winston Chang, Gabriel Becker, Duncan Murdoch +6 more
On Tue, Apr 28, 2015 at 3:04 PM, G?bor Cs?rdi <csardi.gabor at gmail.com> wrote:
E.g. if package 'ggplot2' uses 'stats::density()', and package 'igraph' also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of 'stats::density()'.
Just to be clear: you mean that this happens when ggplot2 contains a call like 'density()', not 'stats::density()' (but the intention is to call stats::density()), right? -Winston
On Wed, Apr 29, 2015 at 12:53 PM, Winston Chang <winstonchang1 at gmail.com> wrote:
On Tue, Apr 28, 2015 at 3:04 PM, G?bor Cs?rdi <csardi.gabor at gmail.com> wrote:
E.g. if package 'ggplot2' uses 'stats::density()', and package 'igraph' also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of 'stats::density()'.
Just to be clear: you mean that this happens when ggplot2 contains a call like 'density()', not 'stats::density()' (but the intention is to call stats::density()), right?
Yes, exactly as you say, I am sorry for the confusion. This is actually a real example: https://github.com/hadley/ggplot2/issues/1041 Gabor
-Winston
Gabor, To play devil's advocate a bit, why not just have the package formally import the functions it wants to use (or the whole package if that is easier)? Also, if your package Depends on another package, instead of importing it (e.g. because the end user will need functions in it in order to meaningfully use your functions), I imagine you *want* to hit symbols in that package before base, right? Otherwise the Depends mechanism becomes somewhat crippled because you'd need to import symbols from packages you Depend on, at least in certain cases. You don't want to require people to import things like the assignment operator, or "if", or a bunch of other things in the base package (and probably not the stuff in grDevices either, though from your description they would in principle need to do that now). But why should stats not require an import if you want to guarantee that you get the density function from stats and not from somewhere else? Isn't that what ImportFrom is for? Is the reason that it is loaded automatically? Best, ~G On Wed, Apr 29, 2015 at 10:00 AM, G?bor Cs?rdi <csardi.gabor at gmail.com> wrote:
On Wed, Apr 29, 2015 at 12:53 PM, Winston Chang <winstonchang1 at gmail.com> wrote:
On Tue, Apr 28, 2015 at 3:04 PM, G?bor Cs?rdi <csardi.gabor at gmail.com> wrote:
E.g. if package 'ggplot2' uses 'stats::density()', and package 'igraph' also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of 'stats::density()'.
Just to be clear: you mean that this happens when ggplot2 contains a call like 'density()', not 'stats::density()' (but the intention is to call stats::density()), right?
Yes, exactly as you say, I am sorry for the confusion. This is actually a real example: https://github.com/hadley/ggplot2/issues/1041 Gabor
-Winston
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Gabriel Becker, PhD Computational Biologist Bioinformatics and Computational Biology Genentech, Inc. [[alternative HTML version deleted]]
On Wed, Apr 29, 2015 at 1:45 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
Gabor, To play devil's advocate a bit, why not just have the package formally import the functions it wants to use (or the whole package if that is easier)?
This is exactly my goal. And to facilitate this, R CMD check could remind you if you forgot to do the formal import. You don't want to require people to import things like the assignment
operator, or "if", or a bunch of other things in the base package (and probably not the stuff in grDevices either, though from your description they would in principle need to do that now).
I am not talking about the 'base' package, only the other base packages. (The base package is special, it is searched before the attached packages, so it is probably fine.) Yes, I suggest people import stuff from grDevices explicitly. Because if they don't, their package might break if used together with other packages. And this is not a theoretical issue, in the past couple of weeks I have seen this happening three times. But why should stats not require an import if you want to guarantee that
you get the density function from stats and not from somewhere else? Isn't that what ImportFrom is for? Is the reason that it is loaded automatically?
That's exactly what I am saying, sorry if it was not clear. I want to "require" format imports from base packages (except from _the_ base package, maybe). Yes, people can do this already. But why not help them with a NOTE if they don't know that this is good practice, or they just simply forget? Gabor
Best, ~G On Wed, Apr 29, 2015 at 10:00 AM, G?bor Cs?rdi <csardi.gabor at gmail.com> wrote:
On Wed, Apr 29, 2015 at 12:53 PM, Winston Chang <winstonchang1 at gmail.com> wrote:
On Tue, Apr 28, 2015 at 3:04 PM, G?bor Cs?rdi <csardi.gabor at gmail.com> wrote:
E.g. if package 'ggplot2' uses 'stats::density()', and package
'igraph'
also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of 'stats::density()'.
Just to be clear: you mean that this happens when ggplot2 contains a call like 'density()', not 'stats::density()' (but the intention is to call stats::density()), right?
Yes, exactly as you say, I am sorry for the confusion. This is actually a real example: https://github.com/hadley/ggplot2/issues/1041 Gabor
-Winston
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-- Gabriel Becker, PhD Computational Biologist Bioinformatics and Computational Biology Genentech, Inc.
On 29/04/2015 1:57 PM, G?bor Cs?rdi wrote:
On Wed, Apr 29, 2015 at 1:45 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
Gabor, To play devil's advocate a bit, why not just have the package formally import the functions it wants to use (or the whole package if that is easier)?
This is exactly my goal. And to facilitate this, R CMD check could remind you if you forgot to do the formal import. You don't want to require people to import things like the assignment
operator, or "if", or a bunch of other things in the base package (and probably not the stuff in grDevices either, though from your description they would in principle need to do that now).
I am not talking about the 'base' package, only the other base packages. (The base package is special, it is searched before the attached packages, so it is probably fine.) Yes, I suggest people import stuff from grDevices explicitly. Because if they don't, their package might break if used together with other packages. And this is not a theoretical issue, in the past couple of weeks I have seen this happening three times. But why should stats not require an import if you want to guarantee that
you get the density function from stats and not from somewhere else? Isn't that what ImportFrom is for? Is the reason that it is loaded automatically?
That's exactly what I am saying, sorry if it was not clear. I want to "require" format imports from base packages (except from _the_ base package, maybe). Yes, people can do this already. But why not help them with a NOTE if they don't know that this is good practice, or they just simply forget?
I suspect the reason for this is historical: at the time that the current warning was added, it would have flagged too many packages as problematic. People do complain when base R makes changes that force them to change their packages. Perhaps that decision should be reconsidered now. Duncan Murdoch
All, Here are two ideas on this: 1. have R CMD check show how every external function reference gets resolved. 2. have R CMD check warn anytime there is a potential name conflict, e.g. density( ) coming from either igraph or stats, and show how it was resolved. Either could be an option. I guess there are potential problems with conditional loading of libraries where a function could be resolved differently depending on some decision made at run time, but 1. or 2. might be useful for the standard case. John .............................................................. John P. Nolan Math/Stat Department 227 Gray Hall, American University 4400 Massachusetts Avenue, NW Washington, DC 20016-8050 jpnolan at american.edu voice: 202.885.3140 web: academic2.american.edu/~jpnolan .............................................................. From: G?bor Cs?rdi <csardi.gabor at gmail.com> To: Gabriel Becker <gmbecker at ucdavis.edu>, Cc: "r-devel at r-project.org" <r-devel at r-project.org> Date: 04/29/2015 01:57 PM Subject: Re: [Rd] R CMD check and missing imports from base packages Sent by: "R-devel" <r-devel-bounces at r-project.org> On Wed, Apr 29, 2015 at 1:45 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
Gabor, To play devil's advocate a bit, why not just have the package formally import the functions it wants to use (or the whole package if that is easier)?
This is exactly my goal. And to facilitate this, R CMD check could remind you if you forgot to do the formal import. You don't want to require people to import things like the assignment
operator, or "if", or a bunch of other things in the base package (and probably not the stuff in grDevices either, though from your description they would in principle need to do that now).
I am not talking about the 'base' package, only the other base packages. (The base package is special, it is searched before the attached packages, so it is probably fine.) Yes, I suggest people import stuff from grDevices explicitly. Because if they don't, their package might break if used together with other packages. And this is not a theoretical issue, in the past couple of weeks I have seen this happening three times. But why should stats not require an import if you want to guarantee that
you get the density function from stats and not from somewhere else?
Isn't
that what ImportFrom is for? Is the reason that it is loaded
automatically?
That's exactly what I am saying, sorry if it was not clear. I want to "require" format imports from base packages (except from _the_ base package, maybe). Yes, people can do this already. But why not help them with a NOTE if they don't know that this is good practice, or they just simply forget? Gabor
Best, ~G On Wed, Apr 29, 2015 at 10:00 AM, G?bor Cs?rdi <csardi.gabor at gmail.com> wrote:
On Wed, Apr 29, 2015 at 12:53 PM, Winston Chang
<winstonchang1 at gmail.com>
wrote:
On Tue, Apr 28, 2015 at 3:04 PM, G?bor Cs?rdi
<csardi.gabor at gmail.com>
wrote:
E.g. if package 'ggplot2' uses 'stats::density()', and package
'igraph'
also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of
'stats::density()'.
Just to be clear: you mean that this happens when ggplot2 contains a call like 'density()', not 'stats::density()' (but the intention is
to
call stats::density()), right?
Yes, exactly as you say, I am sorry for the confusion. This is actually
a
real example: https://github.com/hadley/ggplot2/issues/1041 Gabor
-Winston
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-- Gabriel Becker, PhD Computational Biologist Bioinformatics and Computational Biology Genentech, Inc.
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On Wed, Apr 29, 2015 at 3:28 PM, John Nolan <jpnolan at american.edu> wrote:
[...]
1. have R CMD check show how every external function reference gets resolved.
That's not possible, because it depends on the currently attached packages, and even on the order of their attachment.
2. have R CMD check warn anytime there is a potential name conflict, e.g. density( ) coming from either igraph or stats, and show how it was resolved.
Also not possible in practice, because it would need to check too many packages. But it would not be a good solution in my opinion, anyway, because you probably don't want people having to deal with these issues. I surely don't. For the developer of a single package, it should be completely irrelevant how many other packages use the same function names. Gabor [...]
On Wed, Apr 29, 2015 at 3:24 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote: [...]
Yes, people can do this already. But why not help them with a NOTE if they
don't know that this is good practice, or they just simply forget?
I suspect the reason for this is historical: at the time that the current warning was added, it would have flagged too many packages as problematic. People do complain when base R makes changes that force them to change their packages. Perhaps that decision should be reconsidered now.
Thanks for considering this. I definitely think that with the number of packages increasing, sooner or later this will be a (more) serious issue. In some cases it is also hard(er) to work around with detaching and re-attaching packages. I think a NOTE would be probably enough, because probably a lot of packages have these issues. Also, people would not need to change their package _now_. Only if and when they submit new versions. But that's CRAN's decision, not mine. Gabor [...]
On 04/28/2015 01:04 PM, G?bor Cs?rdi wrote:
When a symbol in a package is resolved, R looks into the package's environment, and then into the package's imports environment. Then, if the symbol is still not resolved, it looks into the base package. So far so good. If still not found, it follows the 'search()' path, starting with the global environment and then all attached packages, finishing with base and recommended packages. This can be a problem if a package uses a function from a base package, but it does not formally import it via the NAMESPACE file. If another package on the search path also defines a function with the same name, then this second function will be called. E.g. if package 'ggplot2' uses 'stats::density()', and package 'igraph' also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of 'stats::density()'.
stats::density() is an S3 generic, so igraph would define an S3 method, right? And in general a developer would avoid masking a function in a base package, so as not to require the user to distinguish between stats::density() and igraph::density(). Maybe the example is not meant literally. Being able to easily flag non-imported, non-base symbols would definitely improve the robustness of package code, even if not helping the end user disambiguate duplicate symbols. Martin Morgan
I think that for a better solution, either 1) the search path should not be used at all to resolve symbols in packages, or 2) only base packages should be searched. I realize that this is something that is not easy to change, especially 1) would break a lot of packages. But maybe at least 'R CMD check' could report these cases. Currently it reports missing imports for non-base packages only. Is it reasonable to have a NOTE for missing imports from base packages as well? [As usual, please fix me if I am missing or misunderstood something.] Thank you, Best, Gabor [[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
On Wed, Apr 29, 2015 at 4:24 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
On 04/28/2015 01:04 PM, G?bor Cs?rdi wrote:
[...]
E.g. if package 'ggplot2' uses 'stats::density()', and package 'igraph'
also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of 'stats::density()'.
stats::density() is an S3 generic, so igraph would define an S3 method, right? And in general a developer would avoid masking a function in a base package, so as not to require the user to distinguish between stats::density() and igraph::density(). Maybe the example is not meant literally.
Yes, maybe this is not the best example. I would not go into the details here, because they are not really important for the current discussion.
Being able to easily flag non-imported, non-base symbols would definitely improve the robustness of package code, even if not helping the end user disambiguate duplicate symbols.
In 'non-base' I assume you mean the single base package. Just to address the second part, I think the end-user does not have to deal with this. (Unless it is buggy like it is now.) The fact that ggplot2 calls density() is not the user's concern. Gabor [...]
And in general a developer would avoid masking a function in a base package, so as not to require the user to distinguish between stats::density() and igraph::density(). Maybe the example is not meant literally.
The 'filter' function in the popular 'dplyr' package masks the one that has been in the stats package forever, and they have nothing in common, so that may give you an example. Bill Dunlap TIBCO Software wdunlap tibco.com On Wed, Apr 29, 2015 at 1:24 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
On 04/28/2015 01:04 PM, G?bor Cs?rdi wrote:
When a symbol in a package is resolved, R looks into the package's environment, and then into the package's imports environment. Then, if the symbol is still not resolved, it looks into the base package. So far so good. If still not found, it follows the 'search()' path, starting with the global environment and then all attached packages, finishing with base and recommended packages. This can be a problem if a package uses a function from a base package, but it does not formally import it via the NAMESPACE file. If another package on the search path also defines a function with the same name, then this second function will be called. E.g. if package 'ggplot2' uses 'stats::density()', and package 'igraph' also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of 'stats::density()'.
stats::density() is an S3 generic, so igraph would define an S3 method, right? And in general a developer would avoid masking a function in a base package, so as not to require the user to distinguish between stats::density() and igraph::density(). Maybe the example is not meant literally. Being able to easily flag non-imported, non-base symbols would definitely improve the robustness of package code, even if not helping the end user disambiguate duplicate symbols. Martin Morgan
I think that for a better solution, either
1) the search path should not be used at all to resolve symbols in
packages, or
2) only base packages should be searched.
I realize that this is something that is not easy to change, especially 1)
would break a lot of packages. But maybe at least 'R CMD check' could
report these cases. Currently it reports missing imports for non-base
packages only. Is it reasonable to have a NOTE for missing imports from
base packages as well?
[As usual, please fix me if I am missing or misunderstood something.]
Thank you, Best,
Gabor
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On 04/29/2015 05:38 PM, William Dunlap wrote:
And in general a developer would avoid masking a function in a base package, so as not to require the user to distinguish between stats::density() and igraph::density(). Maybe the example is not meant literally.
The 'filter' function in the popular 'dplyr' package masks the one that has been in the stats package forever, and they have nothing in common, so that may give you an example.
As I recall, several packages mask the simulate generic in stats, if you are looking for examples. Paul Gilbert
Bill Dunlap TIBCO Software wdunlap tibco.com On Wed, Apr 29, 2015 at 1:24 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
On 04/28/2015 01:04 PM, G?bor Cs?rdi wrote:
When a symbol in a package is resolved, R looks into the package's environment, and then into the package's imports environment. Then, if the symbol is still not resolved, it looks into the base package. So far so good. If still not found, it follows the 'search()' path, starting with the global environment and then all attached packages, finishing with base and recommended packages. This can be a problem if a package uses a function from a base package, but it does not formally import it via the NAMESPACE file. If another package on the search path also defines a function with the same name, then this second function will be called. E.g. if package 'ggplot2' uses 'stats::density()', and package 'igraph' also defines 'density()', and 'igraph' is on the search path, then 'ggplot2' will call 'igraph::density()' instead of 'stats::density()'.
stats::density() is an S3 generic, so igraph would define an S3 method, right? And in general a developer would avoid masking a function in a base package, so as not to require the user to distinguish between stats::density() and igraph::density(). Maybe the example is not meant literally. Being able to easily flag non-imported, non-base symbols would definitely improve the robustness of package code, even if not helping the end user disambiguate duplicate symbols. Martin Morgan
I think that for a better solution, either
1) the search path should not be used at all to resolve symbols in
packages, or
2) only base packages should be searched.
I realize that this is something that is not easy to change, especially 1)
would break a lot of packages. But maybe at least 'R CMD check' could
report these cases. Currently it reports missing imports for non-base
packages only. Is it reasonable to have a NOTE for missing imports from
base packages as well?
[As usual, please fix me if I am missing or misunderstood something.]
Thank you, Best,
Gabor
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On Wed, Apr 29, 2015 at 8:12 PM, Paul Gilbert <pgilbert902 at gmail.com> wrote:
As I recall, several packages mask the simulate generic in stats, if you are looking for examples.
FWIW, here is a list of base* functions masked** by CRAN packages: https://github.com/gaborcsardi/rfunctions/blob/master/rfunctions.md Look at the long table in the end. simulate indeed comes up 16 times. (But read ** below.) * This only includes functions in base packages that are attached by default. ** The CRAN function names were taken from the manual page aliases, so a couple of them are not really function names, and some of them are probably not masking anything, but defining an S3 generic. This was a lot simpler than going over the namespaces of all CRAN packages. It seems that several base functions are masked, by several packages. I would need to get a better CRAN function list, though. Gabor [...]
Paul Gilbert
G?bor Cs?rdi <csardi.gabor at gmail.com>
on Wed, 29 Apr 2015 23:07:09 -0400 writes:
> On Wed, Apr 29, 2015 at 8:12 PM, Paul Gilbert <pgilbert902 at gmail.com> wrote:
>>
>> As I recall, several packages mask the simulate generic in stats, if you
>> are looking for examples.
>>
> FWIW, here is a list of base* functions masked** by CRAN packages:
> https://github.com/gaborcsardi/rfunctions/blob/master/rfunctions.md
> Look at the long table in the end. simulate indeed comes up 16 times. (But
> read ** below.)
> * This only includes functions in base packages that are attached by
> default.
> ** The CRAN function names were taken from the manual page aliases, so a
> couple of them are not really function names, and some of them are probably
> not masking anything, but defining an S3 generic. This was a lot simpler
> than going over the namespaces of all CRAN packages.
> It seems that several base functions are masked, by several packages. I
> would need to get a better CRAN function list, though.
Thank you Gabor.
As Martin Morgan observed, many of these will be methods, S3 and
S3 and there is often no conflict and no masking at all for all
these, so I am claiming that your "statistics" are quite a bit
biased, and probably most of the more common names are proper
methods.
There is another issue, I've wanted to raise,
exemplified by my Rmpfr package [interface to the MPFR C library
for high-accuracy floating point computations]:
As it defines "mpfr"-ified versions of standard R functions
many of which are *not* generic, it must mask them.
E.g, currently, the following "are masked" messages (when
attaching Rmpfr) are unavoidable
Attaching package: ?Rmpfr?
The following objects are masked from ?package:stats?:
dbinom, dnorm, dpois, pnorm
The following objects are masked from ?package:base?:
pmax, pmin
(Currently, there are also 'cbind' and 'rbind' being masked in
order to work with mpfr-number matrices ... but that should in
principle be avoidable)
I think these "... are masked" messages are still appropriate: After all,
pnorm() will really be taken from Rmpfr instead of from stats,
and the user should know that she is trusting the author of
Rmpfr (me) to do the right thing, when she calls pnorm(), namely
to call the equivalent stats::pnorm() for standard numbers, and
to call the MPFR-library code for mpfr-numbers.
- - - - - - - - - - - - - - - - - - - - - - - -
Now the above may have almost nothing to do with the original
subject. If I have understood your main point correctly, you are
suggesting that 'R CMD check' should start putting out a NOTE
when package code calls a function from one of a set of
"standard packages" (*) and the package author failed to use an
importFrom(<standard pkg>, <function>)
in his/her NAMESPACE.
I agree that this would be useful.
Actually, I think we have something like this in place already...
but maybe not strictly enough (?)
(*) IIUC, you suggested to use
"standard packages" := packages which are attached by default
in R, apart from package 'base' because that does come
immediately after the imports anyway (and because you cannot
explicity import from base).
Martin Maechler
ETH Zurich