R package dependency issues when namespace is not attached

10 messages · Duncan Murdoch, Uwe Ligges, Jeroen Ooms +2 more

#
I have always assumed that having a package in the 'Depends' field
would automatically also import the namespace. However, it seems that
in R 2.15, dependencies do not become available until the package is
actually attached to the search path. Is this intended behavior?

The problem appears as follows: Suppose there is a package 'Child'
which Depends on, but does not explicitly import, a package called
'Parent', and which contains a function that calls an object in the
namespace of 'Parent'. When this function is called without attaching
'Child' to the search path, the function in 'Parent' cannot be found.

Here is an example from the manual of the bigdata package, but the
problem is very widespread:

x = matrix(rnorm(50*80),50,80)
beta = c(3,2,1.5,rep(0,77))
y = rnorm(50) + x%*%beta
z1 = bigdata::lasso.stars(x,y)

The example fails because lasso.stars depends on 'glmnet', which is not
loaded until bigdata is attached. The only way to be able to call
lasso.stars is to actually attach the bigdata package:

library(bigdata)
z1 = bigdata::lasso.stars(x,y)
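
The difference between a loaded namespace and an attached package can be
sketched with a package that ships with R. This is only an illustration
and assumes 'tools' is not on the default search path, which is the usual
setup; the variable names are made up for the example:

```r
# Sketch: pkg::fun loads the namespace of pkg but does not attach it.
# Assumption: 'tools' is installed (it ships with R) and is not attached.
attached_before <- "package:tools" %in% search()

# Calling through '::' triggers loading of the 'tools' namespace.
ext <- tools::file_ext("report.txt")

# The namespace is now loaded, but the search path is unchanged.
loaded_after   <- "tools" %in% loadedNamespaces()
attached_after <- "package:tools" %in% search()

print(c(attached_before = attached_before,
        loaded_after    = loaded_after,
        attached_after  = attached_after))
```

A package reached only through '::' is in the same position as 'tools'
here: its namespace is loaded, but nothing is put on the search path on
its behalf.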

Now to further complicate things, it seems that this problem is
inherited by any 'grandchild' package that imports, in this case, the
lasso.stars function. I have a hard time finding a good example, but I
am sure they are out there.

Is this a bug? I know that it can be avoided by asking package authors
to use Imports instead of Depends, but in practice the majority of the
packages on CRAN still use Depends. It seems like the problem would be
easily fixed if R automatically imported the namespace of any
Depends packages into the child package namespace?
#
On 12-05-13 3:15 AM, Jeroen Ooms wrote:
Not sure if it's a bug, but the correct solution in bigdata is to import 
the glmnet function in its NAMESPACE.  Then the namespace that gets 
loaded when you type bigdata::lasso.stars will be able to see the glmnet 
function.
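
As a rough sketch, and assuming glmnet() is the only symbol bigdata needs
from that package, the change would be one directive in the NAMESPACE
file, with the package also listed under Imports in DESCRIPTION:

```
# NAMESPACE of bigdata (sketch; other directives omitted)
importFrom(glmnet, glmnet)

# DESCRIPTION of bigdata (sketch; other fields omitted)
Imports: glmnet
```

With importFrom(), only the named function is made visible to the code in
the package namespace; import(glmnet) would bring in every export instead.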

Perhaps Depends in the DESCRIPTION file should do the import 
automatically, but it will be faster to import just one function than 
everything from a package that has a lot of exports.  So maybe it's a 
bug because we don't do that, but I think there would be complaints if 
we did.

On the other hand, if bigdata::lasso.stars loaded glmnet onto the search 
path, I think that would be a bug.  The search path belongs to the user, 
not to R, and the user might have used the :: notation to avoid messing 
with it.

Duncan Murdoch
#
On 13.05.2012 10:59, Duncan Murdoch wrote:
I do not see any problem in R. If someone is going to import a 
Namespace, he or she has to do that via import directives in the 
NAMESPACE file. If someone is going to have a package on the search 
path, he or she has to require() it. The DESCRIPTION file is used to 
derive the dependency structures among packages for installation order, 
check order etc.

Best,
Uwe
#
On Sun, May 13, 2012 at 10:14 AM, Uwe Ligges
<ligges at statistik.tu-dortmund.de> wrote:

            
I am not sure everyone is aware of this. Many package authors seem to
be assuming that having a package in the Depends field of the
DESCRIPTION is a sufficient condition for having the dependency
package available at runtime, regardless of how the function is
invoked by the user. I think this is the usual meaning of a
dependency. There are a lot of packages on CRAN that use Depends and
are not explicitly importing anything. Among others, this holds for
any package without a NAMESPACE file.

Also, looking at the definition of the 'Depends' field in the 'Writing
R Extensions' manual, there is not a single hint that Depends is not
sufficient for having the package available at runtime, and that any
function that is used must still be manually imported or require()d
as you suggest.
#
On Sun, May 13, 2012 at 10:14 AM, Uwe Ligges
<ligges at statistik.tu-dortmund.de> wrote:
So should package authors both list a package in the Depends field of
DESCRIPTION and explicitly import what is needed, so that if someone else
uses their code without attaching the package, everything needed is
available?
#
On 12-05-13 3:14 PM, Jeroen Ooms wrote:
What do you suggest as the solution?

Duncan Murdoch
#
On 05/13/2012 12:14 PM, Jeroen Ooms wrote:
I think this is because name spaces are relatively new, so authors are 
yet to realize the consequences of not importing the definitions their 
package uses.

As a package developer, I want to have the code my package sees be 
exactly what is needed, and no more. There are many good reasons for 
this, including isolating as much as possible my code from changes in 
other packages and minimizing the costs of symbol look-up. These issues 
become increasingly important as the hierarchy of package relationships 
becomes deep.

The best practice is for authors to import all necessary symbols, but no 
more!

Martin

  
    
#
On 12-05-13 4:06 PM, Martin Morgan wrote:
They aren't that new, but I think our efforts at back-compatibility have 
slowed adoption.  If we were more demanding of package developers, we 
wouldn't have this problem; but I think we'd have a lot fewer packages. 
Even with our current policy of aiming for back-compatibility we get a 
lot of complaints that we are asking too much.
I agree, but many authors don't want to think about things that way.

Duncan Murdoch
#
On Sun, May 13, 2012 at 1:06 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
Exactly. That is why you probably don't use Depends, but Imports in
combination with a NAMESPACE file. Which is great, and we should
encourage that practice. But as long as 'Depends' is also supported,
this should be working properly as well.

Here is a quote from
http://cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf: "A
stronger form of dependency can be specified in the optional Depends
field listing packages which are necessary to run our code."

I think it seems reasonable to assume that when a package author
decides to use 'Depends' (for whatever reason), they want the
namespace to be available to their package. Hence I think R should
import the full namespace of packages in the Depends field. I don't
think this will generate too much overhead, because in most
circumstances, the package will be loaded and attached anyway.
Furthermore, this will not slow down or affect packages that use the
better practice of specifying 'Imports' instead of 'Depends' and
explicitly importing only the required symbols.
#
On 05/13/2012 01:39 PM, Duncan Murdoch wrote:
Perhaps it would be easy to provide a check (I realize this is close on 
the heels of the undefined global variables thread), along the lines of

 > library(codetools)
 > checkUsageEnv(getNamespace("bigdata"), suppressLocal=TRUE)
lasso.stars: no visible global function definition for 'glmnet'
lasso.stars: no visible global function definition for 'glmnet'

or to suggest a NAMESPACE, done imperfectly by

https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/codetoolsBioC

(username / password: readonly)

 > library(codetoolsBioC)
 > library(bigdata)
 > deps <- writeNamespaceImports("bigdata", file=stdout())
#Generated by codetoolsBioC version 0.0.16
#Timestamp: Sun May 13 15:01:12 2012

#Imports: glmnet, graphics, Matrix

importMethodsFrom(Matrix, mean, t)

importFrom(glmnet, glmnet)

importFrom(graphics, lines, par, plot)