Increase transparency: suggestion on how to avoid namespaces and/or unnecessary overwrites of existing functions

On Sat, Oct 1, 2011 at 1:08 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 11-08-23 2:23 PM, Janko Thyson wrote:
Dear list,

I'm aware of the fact that I posted on something related a while ago,
but I just can't shake this off and would like to ask you for an opinion:

The problem:
Namespaces are great, but they don't resolve certain conflicts regarding
name clashes. There are more and more people out there trying to come up
with their own R packages, which is great also! Yet, it becomes more and
more likely that programmers will choose identical names for their
exported functions and/or that they add functionality to existing
functions (i.e. overwriting existing functions).
The whole process of which packages overwrite which functions is
somewhat obscure and in addition depends on their order in the search
path. On the other hand, it is not possible to use "namespace"
functionality (i.e. 'namespace::fun()'; also less efficient than direct
call; see illustration below) during early stages of the development
process (i.e. the package is not finished yet) as there is no namespace
available yet.

I agree there can be a problem, but I don't think it is necessarily as
serious as you suggest.  Even though there are more and more packages
available, most people will still use roughly the same number of them. Just
because CRAN has thousands of packages doesn't mean I use all of them at the
same time.

I know of at least two cases where such overwrites (I think it's called
masking, right?) led to some confusion at our chair:
1) loading package forecast overwrites certain functions in stats which
made some code refactoring necessary
If your code had been in a package with a NAMESPACE, it would not have been
affected by a user loading forecast.  (If you start importing it, then of
course it could cause masking problems.)

You suggest above that users only put code into a package very late in the
development process.  The solution is, don't do that.  Create a package
early on, and use it through the majority of development time.

You can leave the choice of exports until late by exporting everything;
you'll still get the benefit of the more controlled name search from the
beginning.
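
As a rough sketch of what "exporting everything" can look like, a NAMESPACE file along the following lines is a common starting point. The import shown is only a placeholder for illustration and is not taken from this thread.

```r
# NAMESPACE file at the top level of the package source directory

# Export every object whose name does not start with a dot
exportPattern("^[^\\.]")

# Placeholder import: make stats::median visible to the package code
# regardless of what the user has attached on the search path
importFrom(stats, median)
```

The export pattern can be replaced by explicit export() directives later, once the public interface has settled.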

You say you can't use "namespace::call()" until the namespace package has
been written.  But why would you want to?  If the call is coming from the
new package, objects in it will be used with first priority in resolving the
call.  You only need the :: notation when there are ambiguities in calls to
external packages.
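
The lookup order described above can be imitated with plain environments. This is only a simplified model, assuming a hypothetical environment 'fake_ns' standing in for a package namespace (a real namespace also has an imports environment and the base namespace in its chain), but it shows why unprefixed calls inside a package are not affected by same-named functions elsewhere.

```r
# Hypothetical stand-in for a package namespace: functions created inside
# this environment look up names there first.
fake_ns <- new.env(parent = baseenv())
local({
  helper <- function(x) x + 1      # "internal" function of the package
  api    <- function(x) helper(x)  # calls helper() without any prefix
}, envir = fake_ns)

# A same-named function in the global workspace, as a masking package
# or a user script might define it.
helper <- function(x) stop("masked version called")

# api() still finds the helper() defined next to it, not the global one.
fake_ns$api(1)
```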

2) loading package 'R.utils' followed by package 'roxygen' overwrites
'parse.default()' which results in errors for something like
'eval(parse(text="a<- 1"))' ; see illustration below)
And I'm sure the community could come up with lots more of such scenarios.

Suggestions:
1) In order to avoid name clashes/unintended overwrites, how about
switching to a coding paradigm that explicitly (and automatically)
includes a package's name in all its functions' names once code is
turned into a real package? E.g., getting used to "preemptively" type
'package_fun()' or 'package.fun()' instead of just 'fun()'. Better to be
safe than sorry, right? This could be realized pretty easily (see
example below) and, IMHO, would significantly increase transparency.
I think long names with consistent prefixes are harder to read than short
descriptive names, so this would make code harder to read. For
example, the first few lines of mean.default would change from

    if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
        warning("argument is not numeric or logical: returning NA")
        return(NA_real_)
    }

to

    if (!base_is.numeric(x) && !base_is.complex(x) &&
        !base_is.logical(x)) {
        base_warning("argument is not numeric or logical: returning NA")
        return(base_NA_real_)
    }

2) In order to avoid intended (but for the user often pretty obscure)
overwrites of existing functions, we could use the same mechanism
together with the "rule": just don't provide any functions that
overwrite existing ones, rather prepend your version of that function
with your package name and leave it up to the user which version he
wants to call.
That seems like good advice.

Duncan Murdoch
Except that namespace::foo should be assigned to another local
variable instead of using package::foo in a tight loop, because
repeated calls to "::" can introduce a significant performance
penalty. (This has been discussed in another thread.)
That's good advice too.

Duncan Murdoch
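
A minimal sketch of that advice, assuming stats::sd as the function of interest (the name 'local_sd' and the data are made up for illustration):

```r
# Resolve the function once, outside the loop, instead of calling `::`
# on every iteration.
local_sd <- stats::sd

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
result <- numeric(1000)
for (i in seq_along(result)) {
  result[i] <- local_sd(x)   # ordinary variable lookup inside the loop
}
```

The local binding refers to the same function object as stats::sd, so the results are unchanged; only the per-call lookup through "::" is avoided.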

At the moment, all of this is probably not that big of a deal yet, but
my suggestion has more of a mid-term/long-term character.

Below you find a little illustration. I'm probably asking too much, but
it'd be great if we could get a little discussion going on how to
improve the way of loading packages!

Best regards and thanks for R and all its packages!
Janko

################################################################################
# PROOF OF CONCEPT

################################################################################

# 1) PROBLEM
# IMHO, with the number of packages submitted to CRAN constantly increasing,
# over time we will be likely to see problems with respect to name clashes.
# The main reasons I see for this are the following:
# a) package developers picking identical names for their exported functions
# b) package developers overwriting base functions in order to add
#    functionality to existing functions
# c) ...
#
# This can create scenarios in which the user might not exactly know that
# he/she is using a 'modified' version of a specific function. More so, the
# user needs to carefully read the description of each new package he plans
# to use in order to find out which functions are exported and which existing
# functions might be overwritten. This in turn might imply that the user's
# existing code needs to be refactored (i.e. instead of using 'fun()' it
# might now be necessary to type 'namespace::fun()' to be sure that the
# desired function is called).

# 2) SUGGESTED SOLUTION
# That being said, why don't we switch to a 'preemptive' coding paradigm
# where the default way of calling functions includes the specification of
# its namespace? In principle, the functionality offered by 'namespace::fun()'
# gets the job done.
# BUT:
# a) it is slower compared to the direct way of calling a function.
#    (see illustration below).
# b) this option is not available throughout the development process of a
#    package as there is no namespace yet and there's no way to emulate one.
#    This in turn means that even though a package developer would buy into
#    strictly using 'mypkg::fun()' throughout his package code, he can only
#    do so at the very final stage of the process RIGHT before turning his
#    code into a working package (when he's absolutely sure everything is
#    working as planned).
#    For debugging he would need to go back to using 'fun()'. Pretty
#    cumbersome.

# So how about simply automatically prepending a given function's name with
# the package's name for each package that is built (e.g. 'pkg.fun()' or
# 'pkg_fun()')? In the end, this would just be a small change for new
# packages without a significant decrease of performance and it could also be
# realized at early stages of the development process (see illustration
# below).

# 3) ILLUSTRATION

# Example case where base function 'parse.default' is overwritten:
parse(text="a<- 5")    # Works
require(R.utils)
require(roxygen)
parse(text="a<- 5")    # Does not work anymore

################# START A NEW R SESSION BEFORE YOU CONTINUE ####################

# Inefficiency of 'namespace::fun()':
require(microbenchmark)
res.a<- microbenchmark(eval(parse(text="a<- 5")))
res.b<- microbenchmark(eval(base::parse(text="a<- 5")))
median(res.a$time)/median(res.b$time)

# Can be made up by explicit assignment:
foo<- base::parse
res.a<- microbenchmark(eval(parse(text="a<- 5")))
res.b<- microbenchmark(eval(foo(text="a<- 5")))
median(res.a$time)/median(res.b$time)

# Automatically prepend function names:
processNamespaces<- function(
      do.global=FALSE,
      do.verbose=FALSE,
      .delim.name="_",
      ...
){
      srch.list.0<- search()
      srch.list<- gsub("package:", "", srch.list.0)
      if(!do.global){
          assign(".NS", new.env(), envir=.GlobalEnv)
      }
      out<- lapply(seq_along(srch.list), function(x.pkg){
          pkg<- srch.list[x.pkg]

          # SKIP LIST
          if(pkg %in% c(".GlobalEnv", "Autoloads")){
              return(NULL)
          }
          # /

          # TARGET ENVIR
          if(!do.global){
              # ADD PACKAGE TO .NS ENVIRONMENT
              envir<- eval(substitute(
                  assign(PKG, new.env(), envir=.NS),
                  list(PKG=pkg)
              ))
              # /
#            envir<- get(pkg, envir=.NS, inherits=FALSE)
              envir.msg<- paste(".NS$", pkg, sep="")
          } else {
              envir<- .GlobalEnv
              envir.msg<- ".GlobalEnv"
          }
          # /

          # PROCESS FUNCTIONS
          cnt<- ls(pos=x.pkg)
          out<- unlist(sapply(cnt, function(x.cnt){
              value<- get(x.cnt, pos=x.pkg, inherits=FALSE)
              obj.mod<- paste(pkg, x.cnt, sep=.delim.name)
              if(!is.function(value)){
                  return(NULL)
              }
              if(do.verbose){
                  cat(paste("Assigning '", obj.mod, "' to '", envir.msg,
                      "'", sep=""), sep="\n")
              }
              eval(substitute(
                  assign(OBJ.MOD, value, envir=ENVIR),
                  list(
                      OBJ.MOD=obj.mod,
                      ENVIR=envir
                  )
              ))
              return(obj.mod)
          }))
          names(out)<- NULL
          # /
          return(out)
      })
      names(out)<- srch.list
      return(out)
}

# +++++

funs<- processNamespaces(do.verbose=TRUE)
ls(.NS)
ls(.NS$base)
.NS$base$base_parse

res.a<- microbenchmark(eval(parse(text="a<- 5")))
res.b<- microbenchmark(eval(.NS$base$base_parse(text="a<- 5")))
median(res.a$time)/median(res.b$time)

#+++++

funs<- processNamespaces(do.global=TRUE, do.verbose=TRUE)
base_parse

res.a<- microbenchmark(eval(parse(text="a<- 5")))
res.b<- microbenchmark(eval(base_parse(text="a<- 5")))
median(res.a$time)/median(res.b$time)

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
