
How do I reliably and efficiently hash a function?

8 messages · Charles C. Berry, Mark van der Loo, Hadley Wickham +2 more

#
I've got the following scenario: I need to store information about an
R function, and retrieve it at a later point. In other programming
languages I'd implement this using a dictionary with the functions as
keys. In R, I'd usually use `attr(f, 'some-name')`. However, for my
purposes I do not want to use `attr` because the information that I
want to store is an implementation detail that should be hidden from
the user of the function (and, just as importantly, it shouldn't
clutter the display when the function is printed on the console).

`comment` would be almost perfect since it's hidden from the output
when printing a function. Unfortunately, the information I'm storing
is not a character string (it's in fact an environment), so I cannot
use `comment`.

How can this be achieved?

For reference, I've considered the following two alternatives:

1. Use `attr`, and override `print.function` to not print my
attribute. However, I'm wary of overriding a core function just to
implement such a little thing, and overriding this function would
obviously clash with other overrides, if somebody else happens to have
a similarly harebrained idea.

2. Use C++ to retrieve the SEXP to the body of the CLOSXP that
represents a function, and use that as a key in a dictionary. I
*think* that this robustly and efficiently identifies functions in R.
However, this relies quite heavily on R internal implementation
details, and in particular on the fact that the GC will not move
objects around in memory. The current GC doesn't do this, but Gábor
Csárdi rightly pointed out to me that this might change.

On the chance that I'm trying to solve the wrong Y to an X/Y problem,
the full context to the above problem is explained in [1]. In a
nutshell, I am hooking a new environment into a function's parent.env
chain, by re-assigning the function's `parent.env` (naughty, I know):

```
parent.env(my_new_env) = parent.env(f)
parent.env(f) = my_new_env
```

This is done so that the function `f` finds objects defined inside
that environment without having to attach it globally. However, for
bookkeeping purposes I need to preserve the original parent
environment, hence the question.
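
To make the splice concrete, here is a minimal runnable sketch. It rests on an assumption: `parent.env()` operates on environments, so the sketch treats the function's enclosing environment, `environment(f)`, as the splice point. The name `original_env` and the variable `x` are illustrative only and do not come from the linked issue.

```r
f <- function() x                       # `x` is not defined anywhere yet

my_new_env <- new.env()
assign("x", 42, envir = my_new_env)     # object that f should be able to find

original_env <- environment(f)          # bookkeeping: remember the original
parent.env(my_new_env) <- original_env  # new env delegates to the old chain
environment(f) <- my_new_env            # f now searches my_new_env first

f()                                     # finds `x` in my_new_env
identical(parent.env(environment(f)), original_env)
```

Once the splice has happened, `environment(f)` no longer points at the original environment, which is why it has to be recorded somewhere else.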

[1]: https://github.com/klmr/modules/issues/66
#
On Thu, 10 Dec 2015, Konrad Rudolph wrote:

            
See

https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Scope

For example, these commands:

foo <- function() {info <- "abc";function(x) x+1}
func <- foo()
find("func")
func(1)
ls(envir=environment(func))
get("info",environment(func))
func

Yield these printed results:

: [1] ".GlobalEnv"
: [1] 2
: [1] "info"
: [1] "abc"
: function (x)
: x + 1
: <environment: 0x7fbd5c86bc60>

The environment of the function gets printed, but 'info' and other
objects that might exist in that environment do not get printed unless
you explicitly call for them.

HTH,

Chuck

p.s. 'environment(func)$info' also works.
#
In addition to what Charles wrote, you can also use 'local' if you don't
want a function that creates another function.
[1] 13
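
A minimal sketch of the `local` idiom, given as an illustration and not necessarily the example from this message. The names `counter` and `count` are hypothetical.

```r
# `local` evaluates its body in a fresh environment and returns the last
# value, so the function keeps private state without a factory function.
counter <- local({
  count <- 0                # private to the function defined below
  function() {
    count <<- count + 1     # updates `count` in the local environment
    count
  }
})

counter()
counter()
ls(environment(counter))    # lists the private objects
```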

best,
Mark


On Fri, 11 Dec 2015 at 03:27, Charles C. Berry <ccberry at ucsd.edu> wrote:

  
  
#
Thanks. I know about `local` (and functions within functions). In
fact, the functions are *already* defined inside their own environment
(same as what `local` does). But unfortunately this doesn't solve my
problem, since the functions' parent environment gets changed during
the function's execution, and I need to retrieve my stored data
*after* that point, inside the function.

I've tried to create a more exact example of what's going on.
Unfortunately, it's really hard to simplify the problem without losing
crucial details. Since the code is just a tad too long, I've posted it
as a GitHub Gist:

https://gist.github.com/klmr/53c9400e832d7fd9ea5c

The function `f` in the example calls `get_meta()` twice, and gets
different results before and after calling an ancillary function that
modifies the function's `parent.env`. I want it to return the same
information ("original") both times.

On Fri, Dec 11, 2015 at 10:49 AM, Mark van der Loo
<mark.vanderloo at gmail.com> wrote:
#
On Thu, Dec 10, 2015 at 5:49 PM, Konrad Rudolph
<konrad.rudolph+r-devel at gmail.com> wrote:
Why not use your own S3 class?

Hadley
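
One way the S3 suggestion could look, as a rough sketch only. The class name `meta_function`, the attribute name `meta`, and the helper `with_meta` are all hypothetical and not taken from the thread.

```r
# Hypothetical helper: tag a function with an S3 class and attach hidden data.
with_meta <- function(f, meta) {
  attr(f, "meta") <- meta
  class(f) <- c("meta_function", class(f))
  f
}

# Print method for the new class only: print a copy without the class and
# the attribute, leaving print.function itself untouched.
print.meta_function <- function(x, ...) {
  plain <- unclass(x)
  attr(plain, "meta") <- NULL
  print(plain, ...)
  invisible(x)
}

info <- new.env()                     # the stored data can be an environment
g <- with_meta(function(x) x + 1, info)
g(1)                                  # still callable as usual
print(g)                              # the attribute is not shown
identical(attr(g, "meta"), info)      # but it is still retrievable
```

Because the method is defined for the new class only, it cannot clash with other packages' overrides of `print.function`.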
#
On Fri, Dec 11, 2015 at 12:49 AM, Konrad Rudolph
<konrad.rudolph+r-devel at gmail.com> wrote:
Not sure if this is helpful, but you can implement this more naturally
using closures without hacking on environments. As I understand it,
your functions have some shared state, and some private. So each
function needs a private parent env, which all share a common
grand-parent that holds your shared objects. Maybe this example helps:

new_closure <- (function(shared = 0){
  function(name, priv = 0){
    function(){
      priv <<- priv +1
      shared <<- shared +1
      print(sprintf("Total:%d; %s:%d", shared, name, priv))
    }
  }
})()

fun1 <- new_closure("erik")
fun2 <- new_closure("anna")

fun1()
fun1()
fun1()
fun2()
fun1()
fun1()
fun2()
#
@Jeroen, here's what I'm solving with my hacking the parent
environment chain: I'm essentially re-implementing `base::attach`,
except that I'm attaching objects *locally* in the function instead of
globally. I don't think this can be done in any way except by
modifying the parent environment chain. Incidentally, package
namespaces do largely the same thing. The difference is that they only
need to do it *once* (when loaded), and subsequent function calls do
not modify this chain.
#
On Fri, Dec 11, 2015 at 1:26 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
Yes, I'll probably do that. Thanks. I honestly don't know why I hadn't
thought of that before, since I'm doing the exact same thing in
another context [1].

[1]: https://github.com/klmr/decorator/blob/2742b398c841bac53acb6607a4d220aedf10c26b/decorate.r#L24-L36