Serializing global state from within a namespace to distribute to other workers

2 messages · Roger Bivand, Murray Stokely

#
Murray Stokely <murray at ...> writes:
...

This is only an oblique follow-up - are there any tools for finding out whether
the serialisation of an object will cause the serialisation of its environment?
I've looked around, for example in the codetools package, but do not see
anything obvious. I've also been hit by objects being serialised (both for snow
and even just for save() - which I think is the underlying mechanism here)
ending up about two orders of magnitude larger than the object.size() reported.

Roger
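
The size gap described above can be reproduced with base R alone. This is a minimal sketch, not taken from the thread: make_closure and big are hypothetical names, and it assumes the closure was created inside another function, so that function's frame becomes the closure's environment. object.size() does not count that environment, while serialize() writes it out in full.

```r
# Hypothetical example: the returned closure keeps make_closure's frame
# alive, including `big`, even though the closure never uses it.
make_closure <- function() {
  big <- numeric(1e6)   # roughly 8 MB of doubles
  function(x) x + 1
}

f <- make_closure()

# object.size() excludes the function's environment ...
reported <- as.numeric(object.size(f))

# ... but serialize() (which save() and snow rely on) includes it.
serialized <- length(serialize(f, connection = NULL))

# Names that would travel along with f when it is serialised
captured <- ls(environment(f), all.names = TRUE)

cat("object.size:", reported, "bytes\n")
cat("serialized :", serialized, "bytes\n")
cat("captured   :", captured, "\n")
```

Comparing length(serialize(obj, NULL)) with object.size(obj), and listing ls(environment(obj)), is a rough manual check rather than a dedicated tool.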
#
On Tue, Nov 9, 2010 at 12:50 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
The print() statement for functions will tell you if an environment is
associated with the function that will need to be serialized.
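
As a small illustration of the print() behaviour (a sketch with hypothetical functions make_adder, g and h, not code from the thread): a function whose environment is not the global environment prints with a trailing <environment: ...> line, and environmentName() names the environment when it is a namespace.

```r
# Hypothetical example functions
make_adder <- function(n) function(x) x + n
g <- make_adder(1)                 # environment is make_adder's frame

h <- function(x) x + 1
environment(h) <- globalenv()      # force the global environment

# Capture what print() would show for each function
shown_g <- capture.output(print(g))
shown_h <- capture.output(print(h))

# TRUE if the printed form ends with an "<environment: ...>" line
has_env_line <- function(lines) any(grepl("^<environment", lines))

g_has_env <- has_env_line(shown_g)
h_has_env <- has_env_line(shown_h)

# For a function exported from a package, the environment is the namespace
sd_env <- environmentName(environment(stats::sd))
```

Here g_has_env is TRUE and h_has_env is FALSE, and sd_env is "stats".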

I ended up in gdb, sprinkling Rprintf calls around serialize.c and
loadsave.c to try to understand this better.  In the end I changed the
assign statement to use substitute, to get around the fact that a
NAMESPACE stays associated with FUN no matter how hard I try to strip
it off:

parallelapply <- function(x, FUN, ...) {
 environment(FUN) <- .GlobalEnv   # does not have intended effect
 assign(".GLOBAL.FUN",
        eval.parent(substitute(function(y) { FUN(y, ...) })),
        envir = .GlobalEnv)
 environment(.GLOBAL.FUN) <- .GlobalEnv   # does not have intended effect
 save(list = ls(envir = .GlobalEnv, all.names = TRUE),
      file = "/tmp/.Rdata")
 # Here we distribute the .Rdata file to other workers, which load it
 # and then run .GLOBAL.FUN(n).
 # This works great if parallelapply is in a package without a
 # NAMESPACE, but fails on loadNamespace otherwise.
}

              - Murray
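
One way to see why loadNamespace gets involved on the workers is the following minimal sketch. It uses stats::sd as a stand-in for a function defined in a package with a NAMESPACE, which is an assumption for illustration and not the package from this thread. A namespace is serialised as a reference by name rather than by content, so the reading side has to find or load that namespace again.

```r
# Stand-in for a function that lives in a package namespace
f <- stats::sd

# Serialise to a raw vector; the namespace is stored as a reference,
# not copied into the payload.
bytes <- serialize(f, connection = NULL)

# Reading it back resolves the reference to the namespace in this session.
f2 <- unserialize(bytes)

same_namespace <- identical(environment(f2), asNamespace("stats"))
is_ns <- isNamespace(environment(f2))

cat("payload size:", length(bytes), "bytes\n")
cat("environment restored to the stats namespace:", same_namespace, "\n")
```

In this session the lookup succeeds because stats is available. On a worker where the referenced package is not installed, the same lookup cannot be satisfied, which is consistent with the failure described above.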