
Copy on assignment to large field of reference class

2 messages · Giles Percy, John Chambers

This is a useful observation.  To talk about it, though, we need to re-express it in terms that make sense for R; there are too many misconceptions otherwise.

The basic observation is this:  When simple subset or element replacement is done in a loop, normally the object is only copied on the first time through the loop.  This is true whether using local assignment, <-, or global assignment, <<-.
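A minimal sketch of the two loop forms described above, for reference. The function names fill_local and fill_super and the length n are made up for illustration; they are not from the original example.

```r
n <- 1e4

# Local assignment: x lives in the function's own frame.
fill_local <- function(n) {
    x <- numeric(n)
    for (i in seq_len(n)) x[i] <- i
    x
}

# Superassignment: `<<-` finds x in the enclosing function's frame.
fill_super <- function(n) {
    x <- numeric(n)
    inner <- function() for (i in seq_len(n)) x[i] <<- i
    inner()
    x
}

res_local <- fill_local(n)
res_super <- fill_super(n)
```

On an R build with memory profiling enabled, calling tracemem() on the vector before the loop is one way to see how many copies are actually made; the count may vary between R versions.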

However, if global assignment is done in a method to replace elements of a field, the object is copied every time.  For long loops this makes for substantial overhead.  Very relevant observation.
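The original example is not reproduced in this message, so what follows is a hypothetical reconstruction of the slow case. The class name A and field b follow the text; the method name modb, the declared class "numeric" (the post mentions "vector"), and the vector length are assumptions.

```r
library(methods)

# Field b is declared with a class, so assignments to it are checked.
A <- setRefClass("A",
    fields = list(b = "numeric"),
    methods = list(
        # Element replacement directly on the field, once per iteration.
        modb = function() {
            for (i in seq_along(b)) b[i] <<- i
            invisible(.self)
        }
    ))

a <- A$new(b = numeric(2000))
elapsed <- system.time(a$modb())[["elapsed"]]
```

Comparing elapsed for a few vector lengths should give a rough idea of the overhead; the exact numbers depend on the R version and the machine.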

What's going on?

The non-copying depends on the fact that `[<-` is a primitive function.

When a field is declared with a class ("vector" in the example), its assignment is done by an R function that checks the validity (via what's called an "active binding" in R).  That causes the extra copy on each assignment.  (To be honest, I don't totally understand why, but I have no intention of messing with the active binding code.)

What to do about it?

There are two solutions.  Either take the attitude that field assignment is basically inefficient and don't do it in a loop, as in method modb2.
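The body of modb2 is not shown in this message. A sketch of what such a method might look like, assuming it works on a local copy and assigns the field once at the end (the local name tmp and the declared class "numeric" are assumptions):

```r
library(methods)

A <- setRefClass("A",
    fields = list(b = "numeric"),
    methods = list(
        modb2 = function() {
            tmp <- b                            # work on an ordinary local vector
            for (i in seq_along(tmp)) tmp[i] <- i
            b <<- tmp                           # one checked field assignment
            invisible(.self)
        }
    ))

a <- A$new(b = numeric(2000))
a$modb2()
```

The loop then follows R's normal rules for a local vector, and the field's class is still checked on the single final assignment.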

Or don't declare a class for the field, in which case no active binding is used.  Check this out by changing the class definition to setRefClass("A", fields="b").
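A sketch of this second variant, using the class definition suggested above. The method modb and the vector length are assumptions carried over for illustration.

```r
library(methods)

# No class declared for b, so the field accepts any object.
A <- setRefClass("A",
    fields = "b",
    methods = list(
        modb = function() {
            for (i in seq_along(b)) b[i] <<- i
            invisible(.self)
        }
    ))

a <- A$new(b = numeric(2000))
a$modb()
filled <- a$b

# The price: nothing stops a value of a different class being assigned.
a$b <- "not a numeric vector"
```

The last line is accepted without complaint, which illustrates the validity check that is given up by leaving the field undeclared.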

I prefer the first solution since it retains the validity check on the field.

John


PS: A few comments.
 - it makes no sense to expect _greater_ efficiency than for a simple assignment.  The object in a$b is NOT a reference object so its manipulation obeys R's normal rules.
 - all this only applies to replacement functions that are primitives.  Otherwise you're stuck with copies each time.
 - Please don't use the term "call by value" for R; that's not how R's evaluation works, and it has nothing to do with when duplication takes place.  That topic is not for the faint of heart, but basically when R knows that there is only one reference to an object, it doesn't copy.  In practice, though, this is mainly when a primitive replacement function is used.
On May 18, 2013, at 2:50 PM, Giles Percy <giles.percy at gmail.com> wrote: