R's copying of arguments (Re: Julia)

oliver
Sat, Mar 17, 2012 8:35 AM
Hello,

regarding the copying issue, I would like to point to the
"Writing R Extensions" documentation.

There it is mentioned that functions in extensions that use the
.C interface normally get their arguments pre-copied...


In section 5.2:

  "There can be up to 65 further arguments giving R objects to be
  passed to compiled code. Normally these are copied before being
  passed in, and copied again to an R list object when the compiled
  code returns."

But for the .Call and .External interfaces this is NOT the case.



In section 5.9:
  "The .Call and .External interfaces allow much more control, but
  they also impose much greater responsibilities so need to be used
  with care. Neither .Call nor .External copy their arguments. You
  should treat arguments you receive through these interfaces as
  read-only."


Why is read-only preferred?

Please see the discussion in section 5.9.10.

It is mentioned there that copying an object in the R language
does not necessarily make a real copy of that object. Instead,
just a "reference" to the real data is created (two names
referring to one bulk of data). That's typical functional
programming: not a variable, but a name (and possibly more than one
name) bound to an object.


Of course, if you changed the original named value without
copying it first, then both names would refer to the changed data.
Of course that is not what is wanted.
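
To make the aliasing point concrete, here is a minimal sketch in
plain C. It is only a toy model and not R's real object layout:
toy_vec, toy_new, toy_duplicate, toy_free and the two demo functions
are names I made up for illustration. It shows two names bound to one
bulk of data, and what the second name sees when the data is changed
in place versus changed after a real copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R vector. NOT R's real object layout. */
typedef struct {
    double *data;  /* the "bulk of data" that names refer to */
    size_t  len;
} toy_vec;

/* Allocate a vector of `len` elements, all set to `fill`. */
toy_vec *toy_new(size_t len, double fill) {
    toy_vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->data = malloc(len * sizeof *v->data);
    assert(v->data != NULL);
    v->len = len;
    for (size_t i = 0; i < len; i++)
        v->data[i] = fill;
    return v;
}

/* Real copy: new storage with the same contents. */
toy_vec *toy_duplicate(const toy_vec *src) {
    toy_vec *v = toy_new(src->len, 0.0);
    memcpy(v->data, src->data, src->len * sizeof *src->data);
    return v;
}

void toy_free(toy_vec *v) {
    free(v->data);
    free(v);
}

/* Two names, one object: change in place through x, read through y. */
double seen_by_alias_after_inplace_change(void) {
    toy_vec *x = toy_new(3, 1.0);
    toy_vec *y = x;            /* like "y <- x": second name, same data */
    x->data[0] = 99.0;         /* in-place change through x */
    double seen = y->data[0];  /* y sees the change as well */
    toy_free(x);               /* x and y are one object: free once */
    return seen;
}

/* Same setup, but x is duplicated before it is changed. */
double seen_by_alias_after_copy_then_change(void) {
    toy_vec *x = toy_new(3, 1.0);
    toy_vec *y = x;
    x = toy_duplicate(x);      /* x now has its own storage */
    x->data[0] = 99.0;
    double seen = y->data[0];  /* y still sees the old value */
    toy_free(x);
    toy_free(y);
    return seen;
}
```

In this toy, the first function returns the changed value and the
second returns the original one, which is the difference between
modifying shared data and modifying a copy.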

But what you can also see in section 5.9.10 is that there
already is a mechanism (reference counting) that allows one
to distinguish between unnamed and named objects.

So this directly addresses the points you mentioned in your
examples.

So, at least in principle, R allows in-place modification
of objects with the .Call interface.

You seem to refer to the .C interface, whereas I had explored the
.Call interface. That's the reason why you may insist on "it's always
copied" while I wondered what you were talking about, because the
.Call interface allowed me a rather C-like, raw style of programming
(and lets its user decide whether copying is done or not).

The mechanism to decide whether copying should be done is also
mentioned in section 5.9.10: the NAMED and SET_NAMED macros.
With NAMED you can get the number of references.

But later in that section it is mentioned that, at least for now,
NAMED always returns the value 2 for arguments passed to .Call.


  "Currently all arguments to a .Call call will have NAMED set to 2,
  and so users must assume that they need to be duplicated before
  alteration."
               (section 5.9.10, last sentence)
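
The "duplicate before alteration" rule can be sketched in plain C.
Again this is a toy model under my own assumptions, not the R API:
toy_obj, its named field and the toy_* functions are made-up names,
and the rule used here (copy whenever the counter is non-zero) is
simply the conservative choice, not necessarily R's exact rule.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy object with a counter loosely modelled on NAMED. NOT R's SEXP. */
typedef struct {
    double *data;
    size_t  len;
    int     named;  /* toy counter: 0 means no name is bound */
} toy_obj;

/* Allocate an object of `len` elements set to `fill`. */
toy_obj *toy_obj_new(size_t len, double fill, int named) {
    toy_obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->data = malloc(len * sizeof *o->data);
    assert(o->data != NULL);
    o->len = len;
    o->named = named;
    for (size_t i = 0; i < len; i++)
        o->data[i] = fill;
    return o;
}

/* Real copy. The fresh copy has no names bound to it yet. */
toy_obj *toy_obj_duplicate(const toy_obj *src) {
    toy_obj *o = toy_obj_new(src->len, 0.0, 0);
    memcpy(o->data, src->data, src->len * sizeof *src->data);
    return o;
}

void toy_obj_free(toy_obj *o) {
    free(o->data);
    free(o);
}

/* Set element 0 to `value`. If the object may be shared, work on a
   duplicate and leave the original untouched. Returns the object
   that holds the result: either `x` itself or a fresh copy. */
toy_obj *toy_set_first(toy_obj *x, double value) {
    toy_obj *target = x;
    if (x->named != 0)
        target = toy_obj_duplicate(x);  /* possibly shared: copy first */
    target->data[0] = value;
    return target;
}
```

With the counter at 0 the toy changes the object in place; with the
counter at 2, which the quoted sentence says is the case for every
.Call argument, it always takes the copying branch.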


So in-place modification can already be done, with the .Call
interface for example. But deciding whether it is safe or not
is not supported at the moment.

So the situation is somewhere between "it is possible" and
"R does not support a safe decision whether what is possible
can also be recommended".
At the moment R rather discourages in-place modification by default
(the safe way, and I agree with this default).

But it's not true that R in general copies arguments.

It does seem to be true for the .C interface, though.

Maybe a lot of performance and memory problems can be solved
by rewriting already existing packages so that they use
.Call instead of .C.


Ciao,
   Oliver
On Tue, Mar 06, 2012 at 04:44:49PM +0000, William Dunlap wrote: