[Rcpp-devel] must .Call C functions return SEXP?

See comments on Rcpp below.

On Thu, Oct 28, 2010 at 11:28 AM, William Dunlap <wdunlap at tibco.com> wrote:

-----Original Message-----
From: r-devel-bounces at r-project.org
[mailto:r-devel-bounces at r-project.org] On Behalf Of Andrew Piskorski
Sent: Thursday, October 28, 2010 6:48 AM
To: Simon Urbanek
Cc: r-devel at r-project.org
Subject: Re: [Rd] must .Call C functions return SEXP?

On Thu, Oct 28, 2010 at 12:15:56AM -0400, Simon Urbanek wrote:

The reason I ask is that I've written some R code which allocates two long
lists, and then calls a C function with .Call. My C code writes to
those two pre-allocated lists,

That's bad! All arguments are essentially read-only so you should
never write into them!
I don't see how. (So, what am I missing?) The R docs themselves
state that the main point of using .Call rather than .C is that .Call
does not do any extra copying and gives one direct access to the R
objects. (This is indeed very useful, e.g. to reorder a large matrix
in seconds rather than hours.)

I could allocate the two lists in my C code, but so far it was more
convenient to do so in R. What possible difference in behavior can there
be between the two approaches?
Here is an example of how you break the rule that R-language functions
do not change their arguments if you use .Call in the way that you
describe. The C code is in alter_argument.c:

#include <R.h>
#include <Rinternals.h>

SEXP alter_argument(SEXP arg)
{
    SEXP dim ;
    PROTECT(dim = allocVector(INTSXP, 2));
    INTEGER(dim)[0] = 1 ;
    INTEGER(dim)[1] = LENGTH(arg) ;
    setAttrib(arg, R_DimSymbol, dim);
    UNPROTECT(1) ;
    return dim ;
}

Make a shared library out of this. E.g., on Linux do
    R CMD SHLIB -o Ralter_argument.so alter_argument.c
and load it into R with
    dyn.load("./Ralter_argument.so")
(Or, on any platform, put it into a package along with
the following R code and build it.)

The associated R code is
    myDim <- function(v) .Call("alter_argument", v)
    f <- function(z) myDim(z)[2]
Now try using it:
    > myData <- 6:10
    > myData
    [1]  6  7  8  9 10
    > f(myData)
    [1] 5
    > myData
         [,1] [,2] [,3] [,4] [,5]
    [1,]    6    7    8    9   10
The argument to f was changed! This should never happen in R.

If you are very careful you might be able to ensure that
no part of the argument to be altered can come from
outside the function calling .Call(). ?It can be tricky
to ensure that, especially when the argument is more complicated
than an atomic vector.
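The aliasing shown above can be reproduced in a few lines of plain C, outside R entirely. This is a toy model, not R's internals: the `binding` type and both functions are hypothetical names, and a `binding` stands in for the fact that `myData` in the caller and `z` inside `f()` refer to one object rather than two copies. Writing through one name is visible through the other.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, not R's real SEXP layout: a "binding" is a variable name
   that refers to vector storage it does not own. Two bindings can share one
   buffer, the way myData (caller) and z (inside f) refer to one R object. */
typedef struct {
    size_t length;
    int *data;
} binding;

/* Analogue of alter_argument(): modifies the argument's memory in place. */
static void scale_in_place(binding arg, int factor)
{
    assert(arg.data != NULL);
    for (size_t i = 0; i < arg.length; i++)
        arg.data[i] *= factor;
}

/* Mimics f(myData): returns the caller's first element after the call. */
static int caller_sees_after_call(void)
{
    int storage[5] = {6, 7, 8, 9, 10};
    binding myData = {5, storage};
    binding z = myData;    /* "passing" myData copies the handle, not the data */
    scale_in_place(z, 10);
    return myData.data[0]; /* the caller's value has changed */
}
```

Calling `caller_sees_after_call()` returns 60 rather than 6, the same effect as `myData` turning into a matrix after `f(myData)`.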

"If you live outside the law you must be honest" - Bob Dylan.
This thread seems to suggest (following Bob Dylan) that one needs
to be very careful when using C/C++ to modify R's memory
directly, because you may modify other R variables that point
to the same memory (due to R's copy-by-value semantics and
optimizations).

What are the implications for the Rcpp package where R
objects are exposed to the C++ side in precisely this way,
permitting unrestricted modifications? (In the original
or "classic" version of this package direct writes to R's
memory were done only for performance reasons.)

Seems like extra precautions need to be taken to
avoid the aliasing problem.
The current Rcpp facilities have the same benefits and dangers as the C
macros used in .Call. You get access to the memory of the R object
passed as an argument, saving a copy step. You shouldn't modify that
memory. If you do, bad things can happen and they will be your fault.
If you want to get a read-write copy you clone the argument (in Rcpp
terminology).

To Bill:  I seem to remember the Dylan quote as "To live outside the
law you must be honest."
Dominick

In R, .Call() does not copy its arguments but the C code
writer is expected to do so if they will be altered.
In S+ (and S), .Call() copies the arguments if altering
them would make a user-visible change in the environment,
unless you specify that the C code will not be altering them.
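The practical difference between the R and S+ behaviour is who performs the copy. Below is a minimal sketch of the copy-before-modify rule, again as a toy C model with hypothetical names (`toy_vec`, its `shared` flag, `scale_safely`, `release_copy`); the flag only loosely mirrors R's own bookkeeping of whether an object may be referenced elsewhere. In R's actual C API the copy step would be `duplicate()`, e.g. `PROTECT(arg = duplicate(arg));` before the `setAttrib()` call in `alter_argument`, and in Rcpp the corresponding operation is `clone()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model, not R's API: a vector plus a flag saying whether another
   binding may refer to the same storage. */
typedef struct {
    size_t length;
    int shared;   /* nonzero: writing in place would be visible elsewhere */
    int *data;
} toy_vec;

/* Copy-before-modify: if the input may be shared, scale a fresh copy
   (the role duplicate() plays in R); otherwise write in place.
   Returns the modified vector, or NULL if the copy could not be made. */
static toy_vec *scale_safely(toy_vec *in, int factor)
{
    toy_vec *out = in;
    assert(in != NULL);
    if (in->shared) {
        size_t n = in->length > 0 ? in->length : 1;  /* avoid malloc(0) */
        out = malloc(sizeof *out);
        if (out == NULL)
            return NULL;
        out->data = malloc(n * sizeof *out->data);
        if (out->data == NULL) {
            free(out);
            return NULL;
        }
        if (in->length > 0)
            memcpy(out->data, in->data, in->length * sizeof *out->data);
        out->length = in->length;
        out->shared = 0;   /* the copy has exactly one owner */
    }
    for (size_t i = 0; i < out->length; i++)
        out->data[i] *= factor;
    return out;
}

/* Frees a result of scale_safely() only when it is a copy. */
static void release_copy(toy_vec *result, toy_vec *original)
{
    if (result != NULL && result != original) {
        free(result->data);
        free(result);
    }
}
```

In R the check-and-copy is the C author's job; the S+ behaviour Bill describes amounts to the interface doing it automatically unless told the argument will not be altered.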

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

R has pass-by-value(!) semantics, so semantically your code has
nothing to do with the result.1 and result.2 variables since only
their *values* are guaranteed to be passed (possibly a copy).
Clearly C code called from .Call must be allowed to construct R
objects, as that's how much of R itself is implemented, and further
down, it's what you recommend I should do instead.

But why does it follow that C code must never modify an object
initially allocated by R code? Are you saying there is some special
magic difference in the state of an object allocated by R's C code
vs. one allocated by R code? ?If so, what is it?

What is the potential problem here, that the garbage collector will
suddenly run while my C code is in the middle of writing to an R list?
Yes, if the gc is going to move the object elsewhere, that would be
very bad. But it looks to me like that cannot happen, because lots of
the R implementation itself would fail badly if it did.

E.g.: The PROTECT call is used to increment reference counts, but I
see no guarantees that it is atomic with the operations that allocate
objects. I see no mutexes or other barriers in C code to prevent the
gc from running, thus implying that it *can't* run until the C
function completes.

And R is single threaded, of course. But what about signal handlers,
could they ever invoke R's gc?

Also, I was initially surprised not to find any matrix C APIs, but
grepping for examples (sorry, I don't remember exactly which
functions) showed me that the apparently accepted way to do matrix
operations from C is to simply assume R's column-first dense matrix
order, and access the 2D matrix as a flat 1D vector. (Which is easy.)
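The column-first (column-major) convention means an `nrow` x `ncol` matrix is one flat vector with each column stored contiguously, so the 0-based element `[i, j]` sits at offset `i + j * nrow`. A small sketch in plain C with hypothetical helper names; in R's C API the pointer would typically come from `REAL(x)` (or `INTEGER(x)`) and the dimensions from the `dim` attribute, e.g. via `nrows(x)` and `ncols(x)`.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major layout: element [i, j] (0-based here; R itself is 1-based)
   of an nrow x ncol matrix is stored at offset i + j * nrow. */
static size_t colmajor_index(size_t i, size_t j, size_t nrow, size_t ncol)
{
    assert(i < nrow && j < ncol);
    return i + j * nrow;
}

/* Example use: sum of column j, which reads a contiguous run of the buffer. */
static double column_sum(const double *m, size_t nrow, size_t ncol, size_t j)
{
    double s = 0.0;
    for (size_t i = 0; i < nrow; i++)
        s += m[colmajor_index(i, j, nrow, ncol)];
    return s;
}
```

For `matrix(1:6, nrow = 2)` in R the buffer is `{1, 2, 3, 4, 5, 6}`, and R's element `[2, 3]` (1-based) is offset `1 + 2 * 2 = 5`, i.e. the value 6.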

The fact that internally R attempts to avoid copying for performance
reasons is the only reason why your code may have appeared to work,
but it's invalid!
I will probably change my code to allocate a new list from the C code
and return that, as you recommend. My main reason for doing the
allocation in R was just that it was simpler, especially given the
very limited documentation of R's C API.

But, I didn't see anything in the "Writing R Extensions" doc saying
that what my code is doing is "invalid", and more importantly, I don't
see why it would or should be invalid...

I'd still like to better understand why you think doing the initial
allocation of an object in R rather than C code is such a problem. So
far, I don't see any way that the R interpreter could ever tell the
difference.

Wait, or is the only objection here that I'm using C in a way that
makes pass-by-reference semantics visible to my R code? Which will
work completely correctly, but is not The Proper R Way?

I don't actually need pass-by-reference behavior here at all, but I
can imagine cases where I might want it, so I'd like to understand
your objections better. Is using C to implement pass-by-reference
actually Broken, or merely Ugly? From my reasons above, I think it
will always work correctly and thus is not Broken. ?But of course
given R's devotion to pass-by-value, it could be considered
unacceptably Ugly.

--
Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
