[Rcpp-devel] must .Call C functions return SEXP?
On Thu, Oct 28, 2010 at 1:44 PM, Dominick Samperi <djsamperi at gmail.com> wrote:
See comments on Rcpp below.
On Thu, Oct 28, 2010 at 11:28 AM, William Dunlap <wdunlap at tibco.com> wrote:
-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Andrew Piskorski
Sent: Thursday, October 28, 2010 6:48 AM
To: Simon Urbanek
Cc: r-devel at r-project.org
Subject: Re: [Rd] must .Call C functions return SEXP?

On Thu, Oct 28, 2010 at 12:15:56AM -0400, Simon Urbanek wrote:
Reason I ask, is I've written some R code which allocates two long lists, and then calls a C function with .Call. My C code writes to those two pre-allocated lists,
That's bad! All arguments are essentially read-only so you should never write into them!
I don't see how. (So, what am I missing?) The R docs themselves state that the main point of using .Call rather than .C is that .Call does not do any extra copying and gives one direct access to the R objects. (This is indeed very useful, e.g. to reorder a large matrix in seconds rather than hours.)

I could allocate the two lists in my C code, but so far it was more convenient to do so in R. What possible difference in behavior can there be between the two approaches?
Here is an example of how you break the rule that R-language functions
do not change their arguments if you use .Call in the way that you
describe.  The C code is in alter_argument.c:
#include <R.h>
#include <Rinternals.h>

SEXP alter_argument(SEXP arg)
{
    SEXP dim;
    PROTECT(dim = allocVector(INTSXP, 2));
    INTEGER(dim)[0] = 1;
    INTEGER(dim)[1] = LENGTH(arg);
    /* Deliberately wrong: this sets the dim attribute on the caller's object. */
    setAttrib(arg, R_DimSymbol, dim);
    UNPROTECT(1);
    return dim;
}
Make a shared library out of this.  E.g., on Linux do
    R CMD SHLIB -o Ralter_argument.so alter_argument.c
and load it into R with
    dyn.load("./Ralter_argument.so")
(Or, on any platform, put it into a package along with
the following R code and build it.)
The associated R code is
    myDim <- function(v) .Call("alter_argument", v)
    f <- function(z) myDim(z)[2]
Now try using it:
    > myData <- 6:10
    > myData
    [1]  6  7  8  9 10
    > f(myData)
    [1] 5
    > myData
         [,1] [,2] [,3] [,4] [,5]
    [1,]    6    7    8    9   10
The argument to f was changed!  This should never happen in R.
If you are very careful you might be able to ensure that
no part of the argument to be altered can come from
outside the function calling .Call().  It can be tricky
to ensure that, especially when the argument is more complicated
than an atomic vector.
"If you live outside the law you must be honest" - Bob Dylan.
This thread seems to suggest (following Bob Dylan) that one needs to be very careful when using C/C++ to modify R's memory directly, because you may modify other R variables that point to the same memory (due to R's copy-by-value semantics and optimizations). What are the implications for the Rcpp package where R objects are exposed to the C++ side in precisely this way, permitting unrestricted modifications? (In the original or "classic" version of this package direct writes to R's memory were done only for performance reasons.) Seems like extra precautions need to be taken to avoid the aliasing problem.
The current Rcpp facilities have the same benefits and dangers as the C macros used in .Call. You get access to the memory of the R object passed as an argument, saving a copy step. You shouldn't modify that memory. If you do, bad things can happen and they will be your fault. If you want to get a read-write copy you clone the argument (in Rcpp terminology).

To Bill: I seem to remember the Dylan quote as "To live outside the law you must be honest."
Dominick
In R, .Call() does not copy its arguments but the C code writer is expected to do so if they will be altered. In S+ (and S), .Call() copies the arguments if altering them would make a user-visible change in the environment, unless you specify that the C code will not be altering them.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
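The aliasing problem and the copy-first remedy described above can be sketched in plain C. This is only a toy model, assuming nothing about R's real internals: `toy_vector`, `negate_in_place` and `negate_copy` are hypothetical names invented for illustration, and in actual .Call code the copy would normally be made with R's `duplicate()` before any write.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an R vector; NOT R's real SEXP layout. */
typedef struct {
    int *data;
    size_t length;
} toy_vector;

/* Writes into the caller's buffer: every alias of v sees the change. */
void negate_in_place(toy_vector *v)
{
    assert(v != NULL);
    for (size_t i = 0; i < v->length; i++)
        v->data[i] = -v->data[i];
}

/* Copies first, then modifies only the copy. The caller owns the result
   and must free() its data. On allocation failure the result has
   data == NULL and length == 0. */
toy_vector negate_copy(const toy_vector *v)
{
    toy_vector out = { NULL, 0 };
    assert(v != NULL);
    out.data = malloc(v->length * sizeof *out.data);
    if (out.data == NULL)
        return out;
    out.length = v->length;
    memcpy(out.data, v->data, v->length * sizeof *out.data);
    negate_in_place(&out);
    return out;
}
```

If two `toy_vector` values share one `data` pointer, `negate_in_place` on either changes both, which mirrors what happened to myData in the example; `negate_copy` leaves the shared buffer untouched.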
R has pass-by-value(!) semantics, so semantically your code has nothing to do with the result.1 and result.2 variables since only their *values* are guaranteed to be passed (possibly a copy).
Clearly C code called from .Call must be allowed to construct R objects, as that's how much of R itself is implemented, and further down, it's what you recommend I should do instead. But why does it follow that C code must never modify an object initially allocated by R code? Are you saying there is some special magic difference in the state of an object allocated by R's C code vs. one allocated by R code? If so, what is it?

What is the potential problem here, that the garbage collector will suddenly run while my C code is in the middle of writing to an R list? Yes, if the gc is going to move the object elsewhere, that would be very bad. But it looks to me like that cannot happen, because lots of the R implementation itself would fail badly if it did.

E.g.: The PROTECT call is used to increment reference counts, but I see no guarantees that it is atomic with the operations that allocate objects. I see no mutexes or other barriers in C code to prevent the gc from running, thus implying that it *can't* run until the C function completes. And R is single threaded, of course. But what about signal handlers, could they ever invoke R's gc?

Also, I was initially surprised not to find any matrix C APIs, but grepping for examples (sorry, I don't remember exactly which functions) showed me that the apparently accepted way to do matrix operations from C is to simply assume R's column-first dense matrix order, and access the 2D matrix as a flat 1D vector. (Which is easy.)
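The flat, column-first access mentioned above can be sketched as follows. This is a minimal illustration in plain C, not code from R itself: `matrix_get` and `column_sums` are hypothetical helper names, and in real .Call code the pointer would typically come from `REAL()` and the dimensions from `nrows()` and `ncols()`.

```c
#include <assert.h>
#include <stddef.h>

/* Element (i, j), zero-based, of an nrow x ncol matrix stored
   column by column in the flat array x. */
double matrix_get(const double *x, size_t nrow, size_t ncol,
                  size_t i, size_t j)
{
    assert(i < nrow && j < ncol);
    return x[i + nrow * j];
}

/* Sum of each column; out must have room for ncol doubles.
   Reads x only, so the caller's matrix is never altered. */
void column_sums(const double *x, size_t nrow, size_t ncol, double *out)
{
    for (size_t j = 0; j < ncol; j++) {
        double total = 0.0;
        for (size_t i = 0; i < nrow; i++)
            total += x[i + nrow * j];
        out[j] = total;
    }
}
```

Walking the inner loop over rows keeps the accesses contiguous in memory, since consecutive rows of one column sit next to each other in this layout.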
The fact that internally R attempts to avoid copying for performance reasons is the only reason why your code may have appeared to work, but it's invalid!
I will probably change my code to allocate a new list from the C code and return that, as you recommend. My main reason for doing the allocation in R was just that it was simpler, especially given the very limited documentation of R's C API.

But, I didn't see anything in the "Writing R Extensions" doc saying that what my code is doing is "invalid", and more importantly, I don't see why it would or should be invalid... I'd still like to better understand why you think doing the initial allocation of an object in R rather than C code is such a problem. So far, I don't see any way that the R interpreter could ever tell the difference.

Wait, or is the only objection here that I'm using C in a way that makes pass-by-reference semantics visible to my R code? Which will work completely correctly, but is not The Proper R Way? I don't actually need pass-by-reference behavior here at all, but I can imagine cases where I might want it, so I'd like to understand your objections better. Is using C to implement pass-by-reference actually Broken, or merely Ugly? From my reasons above, I think it will always work correctly and thus is not Broken. But of course given R's devotion to pass-by-value, it could be considered unacceptably Ugly.

--
Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel