Changing arguments inside .Call. Wise to encourage "const" on all arguments?
On Dec 10, 2012, at 2:05 PM, Simon Urbanek wrote:
On Dec 10, 2012, at 1:51 AM, Paul Johnson wrote:
I'm continuing my work on finding speedups in generalized inverse calculations in some simulations. It leads me back to .C and .Call, and some questions I've never been able to answer for myself. It may be that I can push some calculations to LAPACK or the C BLAS, which is why I realized again that I don't understand the call-by-reference or call-by-value semantics of .Call. Why aren't users of .Call encouraged to "const" their arguments, and why doesn't .Call do this for them (if we really believe in return by value)?
Because there is a difference between the *data* part of the SEXP and the object itself. Internal structure of the object may need to be modified (e.g. the NAMED ref counting increased when you assign it) in a call to R API. You can't flag the data part as const separately, so you have to use non-const SEXP.
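A plain-C sketch of the point about const, using a toy object rather than R's real SEXPREC layout. The names toy_obj, toy_sum and toy_retain are hypothetical and exist only for illustration. Because const applies to the whole pointee, a function that must touch the bookkeeping part cannot take a pointer-to-const, even when it never writes to the data.

```c
#include <stddef.h>

/* Toy stand-in for an R object: a bookkeeping header plus data.
   This is NOT R's real SEXPREC layout; all names are hypothetical. */
typedef struct {
    int named;        /* bookkeeping field, loosely analogous to NAMED */
    size_t length;    /* number of valid entries in data */
    double data[4];
} toy_obj;

/* Reads only the data, so a pointer-to-const works here. */
double toy_sum(const toy_obj *x)
{
    double s = 0.0;
    for (size_t i = 0; i < x->length; i++)
        s += x->data[i];
    return s;
}

/* Leaves the data alone but must update the header, so the parameter
   cannot be pointer-to-const: with `const toy_obj *x` the statement
   `x->named++` would be rejected by the compiler. */
toy_obj *toy_retain(toy_obj *x)
{
    x->named++;
    return x;
}
```

In this toy model there is no way to declare "data is const, header is not" on a single pointer parameter, which mirrors the argument made above for SEXP.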
R Gentleman's R Programming for Bioinformatics is the most understandable treatment I've found on .Call. It appears to me .Call leaves "wiggle room" where there should be none. Here's Gentleman on p. 185. "For .Call and .External, the return value is an R object (the C functions must return a SEXP), and for these functions the values that were passed are typically not modified. If they must be modified, then making a copy in R, prior to invoking the C code, is necessary." I *think* that means: .Call allows return by reference, BUT we really wish users would not use it. Users can damage R data structures that are pointed to unless they really truly know what they are doing on the C side. ?? This seems dangerous. Why allow return by reference at all?
Because it is completely legal to do things like
SEXP last(SEXP bar) {
if (TYPEOF(bar) == VECSXP && LENGTH(bar) > 0)
return VECTOR_ELT(bar, LENGTH(bar) - 1);
Rf_error("sorry, I only work on lists");
}
Martin Morgan pointed out that this example is a bad one -- which is true. The common idiom that is safe is
SEXP foo(SEXP bar) {
...
return bar;
}
However, the last() example above is bad, because returning the element directly is a bad idea -- the conservative approach would be to use duplicate(), the more efficient one would be to bump up NAMED. Sorry, my bad. I guess I was rather strengthening Paul's point to duplicate() when in doubt even if it's less efficient :).
Cheers,
Simon
There is no point in duplicating the element.
On p. 197, there's a similar comment "Any function that has been invoked by either .External or .Call will have all of its arguments protected already. You do not need to protect them. .... [T]hey were not duplicated and should be treated as read-only values." "should be ... read-only" concerns me. They are "protected" in the garbage collector sense,
Yes
but they are not protected from "return by reference" damage. Right?
There is no "return by reference damage". The only problem is if you modify input arguments while someone else holds a reference, but there is no way in C to prevent that while still allowing them to be useful. Note that it is legal to modify input arguments if there are no references to it. Cheers, Simon
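The rule "modify in place only if nobody else holds a reference, otherwise copy first" can be sketched in plain C with a toy reference-counted vector. This is only an illustration under stated assumptions: toy_vec, toy_new, toy_free and toy_scale are hypothetical names, and the refs field is a simplification that does not reproduce how R tracks NAMED.

```c
#include <stdlib.h>
#include <string.h>

/* Toy vector with a reference count. Hypothetical names throughout;
   this does not reproduce R's NAMED mechanism. */
typedef struct {
    int refs;        /* number of other holders of this object */
    size_t length;
    double *data;
} toy_vec;

/* Allocate a zero-filled vector of length n, or return NULL on failure. */
toy_vec *toy_new(size_t n)
{
    toy_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = calloc(n ? n : 1, sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    v->refs = 0;
    v->length = n;
    return v;
}

void toy_free(toy_vec *v)
{
    if (v != NULL) { free(v->data); free(v); }
}

/* Scale by k. Modifies x in place only when nobody else refers to it;
   otherwise works on a fresh copy and leaves x untouched.
   Returns NULL if the copy cannot be allocated. */
toy_vec *toy_scale(toy_vec *x, double k)
{
    toy_vec *out = x;
    if (x->refs > 0) {
        out = toy_new(x->length);
        if (out == NULL) return NULL;
        memcpy(out->data, x->data, x->length * sizeof *x->data);
    }
    for (size_t i = 0; i < out->length; i++)
        out->data[i] *= k;
    return out;
}
```

The caller can tell which path was taken by comparing the returned pointer with the argument. Nothing in the C type system enforces the check, which is the limitation described above.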
Why doesn't the documentation recommend function writers to mark
arguments to C functions as const? Isn't that what the return by
value policy would suggest?
Here's a troublesome example in R src/main/array.c:
/* DropDims strips away redundant dimensioning information. */
/* If there is an appropriate dimnames attribute the correct */
/* element is extracted and attached to the vector as a names */
/* attribute. Note that this function mutates x. */
/* Duplication should occur before this is called. */
SEXP DropDims(SEXP x)
{
SEXP dims, dimnames, newnames = R_NilValue;
int i, n, ndims;
PROTECT(x);
dims = getAttrib(x, R_DimSymbol);
[... SNIP ....]
setAttrib(x, R_DimNamesSymbol, R_NilValue);
setAttrib(x, R_DimSymbol, R_NilValue);
setAttrib(x, R_NamesSymbol, newnames);
[... SNIP ....]
return x;
}
Well, at least there's a warning comment with that one. But it does
show .Call allows call by reference.
Why allow it, though? DropDims could copy x, modify the copy, and return it.
I wondered why DropDims bothers to return x at all. We seem to be
using modify and return by reference there.
I also wondered why x is PROTECTED, actually. It's an argument, wasn't
it automatically protected? Is it no longer protected after the
function starts modifying it?
Here's an example with similar usage in Writing R Extensions, section
5.10.1 "Calling .Call". It protects the arguments a and b (needed
??), then changes them.
#include <R.h>
#include <Rdefines.h>
SEXP convolve2(SEXP a, SEXP b)
{
R_len_t i, j, na, nb, nab;
double *xa, *xb, *xab;
SEXP ab;
PROTECT(a = AS_NUMERIC(a)); /* PJ wonders, doesn't this alter
"a" in calling code*/
PROTECT(b = AS_NUMERIC(b));
na = LENGTH(a); nb = LENGTH(b); nab = na + nb - 1;
PROTECT(ab = NEW_NUMERIC(nab));
xa = NUMERIC_POINTER(a); xb = NUMERIC_POINTER(b);
xab = NUMERIC_POINTER(ab);
for(i = 0; i < nab; i++) xab[i] = 0.0;
for(i = 0; i < na; i++)
for(j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
UNPROTECT(3);
return(ab);
}
Doesn't
PROTECT(a = AS_NUMERIC(a));
alter the data structure "a" both inside the C function and
in the calling R code as well? And, if a was PROTECTED automatically,
could we do without that PROTECT()?
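As far as plain C semantics go, the pointer itself is passed by value, so an assignment such as `a = something` inside the function rebinds only the callee's local copy. Only a write through the pointer is visible to the caller. The sketch below uses a hypothetical toy type and hypothetical function names, with no R API involved. When a coercion does produce a new object, that object is freshly allocated, which would explain why the result of the assignment is PROTECTed even though the incoming argument already was.

```c
#include <stddef.h>

/* Hypothetical toy object; stands in for "some heap object". */
typedef struct { int tag; } toy_obj;

/* Stands in for a freshly produced object, such as a coerced copy. */
static toy_obj converted = { 2 };

/* Rebinds the local parameter only: the caller's pointer and the
   object it points to are unchanged after this returns. */
int rebind_local(toy_obj *a)
{
    a = &converted;     /* affects the callee's copy of the pointer */
    return a->tag;
}

/* Writes through the pointer: this change IS visible to the caller. */
void write_through(toy_obj *a)
{
    a->tag = 99;
}
```

So the question that matters is whether any code writes through the original pointer, not whether the parameter name is reassigned.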
pj
--
Paul E. Johnson
Professor, Political Science Assoc. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org http://quant.ku.edu
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel