Changing arguments inside .Call. Wise to encourage "const" on all arguments?

On Dec 10, 2012, at 1:51 AM, Paul Johnson wrote:

I'm continuing my work on finding speedups in generalized inverse
calculations in some simulations.  It leads me back to .C and .Call,
and some questions I've never been able to answer for myself.  It may
be I can push some calculations to LAPACK or BLAS in C, which is why I
realized again that I don't understand the call by reference or value
semantics of .Call.

Why aren't users of .Call encouraged to "const" their arguments, and
why doesn't .Call do this for them (if we really believe in return by
value)?

Because there is a difference between the *data* part of the SEXP and the object itself. The internal structure of the object may need to be modified in a call to the R API (e.g. the NAMED reference count is increased when you assign it). You can't flag the data part as const separately, so you have to use a non-const SEXP.
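The distinction between bookkeeping header and data can be sketched in plain C. This is a toy model only: toy_obj, toy_sum and toy_bind are hypothetical names, and the struct is not R's real object layout. It shows that a const pointer covers the whole object, so a function that must touch the header cannot take one even if it never writes to the data.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an R object. NOT R's real layout; for illustration only. */
typedef struct toy_obj {
    int    named;    /* bookkeeping header, analogous in spirit to NAMED */
    size_t length;   /* number of payload elements in use */
    double data[4];  /* the "data part" */
} toy_obj;

/* Reading the payload works fine through a const pointer. */
double toy_sum(const toy_obj *x)
{
    assert(x != NULL);
    double s = 0.0;
    for (size_t i = 0; i < x->length; i++)
        s += x->data[i];
    return s;
}

/* Binding the object to another name leaves the data alone but must
 * update the header. Declaring the parameter as const toy_obj * would
 * make the increment below a compile error. */
toy_obj *toy_bind(toy_obj *x)
{
    assert(x != NULL);
    x->named++;      /* header changes, payload does not */
    return x;
}
```

C has no way to say "const data, mutable header" on a single pointer type, which is the limitation described above.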

R Gentleman's R Programming for Bioinformatics is the most
understandable treatment I've found on .Call. It appears to me .Call
leaves "wiggle room" where there should be none.  Here's Gentleman on
p. 185. "For .Call and .External, the return value is an R object (the
C functions must return a SEXP), and for these functions the values
that were passed are typically not modified.  If they must be
modified, then making a copy in R, prior to invoking the C code, is
necessary."

I *think* that means:

.Call allows return by reference, BUT we really wish users would not
use it. Users can damage R data structures that are pointed to unless
they really truly know what they are doing on the C side. ??

This seems dangerous. Why allow return by reference at all?

Because it is completely legal to do things like

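SEXP last(SEXP bar) {
  /* return the last element of a non-empty list */
  if (TYPEOF(bar) == VECSXP && LENGTH(bar) > 0)
    return VECTOR_ELT(bar, LENGTH(bar) - 1);
  Rf_error("sorry, I only work on lists");
  return R_NilValue; /* not reached: Rf_error does not return */
}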

Martin Morgan pointed out that this example is a bad one -- which is true. The common idiom that is safe is

SEXP foo(SEXP bar) {
...
return bar;
}

However, the last() example above is bad, because returning the element directly is a bad idea -- the conservative approach would be to use duplicate(), the more efficient one would be to bump up NAMED. Sorry, my bad. I guess I was rather strengthening Paul's point to duplicate() when in doubt even if it's less efficient :).

Cheers,
Simon
There is no point in duplicating the element.

On p. 197, there's a similar comment: "Any function that has been
invoked by either .External or .Call will have all of its arguments
protected already. You do not need to protect them. .... [T]hey were
not duplicated and should be treated as read-only values."

"should be ... read-only" concerns me. They are "protected" in the
garbage collector sense,
Yes

but they are not protected from "return by
reference" damage. Right?

There is no "return by reference damage".

The only problem is if you modify input arguments while someone else holds a reference to them, but there is no way in C to prevent that while still allowing them to be useful. Note that it is legal to modify an input argument if there are no other references to it.
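The rule above (modify in place only when nobody else holds a reference, otherwise work on a copy) can be sketched with a toy model in plain C. Everything here is hypothetical: toy_vec, its refs counter and the toy_* functions stand in for the idea only and are not the R API, where the tools mentioned in this thread are NAMED and duplicate().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy vector with a counter of other holders. NOT an R SEXP. */
typedef struct toy_vec {
    int     refs;   /* how many other references exist (toy bookkeeping) */
    size_t  n;      /* number of elements */
    double *data;   /* payload */
} toy_vec;

toy_vec *toy_new(size_t n)
{
    toy_vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refs = 0;
    v->n = n;
    v->data = calloc(n ? n : 1, sizeof *v->data);
    assert(v->data != NULL);
    return v;
}

/* Deep copy; the copy starts out with no other references. */
toy_vec *toy_duplicate(const toy_vec *src)
{
    toy_vec *v = toy_new(src->n);
    memcpy(v->data, src->data, src->n * sizeof *src->data);
    return v;
}

void toy_free(toy_vec *v)
{
    if (v) { free(v->data); free(v); }
}

/* Multiply every element by k. Works in place when x is unshared,
 * otherwise on a private copy so other holders never see the change. */
toy_vec *toy_scale(toy_vec *x, double k)
{
    toy_vec *out = (x->refs > 0) ? toy_duplicate(x) : x;
    for (size_t i = 0; i < out->n; i++)
        out->data[i] *= k;
    return out;
}
```

Calling toy_scale on an unshared vector returns the same pointer with the data changed; calling it on a vector whose refs is positive returns a fresh copy and leaves the original untouched.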

Cheers,
Simon

Why doesn't the documentation recommend function writers to mark
arguments to C functions as const?  Isn't that what the return by
value policy would suggest?

Here's a troublesome example in R's src/main/array.c:

/* DropDims strips away redundant dimensioning information. */
/* If there is an appropriate dimnames attribute the correct */
/* element is extracted and attached to the vector as a names */
/* attribute.  Note that this function mutates x. */
/* Duplication should occur before this is called. */

SEXP DropDims(SEXP x)
{
  SEXP dims, dimnames, newnames = R_NilValue;
  int i, n, ndims;

  PROTECT(x);
  dims = getAttrib(x, R_DimSymbol);
[... SNIP ....]
  setAttrib(x, R_DimNamesSymbol, R_NilValue);
  setAttrib(x, R_DimSymbol, R_NilValue);
  setAttrib(x, R_NamesSymbol, newnames);
[... SNIP ....]

  return x;
}

Well, at least there's a warning comment with that one.  But it does
show .Call allows call by reference.

Why allow it, though? DropDims could copy x, modify the copy, and return it.

I wondered why DropDims bothers to return x at all. We seem to be
using modify and return by reference there.

I also wondered why x is PROTECTed, actually. It's an argument; wasn't
it automatically protected? Is it no longer protected after the
function starts modifying it?

Here's an example with similar usage in Writing R Extensions, section
5.10.1 "Calling .Call".  It protects the arguments a and b (needed
??), then changes them.

#include <R.h>
#include <Rdefines.h>

   SEXP convolve2(SEXP a, SEXP b)
   {
       R_len_t i, j, na, nb, nab;
       double *xa, *xb, *xab;
       SEXP ab;

       PROTECT(a = AS_NUMERIC(a)); /* PJ wonders: doesn't this alter "a" in the calling code? */
       PROTECT(b = AS_NUMERIC(b));
       na = LENGTH(a); nb = LENGTH(b); nab = na + nb - 1;
       PROTECT(ab = NEW_NUMERIC(nab));
       xa = NUMERIC_POINTER(a); xb = NUMERIC_POINTER(b);
       xab = NUMERIC_POINTER(ab);
       for(i = 0; i < nab; i++) xab[i] = 0.0;
       for(i = 0; i < na; i++)
            for(j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
       UNPROTECT(3);
       return(ab);
   }

Doesn't

      PROTECT(a = AS_NUMERIC(a));

alter the data structure "a" both inside the C function and
in the calling R code as well? And, if a was PROTECTed automatically,
could we do without that PROTECT()?
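One part of this question is plain C rather than R: a pointer parameter is itself passed by value, so assigning to it inside the function only rebinds the local name. The sketch below is a toy model under stated assumptions: toy_num, toy_as_double and toy_use are hypothetical, and toy_as_double is assumed to behave like a coercion that returns its argument unchanged when no conversion is needed and a newly allocated object otherwise.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy number with a type flag. NOT an R SEXP. */
typedef struct toy_num {
    int    is_double;  /* 0 = "integer-like", 1 = "double" */
    double value;
} toy_num;

/* Hypothetical coercion: returns x itself if it already has the
 * wanted type, otherwise a freshly allocated converted copy. */
toy_num *toy_as_double(toy_num *x)
{
    assert(x != NULL);
    if (x->is_double)
        return x;
    toy_num *y = malloc(sizeof *y);
    assert(y != NULL);
    y->is_double = 1;
    y->value = x->value;
    return y;
}

/* Mirrors the shape of `a = AS_NUMERIC(a)`: the assignment rebinds the
 * local pointer only; the caller's object is never written to. */
double toy_use(toy_num *a)
{
    toy_num *orig = a;
    a = toy_as_double(a);          /* local rebinding */
    double result = a->value * 2.0;
    if (a != orig)
        free(a);                   /* the fresh copy is ours to release */
    return result;
}
```

In this model the fresh copy is owned only by the function that created it. If the real coercion behaves the same way, that is one reason a newly returned object would need its own PROTECT even though the incoming argument is already protected.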

pj

-- 
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org               http://quant.ku.edu

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
