[patch] Behavior of .C() and .Fortran() when given double(0) or integer(0).

Dear R-devel,

While tracking down some hard-to-reproduce bugs in a package I maintain,
I stumbled on a behavior change between R 2.15.0 and the current R-devel
(or SVN trunk).

In 2.15.0 and earlier, if you passed a 0-length vector of the right
mode (e.g., double(0) or integer(0)) as one of the arguments in a .C()
call with DUP=TRUE (the default), the C routine would be passed NULL
(the C pointer, not R NULL) in the corresponding argument. The current
development version instead passes it a pointer to what appears to be
the memory location immediately following the SEXP that holds the
metadata for the argument. If the argument has length 0, this is often
memory belonging to a different R object. (DUP=FALSE in 2.15.0
appears to have the same behavior as R-devel.)

The .C() documentation and Writing R Extensions don't explicitly specify a
behavior for 0-length vectors, so I don't know if this change is
intentional, or whether it was a side-effect of the following news item:

      .C() and .Fortran() do less copying: arguments which are raw,
      logical, integer, real or complex vectors and are unnamed are not
      copied before the call, and (named or not) are not copied after
      the call.  Lists are no longer copied (they are supposed to be
      used read-only in the C code).

Was the change in the empty vector behavior intentional?

It seems to me that standardizing on the behavior of giving the C
routine NULL is safer, more consistent with other memory-related
routines, and more convenient. It is safer because dereferencing a NULL
pointer is an immediate (and therefore easily traced) segfault, whereas
dereferencing an invalid pointer that is nevertheless in the general
memory area allocated to the program often causes subtle errors down
the line. It is more consistent because R_alloc asked to allocate 0
bytes returns NULL, at least on my platform. And it is more convenient
because the C routine can easily check if a pointer is NULL, whereas
with the R-devel behavior, the programmer has to add an explicit way of
telling that an empty vector was passed.
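
As a rough illustration of the convenience point (this is not part of
the attached test case; the routine name and its arguments are made up
for the example), a .C()-style routine could treat a NULL pointer as
"no weights supplied" without needing a separate flag argument:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical .C()-style routine: every argument arrives as a pointer.
 * x      - data vector with n[0] elements
 * w      - weights with n[0] elements, or NULL if the caller passed an
 *          empty vector (assumes the "0-length -> NULL" convention)
 * result - length-1 output holding the (weighted) sum */
void weighted_sum(double *x, int *n, double *w, double *result)
{
    /* n and result are always length-1 vectors in this sketch. */
    assert(n != NULL && result != NULL);

    double total = 0.0;
    for (int i = 0; i < n[0]; i++) {
        /* A NULL w means "unweighted"; no extra flag argument needed. */
        total += (w == NULL) ? x[i] : w[i] * x[i];
    }
    result[0] = total;
}
```

Assuming the 2.15.0 DUP=TRUE behavior, a call along the lines of
.C("weighted_sum", as.double(x), length(x), double(0), result=double(1))
would take the unweighted branch. If the pointer is not NULL for an
empty vector, the routine would need an extra argument (for example,
the length of w) to tell the two cases apart.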

I've attached a small test case (dotC_NULL.* files) that shows the
difference. The C file should be built with R CMD SHLIB, and the R file
calls the functions in the library with a variety of arguments. The
output I get from running
R CMD BATCH --no-timing --vanilla --slave dotC_NULL.R
on R 2.15.0, R trunk, and R trunk with my patch (described below) is attached.

The attached patch (dotC_NULL.patch) against the current trunk
(affecting src/main/dotcode.c) restores the old behavior for DUP=TRUE
(i.e., 0-length vector -> NULL pointer) and extends it to the DUP=FALSE
case. For each argument to a .C() or .Fortran() call that is of mode
raw, integer, real, or complex, it checks whether the argument has
length 0. If so, it sets the pointer to be passed to NULL and skips
copying the C routine's changes back to the R object for that
argument. The additional computing cost should be negligible: a check
of whether the vector length equals 0 and, if so, a break out of a
switch statement.
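
The logic described above can be sketched roughly as follows. This is a
standalone illustration only, not the code in dotC_NULL.patch or in
src/main/dotcode.c: the enum, the struct, and both function names are
hypothetical stand-ins for R's internal types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for R's vector modes and vector objects. */
enum vec_type { VEC_RAW, VEC_INTEGER, VEC_REAL, VEC_COMPLEX, VEC_OTHER };

struct vec {
    enum vec_type type;
    size_t length;    /* number of elements */
    size_t elt_size;  /* bytes per element */
    void *data;       /* element storage */
};

/* Choose the pointer to hand to the C routine for argument v. */
void *arg_pointer(const struct vec *v)
{
    assert(v != NULL);
    switch (v->type) {
    case VEC_RAW:
    case VEC_INTEGER:
    case VEC_REAL:
    case VEC_COMPLEX:
        if (v->length == 0)
            return NULL;  /* empty vector: pass NULL */
        break;
    default:
        break;            /* other modes are left alone in this sketch */
    }
    return v->data;
}

/* Copy the routine's changes back into v; skipped if NULL was passed. */
void copy_back(struct vec *v, const void *from_callee)
{
    assert(v != NULL);
    if (from_callee == NULL)
        return;
    memmove(v->data, from_callee, v->length * v->elt_size);
}
```

The only work added for a non-empty vector is the single length
comparison, which is why the extra cost should be negligible.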

The patch appears to work, at least for my package, and R CMD check
passes for all recommended packages (on my 64-bit Linux system), but
this is my first time working with R's internals, so handle with care.

                                   Best,
                                   Pavel Krivitsky

-------------- next part --------------
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

R_alloc asked to allocate 1 byte: 
Pointer to output from R_alloc() of 1 bytes: 0x211c470.
Return value: [1] 1

R_alloc asked to allocate 0 bytes: 
Pointer to output from R_alloc() of 0 bytes: (nil).
Return value: [1] 0

Integer vector with 1 element: 
Pointer to arg: 0x2123b00.
Return value: [1] 0

Integer vector with 0 elements: 
Pointer to arg: (nil).
Return value: integer(0)

Integer vector with 1 element and DUP=FALSE: 
Pointer to arg: 0x2132940.
Return value: [1] 0

Integer vector with 0 elements and DUP=FALSE: 
Pointer to arg: 0x2134a80.
Return value: integer(0)

-------------- next part --------------
R Under development (unstable) (2012-05-04 r59314)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

R_alloc asked to allocate 1 byte: 
Pointer to output from R_alloc() of 1 bytes: 0x1e56270.
Return value: [1] 1

R_alloc asked to allocate 0 bytes: 
Pointer to output from R_alloc() of 0 bytes: (nil).
Return value: [1] 0

Integer vector with 1 element: 
Pointer to arg: 0x1e60db0.
Return value: [1] 0

Integer vector with 0 elements: 
Pointer to arg: 0x1e75188.
Return value: integer(0)

Integer vector with 1 element and DUP=FALSE: 
Pointer to arg: 0x1e6ad90.
Return value: [1] 0

Integer vector with 0 elements and DUP=FALSE: 
Pointer to arg: 0x1e7dc10.
Return value: integer(0)

-------------- next part --------------
R Under development (unstable) (2012-05-04 r59314)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

R_alloc asked to allocate 1 byte: 
Pointer to output from R_alloc() of 1 bytes: 0x27495c0.
Return value: [1] 1

R_alloc asked to allocate 0 bytes: 
Pointer to output from R_alloc() of 0 bytes: (nil).
Return value: [1] 0

Integer vector with 1 element: 
Pointer to arg: 0x2754100.
Return value: [1] 0

Integer vector with 0 elements: 
Pointer to arg: (nil).
Return value: integer(0)

Integer vector with 1 element and DUP=FALSE: 
Pointer to arg: 0x275e0e0.
Return value: [1] 0

Integer vector with 0 elements and DUP=FALSE: 
Pointer to arg: (nil).
Return value: integer(0)

-------------- next part --------------
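# Record the R version and platform at the top of each output file.
sessionInfo()
cat("\n")

# Load the shared library built with R CMD SHLIB.
dyn.load("dotC_NULL.so")

# Call the C routine Cfun through .C() with the arguments in args,
# then print the description and the first element of the result.
run_test <- function(desc, Cfun, args) {
  cat(desc, "\n")
  out <- do.call(".C", c(list(Cfun), args))
  cat("Return value: ")
  print(out[[1]])
  cat("\n")
}

# What does R_alloc() return for 1 byte and for 0 bytes?
run_test("R_alloc asked to allocate 1 byte:", "R_alloc_test", list(nbytes = as.integer(1)))
run_test("R_alloc asked to allocate 0 bytes:", "R_alloc_test", list(nbytes = as.integer(0)))

# What pointer does the C routine receive for 1-element and 0-element vectors?
run_test("Integer vector with 1 element:", "dotC_NULL", list(arg = integer(1)))
run_test("Integer vector with 0 elements:", "dotC_NULL", list(arg = integer(0)))
run_test("Integer vector with 1 element and DUP=FALSE:", "dotC_NULL", list(arg = integer(1), DUP = FALSE))
run_test("Integer vector with 0 elements and DUP=FALSE:", "dotC_NULL", list(arg = integer(0), DUP = FALSE))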
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dotC_NULL.patch
Type: text/x-patch
Size: 2473 bytes
Desc: 
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20120504/ab68095c/attachment.bin>