Skip to content

R-devel Digest, Vol 83, Issue 2

21 messages · Laurent Gautier, Luke Tierney, Duncan Murdoch +2 more

#
[Disclaimer: what is below reflects my understanding from reading the R 
source, others will correct where deemed necessary]
On 1/2/10 12:00 PM, r-devel-request at r-project.org wrote:
The most precise technical documentation is in memory.c
PROTECT is an alias for Rf_protect, itself an alias for
SEXP protect(SEXP s);
and uses a stack (R_PPStack) to store protected objects.
This depends on the requirements for your system.

For example, in rpy2 I added a reference counting layer(*) because I 
wanted to allow several Python objects to share the same underlying R 
object, but that's not currently(*) counting how many times an object 
should be freed.
(*: imperfect, but currently doing a very decent job - details upon 
request).

That kind of feature could be provided by R's C-level API, since this 
could be seen of general use as well as give an opportunity to improve 
the performances of the R_PreservedObject/R_ReleaseObject duo whenever a 
lot of objects are protected and/or external code is 
protecting/releasing objects through a FIFO proxy.
I understand the code in memory.c like an object preserved twice needs 
to be freed twice: R_PreciousList is just a (linked) list, and 
"R_PreserveObject(object)" adds the object to the beginning of the list 
while "R_ReleaseObject(object)" removes the first "object" found from 
the list.
PROTECT/UNPROTECT is trading granularity for speed. It is a stack with 
only tow operations possible:
- push 1 object into the stack
- pull (unprotect) N last objects from the stack


HTH,


L.
#
Thanks.
On 01/02/2010 05:36 PM, Laurent Gautier wrote:
We wrap up SEXP into a C++ class that in particular manages itself 
preserving and releasing the object to garbage collection.
That's what I understand of it too. It fits nicely into C++ assignment 
operator and copy constructor.

  
    
#
On 02/01/2010 11:36 AM, Laurent Gautier wrote:
UNPROTECT_PTR is also possible, which does a linear search through the 
stack and unprotects something possibly deep within it.  There is also 
REPROTECT which allows you to replace an entry within the stack.

I would guess that UNPROTECT_PTR is more efficient than RecursiveRelease 
because it doesn't use so much stack space when it needs to go deep into 
the stack to release, but it is possible the compiler recognizes the 
tail recursion and RecursiveRelease is implemented efficiently.  In that 
case it could be more efficient than UNPROTECT_PTR, which has to move 
all the other entries down to fill the newly vacated space.  Really the 
only reliable way to answer efficiency questions like this is to try
both ways and see which works better in your application.

Another possibility is to maintain your own list or environment of 
objects, and just protect/preserve the list as a whole.

Duncan Murdoch
#
On 1/2/10 5:56 PM, Duncan Murdoch wrote:
(...)
>
Thanks. I did not know about UNPROTECT_PTR.
I had concerns over the stack usage, but so far it did not prove too 
much of a problem. Still, why isn't the tail recursion explicitly made 
an iteration then ? This would take the "may be the compiler figures it 
out, may be not" variable out of the equation.
Interesting idea, this would let one perform his/her own bookkeeping on 
the list/environment. How is R garbage collection checking contained 
objects ? (I am thinking performances here, and may be cyclic references).



L.
#
On 01/02/2010 05:56 PM, Duncan Murdoch wrote:
Thanks; I've used those when I played with the parser's code (for 
highlight). It felt slightly harder to use than preserve/release since 
you have to maintain the protect index, etc ...

I think I'll go as you say below, just maintain my own precious list, 
the way Preserve/Release does it.

Romain

  
    
#
On 1/2/10 5:50 PM, Romain Francois wrote:
(...)
If you do not allow several C++ instances to share the same SEXP, you 
should not need reference counting. In the case of rpy2, this was made 
necessary for allowing inheritance (as nested calls to constructors can 
cause a transient sharing of the given SEXP across several Python objects).


(...)



L.
#
On Jan 2, 2010, at 12:17 PM, Laurent Gautier wrote:

            
Careful - the protection stack (bookkeeping by R) has nothing to do with the C function call stack hence it has nothing to do with the compiler. The protection stack is global so usually you don't run out of it unless something goes horribly wrong (=infinite loop).
You don't really want to care because the GC is global for all objects (and cycles are supported by the GC in R) - so whether you keep it yourself or Preserve/Release is practically irrelevant (the protection stack is handled separately).

As for keeping your own list -- if you really care about performance that much (to be honest in vast majority of cases it doesn't matter) you have to be careful how you implement it. Technically the fastest way is preallocated generic vector but it really depends on how you deal with the access so you can easily perform worse than Preserve/Release if you're not careful.

As a side note - the best way (IMHO) to deal with all those issues is to use external pointers because a) you get very efficient C finalizers b) you can directly (and very efficiently) tie lifespan of other objects to the same SEXP and c) as guardians they can nicely track other objects that hold them.

Cheers,
Simon
#
Simon Urbanek wrote:
I think Laurent was referring to RecursiveRelease, which could use a lot 
of C stack if you choose to release something that is deep in the list, 
since it checks the head, and if that doesn't match, calls itself again 
on the rest of the list.  (I checked, and at least one version of gcc 
doesn't recognize the tail recursion:  it really does generate a 
recursive call.)

Laurent asked why it isn't optimized to avoid the recursion:  I think 
the answer is simply because it is so rarely used that nobody has bothered.

Duncan Murdoch
#
On Sat, 2 Jan 2010, Duncan Murdoch wrote:

            
I believe current gcc -O2 does optimize tail recursive calls (more
generally sibling calls) but RecursiveRelease isn't written as a tail
recursion -- the return vaule is used in an assignment.  In any case
it would probably make more sense to rewrite RecursiveRelease to not
use recursion at all, but I'm not motivated to do that at this point.

luke

  
    
#
On 1/2/10 8:53 PM, Duncan Murdoch wrote:
Yes, I was referring to RecursiveRelease. Sorry if this was not clear.

What are the chances for a patch to be accepted ? At first sight(*), 
making that tail recursion an iterative function is not a major 
undertaking, and reviewing the patch be fairly straightforward... but I 
can always use that time otherwise if the answer to the question is "nil".


L.
#
On 1/2/10 8:28 PM, Simon Urbanek wrote:
(...)
I guess that I'll have to know in order to understand that I don't 
really want to care. ;-)
The garbage collector must somehow know if an object is available for 
collection (and will have to check whether an object is PROTECTed or 
not, or Preserved or not).
I suppose that upon being called the garbage collector will first look 
into the PROTECTed and Preserved objects, mark them as unavailble for 
collection, then recursively mark objects contained in them.
Releasing being of linear complexity, having few thousands of Preserved 
objects not being anticipated as an extraordinary situation, and 
Preserve/Release cycles being quite frequent, I start minding a bit 
about the performance. Keeping my own list would let me experiment with 
various strategies (and eventually offer
Thanks. I am not certain to follow everything. Are you suggesting that 
rather than Preserve-ing/Release-ing a list/environment that would act 
as a guardian for several objects, one should use an external pointer 
(to an arbitrary C pointer) ? In that case, how does one indicate that 
an external pointer acts as a container ?

Or are you suggesting that rather than Preserve-in/Release-ing R objects 
one should use an external pointer acting as a proxy for a SEXP 
(argument "prot" in R_MakeExternalPtr(void *p, SEXP tag, SEXP prot) ) ?
(but in that case the external pointer will itself have to go through 
Preserve/Release cycles...)


Cheers,


Laurent
-- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30
#
On 02/01/2010 3:16 PM, Laurent Gautier wrote:
I don't think I would want to review such a patch (I don't know the 
memory manager well, I don't know that there is really a case where it 
matters enough to be worth doing), so I'd say if you don't get a message 
from a core member volunteering to do so, you should assume it won't be 
accepted.  But that doesn't mean you shouldn't write the code for your 
own internal use and edification, and if you can put together a demo 
that shows it really makes a big difference in a realistic situation, 
you might get a different response.

Duncan Murdoch
#
On 01/02/2010 11:12 PM, Duncan Murdoch wrote:
From what I understand, this has little to do with the memory manager, 
and resumes to the simple problem "how to remove an object from a list".

Something like this, using the amazing inline and inspect packages:

require( inline )
require( inspect )

remover <- cfunction(signature( list = "language", object = 
"environment" ), '
	if( !isNull( list ) ){
		SEXP x = list ;
		SEXP y ;
		while( CAR(x) != object && CADR(x) != R_NilValue ){
			y = x ;
			x = CDR(x) ;
		}
		if( CAR(x) == object ) SETCDR(y, CDR(x) ) ;
	}
	return list ;
', Rcpp=FALSE, verbose=FALSE )

e <- new.env()
call <- call( "foo", e, e, 1:10, 3 )
call
# inspect( call )
result <- remover( call ,e )
result
# inspect( result )

gives this :

foo(10, <environment>, 0, <environment>, 1) 

@0x9f4e0d0 06 LANGSXP [NAM(2)] 

   @0x9f4e204 01 SYMSXP [] "foo" 

   @0xa907b78 14 REALSXP [NAM(2)] (len=1, tl=1763713056) 

   @0x9f4d564 04 ENVSXP [NAM(1)] 

   FRAME: 

     @0x9e94a10 00 NILSXP [MARK,NAM(2)] 

   ENCLOS:
     @0x9eb7b4c 04 ENVSXP [MARK,NAM(2),GP(0x8000)]
   HASHTAB:
     @0x9e94a10 00 NILSXP [MARK,NAM(2)]
   @0xa907b58 14 REALSXP [NAM(2)] (len=1, tl=1936941344)
   @0x9f4d564 04 ENVSXP [NAM(1)]
   FRAME:
     @0x9e94a10 00 NILSXP [MARK,NAM(2)]
   ENCLOS:
     @0x9eb7b4c 04 ENVSXP [MARK,NAM(2),GP(0x8000)]
   HASHTAB:
     @0x9e94a10 00 NILSXP [MARK,NAM(2)]
   @0xa907b38 14 REALSXP [NAM(2)] (len=1, tl=543908709)
NULL
==================
foo(10, 0, <environment>, 1)
@0x9f4e0d0 06 LANGSXP [NAM(2)]
   @0x9f4e204 01 SYMSXP [] "foo"
   @0xa907b78 14 REALSXP [NAM(2)] (len=1, tl=1763713056)
   @0xa907b58 14 REALSXP [NAM(2)] (len=1, tl=1936941344)
   @0x9f4d564 04 ENVSXP [NAM(2)]
   FRAME:
     @0x9e94a10 00 NILSXP [MARK,NAM(2)]
   ENCLOS:
     @0x9eb7b4c 04 ENVSXP [MARK,NAM(2),GP(0x8000)]
   HASHTAB:
     @0x9e94a10 00 NILSXP [MARK,NAM(2)]
   @0xa907b38 14 REALSXP [NAM(2)] (len=1, tl=543908709)
NULL



so it boils down to this as a replacement to RecursiveRelease :

/**
  * Removes the first instance of object with the list
  */
static SEXP RemoveFromList( SEXP object, SEXP list){
	if( !isNull( list ) ){
		SEXP x = list ;
		SEXP y ;
		while( CAR(x) != object && CADR(x) != R_NilValue ){
			y = x ;
			x = CDR(x) ;
		}
		if( CAR(x) == object ) SETCDR(y, CDR(x) ) ;
	}
	return list ;
}


Romain
#
On 01/02/2010 11:41 PM, Romain Francois wrote:
Going a bit further, attached is an R script that contains c code for 
both methods, generation of some "big" list and calling each method to 
remove the object in the worst case scenario (object to remove is at the 
end of the list) :

[romain at santorini tmp]$ Rscript release.R 10000
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include    -fpic 
  -g -O2 -c release.c -o release.o
gcc -std=gnu99 -shared -L/usr/local/lib -o release.so release.o 
-L/usr/local/lib/R/lib -lR
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include    -fpic 
  -g -O2 -c remove.c -o remove.o
gcc -std=gnu99 -shared -L/usr/local/lib -o remove.so remove.o 
-L/usr/local/lib/R/lib -lR
    user  system elapsed
       0       0       0
    user  system elapsed
   0.000   0.001   0.001

Then it starts to be "noticeable" :

[romain at santorini tmp]$ Rscript release.R 100000
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include    -fpic 
  -g -O2 -c release.c -o release.o
gcc -std=gnu99 -shared -L/usr/local/lib -o release.so release.o 
-L/usr/local/lib/R/lib -lR
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include    -fpic 
  -g -O2 -c remove.c -o remove.o
gcc -std=gnu99 -shared -L/usr/local/lib -o remove.so remove.o 
-L/usr/local/lib/R/lib -lR
    user  system elapsed
   0.001   0.000   0.002
    user  system elapsed
   0.007   0.004   0.013

then :

[romain at santorini tmp]$ Rscript release.R 500000
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include    -fpic 
  -g -O2 -c release.c -o release.o
gcc -std=gnu99 -shared -L/usr/local/lib -o release.so release.o 
-L/usr/local/lib/R/lib -lR
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include    -fpic 
  -g -O2 -c remove.c -o remove.o
gcc -std=gnu99 -shared -L/usr/local/lib -o remove.so remove.o 
-L/usr/local/lib/R/lib -lR
    user  system elapsed
   0.008   0.000   0.008
Error: segfault from C stack overflow
Timing stopped at: 0.006 0.013 0.021
Execution halted

Romain
#
On 1/2/10 11:41 PM, Romain Francois wrote:
(...)
I would generalize even further, up to: "how to make a tail recursion 
into an interation" (and that often boils down to writing a for loop - 
little edification to get from it, I suppose... one probably wants to 
write it to see it included).

I see below that you are living up to the enthusiasm claimed, and had a 
shot at it (and even did benchmark to demonstrate the expected benefit - 
impressive).

Let's see what happens with it...


L.
#
On 02/01/2010 6:35 PM, Laurent Gautier wrote:
Just one followup:  as Luke mentioned, I was wrong, it isn't really a 
tail recursion (which is why gcc didn't optimize it away).

Duncan Murdoch
#
On Jan 2, 2010, at 4:08 PM, Laurent Gautier wrote:

            
The GC marks recursively from all known roots of which Preserved list is one of many and all elements of the protection stack are treated as such as well (FWIW the Preserved and protected list are in that order the last two). Since this involves (by definition) all live objects it doesn't matter to which other object you assign the node. The only detail is that protection stack does not change the generation (since there is no real node to assign to).
Sure - what I meant is that you have to optimize for one thing or the other so you have to be careful what you do.
I was guessing that you use this in conjunction with some C++ magic not just plain R objects and thus you have to deal with two life spans. From the other messages I think you are dealing with the simple situation of wrapping an R object as reference in the other system with explicit memory management (i.e. in C++ you have explicit new/delete life cycle) in which case you really don't need anything more than Preserve. It is more interesting when you want to track the life of R objects without imposing the life span - i.e when you want to know when an object in R is collected such that you can delete it from the other system (i.e. you don't explicitly retain it by the reference).

Cheers,
Simon
#
On Jan 2, 2010, at 5:41 PM, Romain Francois wrote:

            
FWIW inspect is now part of R itself but as an internal function so you can either use it directly via .Internal or for compatibility
inspect <- function(...) .Internal(inspect(...))
the arguments are (x, max.depth=-1, max.elements=5) [the last one is only supported in R-devel].

Cheers,
Simon
#
On Jan 2, 2010, at 5:41 PM, Romain Francois wrote:

            
That blows up if the object is the first in the list BTW. You probably meant more something like

static SEXP ReleaseObj(SEXP object, SEXP list) {
	if (!isNull(list)) {
		if (CAR(list) != object) {
			SEXP c = list;
			while (!isNull(CDR(c))) {
				if (CADR(c) == object) {
					CDR(c) = CDDR(c);
					break;
				}
				c = CDR(c);
			}
		} else return CDR(list);
	}
	return list;
}
		
(For non-core replace CDR(c) with SETCDR(c) although that goes through the write barrier).

Cheers,
Simon
#
On 01/03/2010 02:04 AM, Simon Urbanek wrote:
Thanks. And sorry for not testing this case. (oops)

Romain
#
On 01/03/2010 01:34 AM, Simon Urbanek wrote:
Many thanks for this clarification. Rcpp is using a jri wanabee 
approach. Essentially we have :

class RObject{
public:
RObject(SEXP x){ preserve x }
~RObject(){ release x}
private:
SEXP x ;
}

For the story, I'm also doing the other way (rJava wanabee) in the CPP 
package 
(http://romainfrancois.blog.free.fr/index.php?post/2009/12/22/CPP-package-%3A-exposing-C-objects) 
that wraps up arbitrary C++ objects (currently stl containers) as 
external pointers. You might recognize some patterns here.

 > # create the object
 > x <- new( CPP("set<int>") )
 > # insert data using the insert method
 > x$insert( sample( 1:20, size = 50, replace = TRUE ) )
 > # ask for the size of the set
 > x$size()
[1] 17
 > # bring it back as an R classic integer vector
 > # see how it comes back sorted and unique (which is set's job)
 > as.vector(x)
  [1]  1  3  4  6  7  8  9 10 11 12 13 14 15 17 18 19 20


no magic however since C++ reflection capabilities are very limited, 
well some magic based on the brew package.

Romain