protect/unprotect howto in C code - R-devel

Michael Dondrup

Wed, May 17, 2006 3:09 AM

Hi,

I'm currently trying to debug an 'error in unprotect: stack imbalance' problem,
and I have two basic questions about the use of PROTECT and UNPROTECT that I
could not figure out:

- Which objects have to be protected? For example, if the code is something like:

SEXP fun, e;
/* get the expression e ... */
fun = eval(e, R_GlobalEnv);
/* or like this?: PROTECT(fun = eval(e, R_GlobalEnv)); */
PROTECT(fun = VECTOR_ELT(fun, 1));
/* do more things with fun ... */

Does one need to protect the result of a call to 'eval' immediately? And
what about R_tryEval?
While searching the sources for code examples, I found both protected evals
and, less often, unprotected ones.

- Can someone give a hint (or point me to some documentation) on how to
simplify debugging such problems, in addition to using gdb? I thought about
temporarily defining macros such as

#define DEBUG_Protect(x) do { PROTECT(x); \
    fprintf(stderr, "Protecting in %s, l: %d\n", __FILE__, __LINE__); } while (0)
#define UNDEBUG_Protect(n) do { \
    fprintf(stderr, "Unprotecting %d in %s, l: %d\n", (n), __FILE__, __LINE__); \
    UNPROTECT(n); } while (0)

and then temporarily replacing all the calls in the package source. But there
must be a better way...

Thank you very much
(and my apologies if this sounds odd to more experienced C programmers ;) )
Michael

Thomas Lumley

Wed, May 17, 2006 7:55 AM

On Wed, 17 May 2006, Michael Dondrup wrote:

The first rule is that any newly created R object needs to be protected 
before the garbage collector runs, and unprotected before exiting the 
function and after the last time the garbage collector runs.

The second rule is that protection applies to the contents of a variable 
(the R object) not to the variable.

The third rule is that protecting an object protects all its elements.

In the example above
     fun = eval(e, R_GlobalEnv);
may create a new object (it might just return a pointer to an existing 
function) and so probably needs to be protected.

On the other hand
  fun = VECTOR_ELT(fun, 1);
does not need protecting. The object originally stored in fun is protected,
so its second element is also protected, even after the variable fun has
been overwritten (rule 2).

So
    PROTECT(fun = eval(e, R_GlobalEnv));
    fun = VECTOR_ELT(fun, 1);
    /* do more stuff with fun */
    UNPROTECT(1);
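The three rules can be made concrete with a toy model. The C sketch below is not R's implementation: toy_alloc, toy_protect, toy_unprotect and toy_gc are hypothetical stand-ins for an allocating call such as eval, for PROTECT and UNPROTECT, and for R's collector, assuming a simple mark-and-sweep design in which any allocation may trigger a collection. Replaying the example above in it shows that the element reached through VECTOR_ELT survives a collection without its own PROTECT, while an object that nothing protects does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_CAP 16    /* toy heap size */
#define PSTACK_CAP 16  /* toy protect-stack size */
#define MAX_ELTS 4     /* max elements per toy vector */

/* A toy "vector" that can hold pointers to other objects (hypothetical). */
typedef struct Obj {
    bool in_use;               /* slot currently allocated */
    bool marked;               /* reached during the current collection */
    struct Obj *elt[MAX_ELTS]; /* elements, like VECTOR_ELT(x, i) */
    int nelt;
} Obj;

static Obj heap[HEAP_CAP];
static Obj *pstack[PSTACK_CAP];
static int ptop = 0;

/* Stand-ins for PROTECT/UNPROTECT: push one object, pop a count. */
static Obj *toy_protect(Obj *o) { assert(ptop < PSTACK_CAP); pstack[ptop++] = o; return o; }
static void toy_unprotect(int n) { assert(n >= 0 && n <= ptop); ptop -= n; }

/* Rule 3: marking an object also marks everything it contains. */
static void mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = true;
    for (int i = 0; i < o->nelt; i++) mark(o->elt[i]);
}

/* Free every object not reachable from the protect stack. Rule 2: the
   stack records objects, not C variables, so reassigning a variable
   afterwards does not unprotect anything. */
static void toy_gc(void) {
    for (int i = 0; i < ptop; i++) mark(pstack[i]);
    for (int i = 0; i < HEAP_CAP; i++) {
        if (!heap[i].marked) heap[i].in_use = false;
        heap[i].marked = false;
    }
}

/* Rule 1: any allocation may run the collector before returning memory. */
static Obj *toy_alloc(int nelt) {
    assert(nelt >= 0 && nelt <= MAX_ELTS);
    toy_gc();
    for (int i = 0; i < HEAP_CAP; i++) {
        if (!heap[i].in_use) {
            heap[i] = (Obj){ .in_use = true, .nelt = nelt };
            return &heap[i];
        }
    }
    assert(!"toy heap exhausted");
    return NULL;
}
```

In this model, protecting the result of toy_alloc(2), storing a child in elt[1], pointing the variable at that child and then calling toy_gc() leaves both the vector and the child allocated, while a second, never-protected allocation is freed. Real R objects carry far more state; only the reachability logic is modelled.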

If you don't know exactly which functions might return a new object or
trigger the garbage collector, it is probably safe to assume that any of them
might [this is the advice in 'Writing R Extensions']. Unless you are
getting close to the limits of the pointer protection stack (e.g. in
recursive algorithms), you might be safer writing code like
    PROTECT(fun = eval(e, R_GlobalEnv));
    PROTECT(fun = VECTOR_ELT(fun, 1));
    /* do more stuff with fun */
    UNPROTECT(2);
but I think it is useful to know that the vector accessors and mutators do 
not allocate memory.


A stack imbalance is often due to different numbers of PROTECTs on 
different code paths. These are slightly annoying and become more frequent 
if you use more PROTECTs. On the other hand, R does detect them for you. 
If you don't use enough PROTECTs you get bugs that are very hard to track 
down [the best bet is probably valgrind + gctorture() to provoke them into 
showing themselves early, but that's only available on Linux].
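A toy protect stack can illustrate how different code paths produce an imbalance and how such a mismatch is noticed at the call boundary. In this sketch, toy_protect, toy_unprotect, toy_dotcall, buggy and fixed are hypothetical; toy_dotcall only mimics the idea of comparing the stack depth before and after the C function returns, and its message is not R's exact wording. The fix shown, counting protections in a local variable and unprotecting once on a single exit path, is a common pattern in C code for R rather than something prescribed in this thread.

```c
#include <assert.h>
#include <stdio.h>

#define PSTACK_CAP 64
static const void *pstack[PSTACK_CAP]; /* toy protect stack */
static int ptop = 0;

static void toy_protect(const void *p) { assert(ptop < PSTACK_CAP); pstack[ptop++] = p; }
static void toy_unprotect(int n) { assert(n >= 0 && n <= ptop); ptop -= n; }

static int a_obj, b_obj; /* stand-ins for two newly allocated objects */

/* Buggy: the early-return path skips its unprotect. */
static int buggy(int x) {
    toy_protect(&a_obj);
    if (x < 0)
        return -1; /* one entry is left on the stack */
    toy_protect(&b_obj);
    toy_unprotect(2);
    return x;
}

/* Fixed: count protections as they happen and unprotect once at the end. */
static int fixed(int x) {
    int nprot = 0, result;
    toy_protect(&a_obj); nprot++;
    if (x < 0) {
        result = -1;
    } else {
        toy_protect(&b_obj); nprot++;
        result = x;
    }
    toy_unprotect(nprot);
    return result;
}

/* Mimics the boundary check: compare the depth before and after the call
   and return the difference (0 means balanced). */
static int toy_dotcall(int (*fn)(int), int arg) {
    int before = ptop;
    (void)fn(arg);
    int diff = ptop - before;
    if (diff != 0)
        fprintf(stderr, "toy stack imbalance: %d then %d\n", before, ptop);
    return diff;
}
```

With these definitions, toy_dotcall(buggy, -1) reports an imbalance of 1 because the early return skips its unprotect, while fixed stays balanced on both paths.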

 	-thomas

Michael Dondrup

Wed, May 17, 2006 9:12 AM

Thank you very much, Thomas!

Thanks to your explanation, I think I have almost tracked down that bug. Just
for clarification, may I ask a few more questions (sorry)? From what you say,
did I get this right:

- 'error in unprotect: stack imbalance' is only a warning; it will not cause
termination unless R is running as an embedded process (I'm working with the
RSPerl package in Perl here)?
- Forgetting to unprotect a value is harmless and will only provoke these
warnings?
- If protect/unprotect is unbalanced within a function call, R will give
the warning/error already at the exit of that specific function?
- If that is the case, what if I want to return a pointer to a value from a
function? Do I have to unprotect it before returning anyway?

By the way, I'm working on FreeBSD; I found an experimental port of valgrind, too.

Thank you very much again!

Michael


Thomas Lumley

Wed, May 17, 2006 9:47 AM

On Wed, 17 May 2006, Michael Dondrup wrote:

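> - 'error in unprotect: stack imbalance' is only a warning; it will not
> cause termination unless R is running as an embedded process?

Correct.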

> - Forgetting to unprotect a value is harmless and will only provoke
> these warnings?

Well, it causes a memory leak, and in the unlikely event that you have
finalizers set on the objects, the finalizers won't run. Otherwise, yes.

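> - If protect/unprotect is unbalanced within a function call, R will give
> the warning already at the exit of that specific function?

Not quite. The warning comes on return from .Call(). If the function you
.Call calls other C functions, you will still only get the warning on
return to R.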
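Because the check only happens at the .Call boundary, one way to localize an imbalance while debugging is to compare the depth yourself at the entry and exit of each helper. The sketch below does this on a toy stack; BALANCE_ENTER, BALANCE_EXIT, inner, middle and entry_point are hypothetical names, and in real code the depth would have to come from your own counting wrappers around PROTECT/UNPROTECT, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>

#define PSTACK_CAP 64
static const void *pstack[PSTACK_CAP]; /* toy protect stack */
static int ptop = 0;
static int imbalances = 0;             /* mismatches reported by the guards */

static void toy_protect(const void *p) { assert(ptop < PSTACK_CAP); pstack[ptop++] = p; }
static void toy_unprotect(int n) { assert(n >= 0 && n <= ptop); ptop -= n; }

/* Hypothetical debugging guards: record the depth on entry and report,
   with file, line and function name, if it differs at exit. */
#define BALANCE_ENTER() int balance_depth_ = ptop
#define BALANCE_EXIT()                                                    \
    do {                                                                  \
        if (ptop != balance_depth_) {                                     \
            fprintf(stderr, "%s:%d: %s: depth %d on entry, %d on exit\n", \
                    __FILE__, __LINE__, __func__, balance_depth_, ptop);  \
            imbalances++;                                                 \
        }                                                                 \
    } while (0)

static int scratch; /* stand-in for a newly allocated object */

static void inner(void) { /* the real culprit */
    BALANCE_ENTER();
    toy_protect(&scratch);
    /* ... work ..., but the matching toy_unprotect(1) is missing */
    BALANCE_EXIT();
}

static void middle(void) {
    BALANCE_ENTER();
    inner();
    BALANCE_EXIT();
}

/* Plays the role of the function passed to .Call: only the net change
   here would be visible on return to R. */
static int entry_point(void) {
    int before = ptop;
    middle();
    return ptop - before;
}
```

Here entry_point() returns 1, the same net imbalance the boundary check would see, but the guards report twice, and the first report names inner, where the missing unprotect actually is.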

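> - What if I want to return a pointer to a value from a function? Do I
> have to unprotect it before returning anyway?

Yes, by rule 1 your function unprotects it before returning. Rule 1 applies
to the code that calls your function, too. If you return (a pointer to) an
object that is newly created from the point of view of the calling function,
the calling function has to PROTECT it. In particular, the return value of
.Call will be protected if you store it in a variable.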
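The point about returned objects can be sketched with a self-contained toy mark-and-sweep model. It is not R's implementation: toy_alloc, toy_protect, toy_unprotect, toy_gc and make_pair are hypothetical, assuming that any allocation may trigger a collection. make_pair protects its new vector while filling it (each fill allocates), unprotects it before returning so its own PROTECT/UNPROTECT count balances, and leaves it to the caller to protect the result before anything else can allocate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_CAP 16
#define PSTACK_CAP 16
#define MAX_ELTS 4

/* A toy "vector" that can hold pointers to other objects (hypothetical). */
typedef struct Obj {
    bool in_use, marked;
    struct Obj *elt[MAX_ELTS];
    int nelt;
} Obj;

static Obj heap[HEAP_CAP];
static Obj *pstack[PSTACK_CAP];
static int ptop = 0;

static Obj *toy_protect(Obj *o) { assert(ptop < PSTACK_CAP); pstack[ptop++] = o; return o; }
static void toy_unprotect(int n) { assert(n >= 0 && n <= ptop); ptop -= n; }

/* Mark an object and, recursively, its elements. */
static void mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = true;
    for (int i = 0; i < o->nelt; i++) mark(o->elt[i]);
}

/* Free everything not reachable from the protect stack. */
static void toy_gc(void) {
    for (int i = 0; i < ptop; i++) mark(pstack[i]);
    for (int i = 0; i < HEAP_CAP; i++) {
        if (!heap[i].marked) heap[i].in_use = false;
        heap[i].marked = false;
    }
}

/* Any allocation may collect first. */
static Obj *toy_alloc(int nelt) {
    assert(nelt >= 0 && nelt <= MAX_ELTS);
    toy_gc();
    for (int i = 0; i < HEAP_CAP; i++)
        if (!heap[i].in_use) {
            heap[i] = (Obj){ .in_use = true, .nelt = nelt };
            return &heap[i];
        }
    assert(!"toy heap exhausted");
    return NULL;
}

/* Callee: protect the new vector while the fills allocate, then unprotect
   before returning so this function's own count balances. */
static Obj *make_pair(void) {
    Obj *v = toy_protect(toy_alloc(2));
    v->elt[0] = toy_alloc(0);
    v->elt[1] = toy_alloc(0);
    toy_unprotect(1);
    return v; /* newly created, and now unprotected */
}
```

In this model, a result that the caller passes to toy_protect survives a following toy_gc() together with its elements, whereas an unprotected result is freed by the next collection, leaving the caller with a dangling pointer.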

 	-thomas