PROTECT and OCaml GC.

12 messages · Guillaume Yziquel, Whit Armstrong, Simon Urbanek

Guillaume Yziquel

Sat, Nov 28, 2009 3:50 PM

Hello.

In the writing of my OCaml-R binding, I'm sort of confused when it comes 
to the use of the PROTECT and UNPROTECT macros.

Basically, I have C stub functions that are in charge of calling R for 
everything. Here's a simple example:

This simply makes a call to findVar and returns the value to Objective 
Caml. It seems to me that I should be writing:

However, as OCaml has its own GC, I'm wondering where to put UNPROTECT. 
Much of the code I see on the net UNPROTECTs the value just after 
protecting it. The rationale, it seems, is that the value is at risk 
only for a short time after it has been created.

This seems rather curious to me, and I'm wondering whether I should 
instead UNPROTECT the value at the moment OCaml's GC says it is no 
longer needed.

Please tell me which option I should go forward with.

(I'll assume for now that OCaml is single-threaded. I do not believe that 
R itself is thread-safe, so I'll handle this single-threaded case first.)

All the best,

Guillaume Yziquel
http://yziquel.homelinux.org/

Whit Armstrong

Sat, Nov 28, 2009 4:25 PM

(The body of this message was scrubbed by the list archive; it is available at https://stat.ethz.ch/pipermail/r-devel/attachments/20091128/db6c0216/attachment.pl)

Guillaume Yziquel

Sat, Nov 28, 2009 4:50 PM

Whit Armstrong wrote:

Thanks a lot for these pointers.

UNPROTECT_PTR seems quite interesting. As I understand it, it avoids 
caring about protecting and unprotecting in the order the stacks would 
expect. This is quite interesting, since I'd like to keep OCaml's GC to 
do housekeeping, and not rely on referencing counting.
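The ordering difference can be illustrated with a toy model in plain C. It is not R's source: the toy_* names are hypothetical, and the model assumes only what is described here. PROTECT pushes onto a stack, UNPROTECT(n) pops the n most recent entries, and UNPROTECT_PTR removes one specific pointer wherever it sits.

```c
#include <assert.h>

/* Toy model of a protect stack (not R's source): an array of pointers
   plus a top index. All names here are hypothetical. */
#define TOY_STACK_SIZE 64

static void *toy_stack[TOY_STACK_SIZE];
static int toy_top = 0;

/* Like PROTECT(p): push p and return it. */
void *toy_protect(void *p) {
    assert(toy_top < TOY_STACK_SIZE); /* stack overflow */
    toy_stack[toy_top++] = p;
    return p;
}

/* Like UNPROTECT(n): pop the n most recent entries, strictly LIFO. */
void toy_unprotect(int n) {
    assert(n >= 0 && n <= toy_top);
    toy_top -= n;
}

/* Like UNPROTECT_PTR(p): search for p from the top, drop it and shift
   the entries above it down one slot, so release order is free. */
void toy_unprotect_ptr(void *p) {
    int i = toy_top - 1;
    while (i >= 0 && toy_stack[i] != p)
        i--;
    assert(i >= 0); /* p must currently be protected */
    for (; i + 1 < toy_top; i++)
        toy_stack[i] = toy_stack[i + 1];
    toy_top--;
}

/* Would a collector scanning the stack still see p? */
int toy_is_protected(const void *p) {
    for (int i = 0; i < toy_top; i++)
        if (toy_stack[i] == p)
            return 1;
    return 0;
}
```

Removing an entry from the middle works, but every entry still lives on one global stack. For values whose lifetime is decided later by OCaml's GC, that is presumably why a separate list of roots (the R_PreserveObject/R_ReleaseObject route suggested in the next reply) fits better.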

I'm using C as the glue, but I want it to be as thin as possible. I will 
probably not do reference counting in C, for instance. Nevertheless, 
there are obviously good ideas in rabstraction/RObjects that I'll adapt.

Thanks a lot.

Guillaume Yziquel

Simon Urbanek

Sun, Nov 29, 2009 9:45 AM

On Nov 28, 2009, at 7:50 PM, Guillaume Yziquel wrote:

FWIW what I think you should really be looking at is R_PreserveObject/R_ReleaseObject. I would suggest looking at the many other R embeddings in other languages that already exist, since I don't think your approach is very viable (as I think I have said before).
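The following sketch shows why R_PreserveObject/R_ReleaseObject suits a binding whose lifetimes are driven by OCaml's GC. Preserved objects sit on a list of extra roots that is separate from the protect stack, and releasing one does not depend on any order. The code is a toy model in plain C, not R's implementation, and the toy_* names are hypothetical. toy_on_ocaml_finalize only stands in for an OCaml finalizer (for example, the finalize function of a custom block wrapping the SEXP); that wiring is assumed and not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of a "precious list" (not R's source): a linked list of
   extra roots, kept apart from the protect stack. Names are hypothetical. */
typedef struct toy_cell {
    void *obj;
    struct toy_cell *next;
} toy_cell;

static toy_cell *toy_precious = NULL;

/* Like R_PreserveObject(obj): prepend obj to the root list. */
void toy_preserve(void *obj) {
    toy_cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->obj = obj;
    c->next = toy_precious;
    toy_precious = c;
}

/* Like R_ReleaseObject(obj): unlink one occurrence of obj, wherever
   it sits, so objects can be released in any order. */
void toy_release(void *obj) {
    toy_cell **link = &toy_precious;
    while (*link != NULL && (*link)->obj != obj)
        link = &(*link)->next;
    if (*link != NULL) {
        toy_cell *dead = *link;
        *link = dead->next;
        free(dead);
    }
}

/* Would a collector scanning the root list still see obj? */
int toy_is_preserved(const void *obj) {
    for (const toy_cell *c = toy_precious; c != NULL; c = c->next)
        if (c->obj == obj)
            return 1;
    return 0;
}

/* Hypothetical hook: what an OCaml finalizer for a wrapped value
   would call once the OCaml GC finds the wrapper unreachable. */
void toy_on_ocaml_finalize(void *obj) {
    toy_release(obj);
}
```

In the toy, preserving the same object twice requires two releases, since each preserve adds its own entry.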

Cheers,
Simon

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Guillaume Yziquel

Sun, Nov 29, 2009 10:57 AM

Simon Urbanek wrote:

OK. Thanks.

Lisp - the only thing I've seen is a translator:

http://dan.corlan.net/R_to_common_lisp_translator/

I haven't found a binding for Haskell, nor for Scheme.

Do you know of any bindings of R to functional languages?

You expressed the sentiment that it would be a very bad idea to bypass 
the current API. I would be happy to hear why you think that a 
low-level binding is not possible, or not very viable.

By low-level, I mean a binding that takes hold of R objects without 
using symbols all over to reference them. (Using symbols in the formals, 
the body or the environment of a closure is fine, for instance, but I'd 
like to execute a closure directly, and eventually be able to construct 
R closures from OCaml functions).

Please elaborate on the difficulties you perceive. That would be helpful.

All the best,

Guillaume.

Guillaume Yziquel

Simon Urbanek

Mon, Nov 30, 2009 6:15 AM

Guillaume,

On Nov 29, 2009, at 13:57, Guillaume Yziquel wrote:

You're talking about two entirely different things -- bypassing the  
API is a very bad idea, but it has nothing to do with your last  
paragraph. The API gives you access to all user-visible aspects of R
which is all you really need for any embedding -- that includes  
closure body, evaluation etc. I see no reason why you should ever go  
lower than the API since that is unreliable and unsupported and thus  
you won't get any help with that (that is IMHO the main reason why you  
get no responses here - and I wouldn't expect any). All other  
functions are hidden on purpose since they cover internal aspects that  
should not be relied upon.

So again, I just think you're operating on the wrong level here -- and  
this has nothing to do with the fact that you're binding to a  
functional language since the mechanisms are the same regardless of  
the languages (that's why Omegahat was used to bind into any random  
language that seemed useful). You get more headaches since you have to  
decide how to handle closures both ways, but I suspect the practical  
solution is to use evaluators on the side where the function is  
defined (especially for the R side since it includes non-S-language  
code so you simply cannot map it).

If you have suggestions for extending the API, feel free to post them
with exact explanations of how, in general, such extensions could be
useful (general is the key word here - I think so far they were rather
hacks around your way of implementing it). [And FWIW tryEval *is* part
of the API].

Cheers,
Simon

Guillaume Yziquel

Mon, Nov 30, 2009 9:08 AM

Simon Urbanek wrote:

It's very good to hear that it's two different things. This has been 
quite unclear to me.

Because I've been unable to find out exactly what applyClosure or eval 
require of the structure of their LANGSXP argument, for example.

Please let me be clear on my intentions:

-1- I intend to use only the API if possible.

-2- If not possible, I will perhaps use #define USE_RINTERNALS, which, as 
I understand it, is not part of the API.

-3- The libR.so with exposed symbols is intended only as a replacement 
for GDB during development. Unfortunately, as things are not going as 
smoothly as they could, I am, for gdb-like purposes, progressively 
writing a new eval / applyClosure pair in OCaml.

The option -3- will not appear in the interface I will release.

In order to discriminate between option -1- and options -1- + -2-, could 
you please answer the following question, which I hope falls in the 
scope of legitimate questions on this mailing list:

Suppose I have an OCaml (or pure C if you wish) linked list of OCaml 
values wrapping SEXP values. Is it possible, using only the API, to 
create a LANGSXP / LISTSXP list out of these SEXPs?

I guess this is the crucial point where I hit the limits of the API. 
Please confirm or refute.

Will look into Omegahat. Not yet very familiar with R userland.

Ok. So suppose I have wrapped an anonymous R closure, called op.

This closure takes one argument, a string, and yields an integer.

I therefore need to write a function "eval_this_op" whose type would be:

eval_this_op : (string -> int) R.t -> string R.t -> int R.t

Essentially, eval_this_op takes two arguments, a wrapped anonymous R 
closure and an R string, and yields an R integer.

How could you write such an eval_this_op function without first solving 
the crucial issue in the above paragraph, which is basically 
constructing a LANGSXP out of an anonymous closure and an R string?

Please take into account that OCaml's type system is extremely strong. 
"My way of implementing it", as you call it, is essentially the most 
natural way to fit in the OCaml paradigm. I must satisfy both OCaml and 
R paradigms in order to write a correct binding.

Please note that it is not an embedding in a random application. It aims 
to be a full-blown binding for general purposes. In OCaml, values are 
immutable. Really, really, really immutable. Or they are signals, 
immutable abstractions describing a value that changes over time. 
Symbols, variables and such are not welcome. References (~pointers) are 
statically typed and *cannot* be type-cast. The type checking is so 
strong that you should almost never have to throw an exception. This 
means avoiding dynamic type-checking wherever possible. For instance, a 
function that takes a sexp and yields the underlying function should 
never be handed a sexp that is not a function, so it should not have to 
raise an exception, nor dynamically typecheck the sexp at runtime. This 
means that you have to enhance the type system to *statically* declare 
(or infer) that a given sexp is, say, a LANGSXP. Therefore you have to 
use a polymorphic type system (somewhat like C++ templated types) to say 
"lang sxp", "list sxp", "sym sxp", etc. You get the idea?

This is not "my way". It's the OCaml way: they like to statically 
type-check *everything*, including HTML. Please have a look at section 
"Static typing of XHTML with XHTML.M" of

	http://ocsigen.org/eliom/manual/1.2.0/1#p1baseprinciples

Do you know why the Swig module for OCaml is virtually unused? Because 
the OCaml community does not consider it type-safe enough. Much the same 
will likely happen with Haskell.

The "general" aspect of my request therefore concerns bindings to 
languages with 'inferred polymorphic static typing'. Please understand 
what these languages are about before dismissing my remarks as "my way". 
You may not care; you wouldn't be the first.

From Wikipedia: http://en.wikipedia.org/wiki/Objective_Caml

Please understand that I take no joy and no fun in being a pain.

If you force me to write a binding that isn't type-safe, it will go 
unused. This is simply not acceptable to me: I am unfortunately not 
willing to waste my time, and I would then eventually have to bypass the 
API. Please help me avoid that as much as possible within these 
constraints.

Guillaume Yziquel

Simon Urbanek

Mon, Nov 30, 2009 9:46 AM

On Nov 30, 2009, at 12:08, Guillaume Yziquel wrote:

LANGSXP is simply a pairlist representing the expression, e.g. to look  
at "a+2" expression:

 > .Internal(inspect(quote(a+2)))
@1183698 06 LANGSXP g0c0 []
   @101080c 01 SYMSXP g0c0 [MARK,gp=0x4000] "+"
   @1130394 01 SYMSXP g0c0 [MARK] "a"
   @1c384e8 14 REALSXP g0c1 [] (len=1, tl=0) 2

I would suggest you learn more about R first - this is all accessible  
at the R/S language level:

 > x <- quote(a+2)
 > x
a + 2
 > x[[1]]
`+`
 > x[[2]]
a
 > x[[3]]
[1] 2

Of course - see CONS/LCONS.
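In terms of CONS/LCONS, a call built from an arbitrary sequence of values consists of two parts. The argument pairlist is consed from the right, and the function value itself, which need not be a symbol, is LCONSed onto the front. lang2 and friends produce the same shape. The toy below models only that shape in plain C; toy_* names and node kinds are hypothetical stand-ins, not R's SEXPRECs. Real R code would also have to keep the partially built list PROTECTed whenever other allocations happen between steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy node kinds (not R's SEXPRECs): list and call cells plus two
   stand-ins for arbitrary R values. Names are hypothetical. */
typedef enum { TOY_NIL, TOY_LIST, TOY_LANG, TOY_CLOS, TOY_STR } toy_type;

typedef struct toy_node {
    toy_type type;
    struct toy_node *car, *cdr; /* meaningful for TOY_LIST / TOY_LANG */
    const char *str;            /* meaningful for TOY_STR */
} toy_node;

static toy_node toy_nil = { TOY_NIL, NULL, NULL, NULL };

static toy_node *toy_alloc(toy_type type) {
    toy_node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->type = type;
    return n;
}

/* Like CONS(car, cdr): one pairlist cell. */
toy_node *toy_cons(toy_node *car, toy_node *cdr) {
    toy_node *n = toy_alloc(TOY_LIST);
    n->car = car;
    n->cdr = cdr;
    return n;
}

/* Like LCONS(car, cdr): the same cell, tagged as a call. */
toy_node *toy_lcons(toy_node *car, toy_node *cdr) {
    toy_node *n = toy_cons(car, cdr);
    n->type = TOY_LANG;
    return n;
}

/* Build (fn args[0] ... args[n-1]): cons the arguments from the right
   into a plain pairlist, then put the function value itself, which need
   not be a symbol, in the CAR of a call cell. */
toy_node *toy_make_call(toy_node *fn, toy_node **args, size_t n) {
    toy_node *tail = &toy_nil;
    while (n > 0) {
        n--;
        tail = toy_cons(args[n], tail);
    }
    return toy_lcons(fn, tail);
}
```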

You're missing my point - "your way" was to hack into the internals of
how R represents SEXPs (going down to each pointer inside the SEXP
headers). None of your points about OCaml's type system applies to my
remark.

Cheers,
Simon

Guillaume Yziquel

Mon, Nov 30, 2009 10:14 AM

Simon Urbanek wrote:

I had gathered, by reading the header files, that LANGSXP and LISTSXP 
are structured this way. Please see:

https://stat.ethz.ch/pipermail/r-devel/2009-November/055813.html

Now to continue on the topic of the top paragraph:

I've tried sending a LANGSXP whose CAR element is a SYMSXP: eval 
works. I've tried sending a LANGSXP whose CAR element is a CLOSXP: 
eval doesn't work. This is what I meant by the "structure of the 
argument LANGSXP", and it is covered in the link above.

And it goes then to my other question: How can you pass to eval a 
LANGSXP where the CAR is an *anonymous* function, no SYMSXP involved?

This does not seem to be documented in R-ints.pdf, R-exts.pdf or R-lang.pdf.

Great. That's the kind of fruitful interaction that could have saved me 
a few days and kept me from bypassing the API. Thanks.

You're also missing my point. "my way" is the only way I've come up with 
to examine how to make sure that the static typing system I'm putting in 
place fits with the internal structure of SEXPs. I do need to know the 
internal structure of sexps and the way they evolve under the influence 
of functions such as eval, install, applyClosure, in order to statically 
type my code. Same link expressing this concern:

https://stat.ethz.ch/pipermail/r-devel/2009-November/055813.html

Documentation is terse on the precise structure of sexps. You get 
descriptions of individual sexps, not a *precise* description of how they 
are assembled, which is what the typing will have to express. Much in 
the same spirit as the link below, which I really encourage you to read:

http://ocsigen.org/eliom/manual/1.2.0/1#p1baseprinciples

Statically typing the internal structure of assembled sexps is no 
different than statically typing XHTML.

Glad that we're heading somewhere...

All the best,

Guillaume Yziquel

Simon Urbanek

Mon, Nov 30, 2009 11:16 AM

On Nov 30, 2009, at 13:14, Guillaume Yziquel wrote:

You just pass it as value of the call. I suspect the reason it doesn't  
work is in your code, not in the facility (note that the link above is  
useless since the construction is a mystery - if you were constructing
it right, it would work ;)).

Small example:

SEXP myEval(SEXP FN, SEXP first_arg) {
   SEXP call = PROTECT(LCONS(FN, CONS(first_arg, R_NilValue)));
   SEXP res = eval(call, R_GlobalEnv);
   UNPROTECT(1);
   return res;
}

 > .Call("myEval",function(x) x + 1, 10)
[1] 11

 > .Internal(inspect(function(x) x + 1))
@19e376c 03 CLOSXP g0c0 [ATT]
FORMALS:
   @19e399c 02 LISTSXP g0c0 []
     TAG: @1029840 01 SYMSXP g0c0 [MARK,NAM(2)] "x"
     @1007378 01 SYMSXP g0c0 [MARK,NAM(2)] ""
BODY:
   @19e3948 06 LANGSXP g0c0 []
     @101080c 01 SYMSXP g0c0 [MARK,gp=0x4000] "+"
     @1029840 01 SYMSXP g0c0 [MARK,NAM(2)] "x"
     @19e1248 14 REALSXP g0c1 [] (len=1, tl=27187120) 1
CLOENV:
   @1023c38 04 ENVSXP g0c0 [MARK,NAM(2),gp=0x8000]
ATTRIB:
   @19e3750 02 LISTSXP g0c0 []
     TAG: @1006ee0 01 SYMSXP g0c0 [MARK,gp=0x4000] "source"
     @19e1228 16 STRSXP g0c1 [] (len=1, tl=16806832)
       @150cdc8 09 CHARSXP g0c3 [gp=0x20] "function(x) x + 1"

... or reading R-exts:

"There are a series of small macros/functions to help construct
pairlists and language objects (whose internal structures just differ
by SEXPTYPE). Function CONS(u, v) is the basic building block: it
constructs a pairlist from u followed by v (which is a pairlist or
R_NilValue). LCONS is a variant that constructs a language object.
Functions list1 to list4 construct a pairlist from one to four items,
and lang1 to lang4 do the same for a language object (a function to
call plus zero to three arguments). Functions elt and lastElt find the
ith element and the last element of a pairlist, and nthcdr returns a
pointer to the nth position in the pairlist (whose CAR is the nth
item)."
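The helpers quoted above all walk the CDR chain. Here is a toy version in plain C over a hand-built (+ a 2), the call inspected earlier in the thread. The toy_* names and the cell layout are hypothetical, not R's. Indexing counts from 0, as the C-level elt does, so item 0 is the function, which is what x[[1]] returns at the R level.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cells (not R's layout): leaves carry a label, cells carry
   CAR/CDR. Names are hypothetical. */
typedef struct toy_node {
    const char *label;          /* leaf name; NULL for list/call cells */
    struct toy_node *car, *cdr;
} toy_node;

static toy_node toy_nil = { "NULL", NULL, NULL };

/* The call a + 2, laid out by hand as (+ a 2). */
static toy_node sym_plus = { "+", NULL, NULL };
static toy_node sym_a = { "a", NULL, NULL };
static toy_node num_2 = { "2", NULL, NULL };
static toy_node cell_3 = { NULL, &num_2, &toy_nil };
static toy_node cell_2 = { NULL, &sym_a, &cell_3 };
static toy_node call_a_plus_2 = { NULL, &sym_plus, &cell_2 };

/* Like nthcdr(s, n): follow CDR n times; the CAR there is item n. */
toy_node *toy_nthcdr(toy_node *s, int n) {
    while (n-- > 0) {
        assert(s != &toy_nil); /* list shorter than requested */
        s = s->cdr;
    }
    return s;
}

/* Like elt(list, i): item i, counting from 0. */
toy_node *toy_elt(toy_node *list, int i) {
    return toy_nthcdr(list, i)->car;
}

/* The last cell of the chain (its CAR is the last item). */
toy_node *toy_last_cell(toy_node *list) {
    toy_node *result = &toy_nil;
    for (; list != &toy_nil; list = list->cdr)
        result = list;
    return result;
}
```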

No, you don't - you do care what the *types* are (i.e. TYPEOF) and how  
they behave, but you should *not* care how they are implemented in the  
internals. That is deliberately hidden by the API.

Hopefully not - again, see above comment.

Cheers,
Simon

Guillaume Yziquel

Mon, Nov 30, 2009 1:07 PM

Simon Urbanek wrote:

In the eval function in eval.c, you have:

So imagine you have a LANGSXP whose CAR is a CLOSXP: execution goes 
into the last line of the code snippet above. Re-entering eval with a 
CLOSXP, the code goes into

So PROTECT(op = eval(CAR(e), rho)) evaluates to R_NilValue.

I figured out that's why evaluating a LANGSXP with CAR a CLOSXP simply 
fails.

I'll have a look at the code snippet you gave, since I do not understand 
why it doesn't fail the same way mine does.

Thanks a lot.

Guillaume Yziquel

Simon Urbanek

Mon, Nov 30, 2009 1:25 PM

On Nov 30, 2009, at 16:07, Guillaume Yziquel wrote:

Nope, it simply evaluates to itself (with ref count increased) - see  
tmp = e; ..; return(tmp).

Wrong ;). A closure is a constant like any other object so it  
evaluates to itself.
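Simon's point can be shown in miniature. eval dispatches on the type of its argument: a symbol is looked up, a call evaluates its CAR and applies the result, and anything else, a closure included, evaluates to itself. A LANGSXP whose CAR is a closure is therefore a valid call. The toy evaluator below is plain C, not R's eval. The node kinds, the one-binding environment and the eager one-argument application are hypothetical simplifications (R itself passes closure arguments as promises). It reproduces the 11 from the myEval example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy evaluator (not R's eval): only the dispatch discussed here.
   Node kinds and names are hypothetical. */
typedef enum { T_SYM, T_LANG, T_LIST, T_CLOS, T_INT } toy_type;

typedef struct toy_node {
    toy_type type;
    const char *name;           /* T_SYM */
    struct toy_node *car, *cdr; /* T_LANG, T_LIST */
    int (*body)(int);           /* T_CLOS: stands in for the closure body */
    int value;                  /* T_INT */
} toy_node;

/* A one-binding environment standing in for rho. */
static const char *env_name = NULL;
static toy_node *env_value = NULL;

/* Stand-in for the body of function(x) x + 1. */
int toy_add_one(int x) { return x + 1; }

toy_node *toy_eval(toy_node *e) {
    switch (e->type) {
    case T_SYM: /* symbols are looked up in the environment */
        assert(env_name != NULL && strcmp(e->name, env_name) == 0);
        return env_value;
    case T_LANG: { /* calls: evaluate the CAR, then apply the result */
        toy_node *op = toy_eval(e->car);
        assert(op->type == T_CLOS); /* otherwise: not a function */
        toy_node *arg = toy_eval(e->cdr->car); /* eager, one argument */
        assert(arg->type == T_INT);
        toy_node *r = calloc(1, sizeof *r);
        assert(r != NULL);
        r->type = T_INT;
        r->value = op->body(arg->value);
        return r;
    }
    default: /* everything else, closures included, evaluates to itself */
        return e;
    }
}
```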

Cheers,
Simon