R strings, null-terminated or size-delimited?

8 messages · Guillaume Yziquel, Simon Urbanek, Duncan Murdoch

Sat, Nov 21, 2009 1:12 PM

Hello.

I've been looking at vecsexps for my binding.

Concerning strings, I'm wondering: are they supposed to be 
null-delimited? Are they delimited by the info in the SEXPHEADER macro 
in Rinternals.h?

Basically, what are the macros or functions to access the values of the 
vecsexps? I'm thinking of CHARSXPs and INTSXPs for the moment...

All the best,

Guillaume Yziquel
http://yziquel.homelinux.org/

Simon Urbanek

Sat, Nov 21, 2009 2:27 PM

On Nov 21, 2009, at 4:12 PM, Guillaume Yziquel wrote:

Yes, they are null-delimited when you create/access them.

You should not be touching or reading that.

VECTOR_ELT and SET_VECTOR_ELT (assuming that you're referring to
VECSXPs, which are generic vectors).

Those are entirely different - CHARSXPs are not vectors but strings
(see mkChar et al., CHAR, ...) and INTSXPs are integer arrays (in C
speak) accessed using INTEGER.
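A minimal OCaml sketch of the string side of this answer, under stated assumptions: on the C side CHAR(x) points at the bytes of a CHARSXP, LENGTH(x) gives its stored length, and R allocates one extra byte for the terminating NUL. The record type `charsxp` and the functions below are hypothetical stand-ins (no R API is called); `string_of_charsxp` treats any disagreement between the stored length and the terminator as an error, which is one conservative way for a binding to handle reading both.

```ocaml
(* Hypothetical model of a CHARSXP as a binding might see it:
   [length] plays the role of LENGTH(x), [bytes] the buffer behind
   CHAR(x), including the trailing NUL. *)
type charsxp = { length : int; bytes : Bytes.t }

(* Accept the string only if the stored length and the NUL terminator
   agree; report which check failed otherwise. *)
let string_of_charsxp (c : charsxp) : (string, string) result =
  if c.length < 0 || c.length >= Bytes.length c.bytes then
    Error "length does not fit in buffer"
  else if Bytes.get c.bytes c.length <> '\000' then
    Error "no NUL at stored length"
  else
    match Bytes.index_opt c.bytes '\000' with
    | Some i when i < c.length -> Error "embedded NUL before stored length"
    | _ -> Ok (Bytes.sub_string c.bytes 0 c.length)

(* Builds a well-formed value: the string's bytes followed by one NUL. *)
let charsxp_of_string (s : string) : charsxp =
  { length = String.length s; bytes = Bytes.of_string (s ^ "\000") }
```

For example, `string_of_charsxp (charsxp_of_string "lazyLoadDBfetch")` returns `Ok "lazyLoadDBfetch"`.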

Please read R-exts - it's better than guessing.

Cheers,
Simon

Guillaume Yziquel

Sat, Nov 21, 2009 3:31 PM

Simon Urbanek wrote:

OK. Fair enough. But is it guaranteed that the null terminator sits
exactly where the vecsxp field of the VECSEXP says the R vector should
end? Let me rephrase that:

-1- Should I consider it a bug if the two pieces of information differ?

-2- Which of the two is the "safest" to rely on?

I believe I should. I'd like the OCaml / R binding to be closely knit to
R internals. One reason is speed; the other is that I'd like to use
camlp4 to write syntax extensions that mix OCaml and R syntax. It's
therefore important for me not to rely on the R interpreter being active
when building R values, or when marshaling R values via OCaml. There are
numerous other issues besides this one.

I'm already using #define USE_RINTERNALS in my .c file to inspect R values.

No. I'm referring to INTSXP for now. But I see what you mean:

#define INTEGER(x)      ((int *) DATAPTR(x))
#define VECTOR_ELT(x,i) ((SEXP *) DATAPTR(x))[i]

VECTOR_ELT is not suitable for INTSXP arrays. I need to convert an
INTSXP array to an OCaml list / array.

OK. They're not vectors. They're VECTOR_SEXPRECs.
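For the INTSXP case, a sketch of the conversion to an OCaml array, assuming the binding can expose the memory INTEGER(x) points to (LENGTH(x) consecutive C ints) as a Bigarray. How that view is obtained (for instance a C stub wrapping the pointer while the SEXP stays protected from R's garbage collector) is an assumption and is not shown; the names `intsxp_view`, `to_ocaml_array` and `example` are hypothetical. R stores NA_integer_ as INT_MIN, so the sketch maps that value to None.

```ocaml
(* Hypothetical view of the memory INTEGER(x) points to: LENGTH(x)
   consecutive 32-bit C ints. Here it is built in OCaml for illustration. *)
type intsxp_view =
  (int32, Bigarray.int32_elt, Bigarray.c_layout) Bigarray.Array1.t

(* NA_integer_ is stored as INT_MIN on the C side. *)
let na_integer = Int32.min_int

(* Copy into an OCaml array, turning NA into None. *)
let to_ocaml_array (v : intsxp_view) : int option array =
  Array.init (Bigarray.Array1.dim v) (fun i ->
      let x = Bigarray.Array1.get v i in
      if Int32.equal x na_integer then None else Some (Int32.to_int x))

(* Values borrowed from the INTSXP inside the "str" promise, plus one NA. *)
let example : intsxp_view =
  Bigarray.Array1.of_array Bigarray.int32 Bigarray.c_layout
    [| 105l; 153119l; na_integer |]
```

Copying is the simple choice; keeping the Bigarray view itself would avoid the copy but ties the OCaml value's lifetime to the R object's.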

Funny, I have R-exts.pdf and R-ints.pdf opened. They're fine when it 
comes to writing R extensions. Not when writing bindings embedding R 
into OCaml so that you can beta-reduce isomorphically in R and OCaml.

I'm already using heretical features in OCaml (namely Obj.magic) in order
to do this binding. I do not mind using heretical features of the R API.

I do not mean to be a pain, but I have to do what needs to be done. If I 
find on my way that #define USE_RINTERNALS is overkill, I'll gladly drop it.

For instance, here's one of my issues: I've extracted the R SEXP for the 
"str" symbol. It's a promise. Now, how do I map such a SEXP to an OCaml 
function? Haven't found that in R-ints.pdf or R-exts.pdf. There's talk 
about functions, but promises are somewhat overlooked. However, such a 
mapping is crucial to me.

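I was not guessing when I was trying to look at the internal structure
of R data. I was simply trying to get a grip on how to execute promises,
and was therefore examining such a promise:

# R.Internal.Pretty.t_of_sexp (R.Raw.sexp_of_t (R.symbol "str"));;
- : R.Internal.Pretty.t =
PROMISE
 {value = SYMBOL None;
  expr =
   CALL (SYMBOL (Some ("lazyLoadDBfetch", BUILTIN)),
    [INT [105; 153119]; Unknown; Unknown; Unknown]);
  env = Unknown}

Or, following structures in Rinternals.h: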

# R.Internal.C.t_of_sexp (R.Raw.sexp_of_t (R.symbol "str"));;
- : R.Internal.C.t =
Val
 {content =
   PROMSXP
    {prom_value =
      Val
       {content =
         SYMSXP
          {pname = Val {content = NILSXP};
           sym_value = R.Internal.C.Recursive <lazy>;
           internal = Val {content = NILSXP}}};
     R.Internal.C.expr =
      Val
       {content =
         LANGSXP
          {carval =
            Val
             {content =
               SYMSXP
                {pname = Val {content = CHARSXP "lazyLoadDBfetch"};
                 sym_value = Val {content = BUILTINSXP 687};
                 internal = Val {content = NILSXP}}};
           cdrval =
            Val
             {content =
               LISTSXP
                {carval = Val {content = INTSXP [105; 153119]};
                 cdrval =
                  Val
                   {content =
                     LISTSXP
                      {carval =
                        Val
                         {content =
                           SYMSXP
                            {pname = Val {content = CHARSXP "datafile"};
                             sym_value =
                              Val
                               {content =
                                 SYMSXP
                                  {pname = Val {content = NILSXP};
                                   sym_value = R.Internal.C.Recursive <lazy>;
                                   internal = Val {content = NILSXP}}};
                             internal = Val {content = NILSXP}}};
                       cdrval =
                        Val
                         {content =
                           LISTSXP
                            {carval =
                              Val
                               {content =
                                 SYMSXP
                                  {pname =
                                    Val {content = CHARSXP "compressed"};
                                   sym_value =
                                    Val
                                     {content =
                                       SYMSXP
                                        {pname = Val {content = NILSXP};
                                         sym_value =
                                          R.Internal.C.Recursive <lazy>;
                                         internal = Val {content = NILSXP}}};
                                   internal = Val {content = NILSXP}}};
                             cdrval =
                              Val
                               {content =
                                 LISTSXP
                                  {carval =
                                    Val
                                     {content =
                                       SYMSXP
                                        {pname =
                                          Val {content = CHARSXP "envhook"};
                                         sym_value =
                                          Val
                                           {content =
                                             SYMSXP
                                              {pname = Val {content = NILSXP};
                                               sym_value =
                                                R.Internal.C.Recursive <lazy>;
                                               internal =
                                                Val {content = NILSXP}}};
                                         internal = Val {content = NILSXP}}};
                                   cdrval = Val {content = NILSXP};
                                   tagval = Val {content = NILSXP}}};
                             tagval = Val {content = NILSXP}}};
                       tagval = Val {content = NILSXP}}};
                 tagval = Val {content = NILSXP}}};
           tagval = Val {content = NILSXP}}};
     R.Internal.C.env = Val {content = ENVSXP}}}
#

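For instance, an issue I'd like advice on is: what does such a symbol mean?

                           SYMSXP
                            {pname = Val {content = CHARSXP "datafile"};
                             sym_value =
                              Val
                               {content =
                                 SYMSXP
                                  {pname = Val {content = NILSXP};
                                   sym_value = R.Internal.C.Recursive <lazy>;
                                   internal = Val {content = NILSXP}}};
                             internal = Val {content = NILSXP}}};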

And how is it treated when "str" is executed?
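One way to read the self-referential symbols in this dump: in R's C sources, R_UnboundValue is a SYMSXP whose print name is R_NilValue and whose value slot points back to itself, which matches the `pname = NILSXP` / `sym_value = Recursive <lazy>` pattern above. An unforced promise holds R_UnboundValue in its value slot, and a symbol such as "datafile" whose own value slot holds it has no binding in the base environment; it is looked up in the promise's environment when the expression is evaluated. The OCaml types and helpers below are a hypothetical mirror of those structures, not part of R or of the binding.

```ocaml
(* Hypothetical mirror of the Rinternals.h pieces that show up in the dump. *)
type sexp =
  | Nil                 (* NILSXP *)
  | Int of int array    (* INTSXP, simplified *)
  | Sym of symsxp       (* SYMSXP *)

and symsxp = {
  pname : string option;      (* None stands for a NILSXP print name *)
  mutable sym_value : sexp;   (* the symbol's value slot *)
}

(* Analogue of R_UnboundValue: a nameless symbol whose value is itself.
   This is the cycle the pretty-printer shows as [Recursive <lazy>]. *)
let rec unbound_value = { pname = None; sym_value = Sym unbound_value }

(* Physical equality (==) is used for the cycle check; structural
   equality (=) could loop forever on cyclic values. *)
let is_unbound_marker (s : symsxp) : bool =
  s.pname = None
  && (match s.sym_value with Sym s' -> s' == s | Nil | Int _ -> false)

(* True when the symbol's own value slot holds something other than
   the unbound marker. *)
let has_value_slot (s : symsxp) : bool =
  match s.sym_value with
  | Sym v -> not (is_unbound_marker v)
  | Nil | Int _ -> true

let datafile = { pname = Some "datafile"; sym_value = Sym unbound_value }
```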

All the best.

Guillaume Yziquel
http://yziquel.homelinux.org/

Duncan Murdoch

Sat, Nov 21, 2009 4:25 PM

On 21/11/2009 6:31 PM, Guillaume Yziquel wrote:

You are probably not going to be able to do that.  Take your example of 
the promise below:  to evaluate a promise, you need to evaluate the 
expression attached to it in the R interpreter.  (This is discussed in 
the R Language Definition.)

You can probably put together simple R objects like integer arrays
without having R running, but anything substantial isn't going to be 
feasible.
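The point about promises can be made concrete with a toy model. In the sketch below, a hypothetical three-constructor expression language stands in for LANGSXP and an association list for an environment; the promise record keeps the same three slots as the PROMSXP dump (expression, environment, value), and forcing amounts to calling an evaluator on the stored expression in the stored environment. Without that evaluator, which for real promises is R's own eval, there is no way to produce the value.

```ocaml
(* Hypothetical toy expression language standing in for LANGSXP. *)
type expr = Lit of int | Var of string | Add of expr * expr

(* Hypothetical environment: a list of variable bindings. *)
type env = (string * int) list

(* Same three slots as the PROMSXP dump: expression, environment and a
   value slot that stays empty (None) until the promise is forced. *)
type promise = { expr : expr; env : env; mutable value : int option }

(* The "interpreter": the only thing that turns an expression into a value. *)
let rec eval (env : env) (e : expr) : int =
  match e with
  | Lit n -> n
  | Var x ->
      (match List.assoc_opt x env with
       | Some v -> v
       | None -> failwith ("unbound variable: " ^ x))
  | Add (a, b) -> eval env a + eval env b

(* Forcing takes the evaluator as a parameter: the promise alone cannot
   produce its value. The result is cached in the value slot. *)
let force ~(eval : env -> expr -> int) (p : promise) : int =
  match p.value with
  | Some v -> v
  | None ->
      let v = eval p.env p.expr in
      p.value <- Some v;
      v

let p = { expr = Add (Var "x", Lit 1); env = [ ("x", 41) ]; value = None }
let result = force ~eval p
```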

Duncan Murdoch

Guillaume Yziquel

Sat, Nov 21, 2009 4:44 PM

Duncan Murdoch wrote:

That's precisely the issue. I want to map a functional language to a 
functional language. And keep the same evaluation semantics. I do not 
(yet?) see why it should not be feasible.

If this is done properly, OCaml could then compile R code natively. That 
would be really nice. There would be other advantages in integrating the 
two languages cleanly.

So, taking the example of promises, I need to map it to its OCaml 
semantic equivalent, which seems to be a Lazy.t structure. That doesn't 
seem (yet) unfeasible.
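Lazy.t does share the memoizing behavior of R promises: the suspended computation runs at most once, and later forces return the cached result. A minimal sketch (the helper `promise_of_thunk` and the counter are illustrative only, not part of any binding). One difference worth keeping in mind: an R promise keeps its expression and environment reachable (substitute() can recover the expression), whereas a Lazy.t holds only an opaque closure.

```ocaml
(* Counts how many times the suspended computation actually runs. *)
let evaluations = ref 0

(* Hypothetical helper: suspend a computation, much as R wraps a
   closure argument in a promise. *)
let promise_of_thunk (thunk : unit -> 'a) : 'a Lazy.t = lazy (thunk ())

let p = promise_of_thunk (fun () -> incr evaluations; 6 * 7)

let forced_before = Lazy.is_val p  (* false: nothing has run yet *)
let first = Lazy.force p           (* runs the thunk, caches 42 *)
let second = Lazy.force p          (* returns the cached 42 *)
```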

Thank you for your pointer to the R Language Definition. Starting with R
Internals was perhaps a bit brutal.

All the best,

Guillaume Yziquel
http://yziquel.homelinux.org/

Duncan Murdoch

Sat, Nov 21, 2009 5:11 PM

On 21/11/2009 7:44 PM, Guillaume Yziquel wrote:

R is a fairly quirky and irregular language, with lots of functions 
implemented in C code, so you haven't taken on a small project.  But I 
wish you luck.

Duncan Murdoch

2 days later

Guillaume Yziquel

Mon, Nov 23, 2009 5:11 PM

Duncan Murdoch wrote:

I've got code that does basic things. It relies essentially on tryEval.
But I have issues with the typing system. Let me explain with a rather
simple example. (Please keep in mind that OCaml typing is polymorphic,
somewhat in the spirit of C++ template types.)

I have implemented an OCaml function that is of type

	val R.force : 'a Lazy.t R.t -> 'a R.t

Which means that it takes as its first argument an R value that denotes
an 'a Lazy.t value in OCaml. An 'a Lazy.t value is basically what you
would call a promise: a promise to deliver a value of type 'a.

And 'R.force p' would evaluate to an R value denoting an OCaml value of
type 'a.

Basically, this R.t type is a polymorphic type whose semantics should,
hopefully, be an isomorphism between R values and OCaml types.

But R forces promises recursively. This means that if x is the promise 
of a promise in R, x would be of type 'a Lazy.t Lazy.t R.t.

Forcing it in R would give a value which should be of type 'a R.t.

However, OCaml typing is static: types are not tracked at runtime, so
the type checker relies only on what it can infer statically.

So, logically, OCaml would believe 'R.force x' to be of type 'a Lazy.t
R.t, which is inconsistent with R semantics. This breaks the soundness of
OCaml's type system, and segfaults are then close by.

So this is a simple example of why I would need to go into the nitty 
gritty details of R code. To see how to slowly map R semantics into OCaml.
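A self-contained mock of this mismatch, with a hypothetical module `Mock_r` standing in for the binding's `R` module: R values are a runtime-tagged variant, `'a Mock_r.t` is a phantom type over it, and `force` follows R by forcing until it reaches a non-promise. The design choice that keeps the static types harmless here is making `force` a no-op on values that are already forced, so a second `force` on what OCaml still believes is a promise does no damage; a one-step `force` that assumed its argument was a promise would instead fail at runtime, and could crash if the check were bypassed with Obj.magic in C stubs.

```ocaml
(* Hypothetical mock, not the binding's real R module. *)
module Mock_r : sig
  type 'a t                                   (* phantom-typed R value *)
  val int : int -> int t
  val delay : (unit -> 'a t) -> 'a Lazy.t t   (* build a promise *)
  val force : 'a Lazy.t t -> 'a t
  val to_int : int t -> int
end = struct
  (* At runtime every value carries its own tag, as a SEXP does. *)
  type sexp = Int of int | Prom of sexp Lazy.t
  type 'a t = sexp

  let int n = Int n
  let delay f = Prom (lazy (f ()))

  (* R-style forcing: keep forcing until a non-promise shows up; a value
     that is not a promise is returned unchanged. *)
  let rec force = function
    | Prom p -> force (Lazy.force p)
    | v -> v

  let to_int = function
    | Int n -> n
    | Prom _ -> failwith "to_int: unforced promise"
end

(* Statically: int Lazy.t Lazy.t Mock_r.t *)
let x = Mock_r.delay (fun () -> Mock_r.delay (fun () -> Mock_r.int 42))

(* Statically int Lazy.t Mock_r.t, although at runtime it is already the
   integer, because force did not stop after one level. *)
let once = Mock_r.force x

(* Statically int Mock_r.t; harmless at runtime since force is a no-op on
   non-promises. *)
let twice = Mock_r.force once
```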

But I have another issue: I'd like to follow the code of the eval
function, because there are LANGSXPs that I can evaluate, and others
that fail miserably. From my wrapping of the quantmod library:

The R error message is rather obscure to me.

That's why I'd like to write some stub code around, say, the promiseArgs 
and the applyClosure functions, so that I could test them with different 
values in OCaml's interactive toplevel.

Unfortunately, or fortunately, (depends on the point of view), I've 
looked at symbols of libR.so (I'm on a Debian box, with Debian R), and I 
fail to see such symbols exported.

How could I get to bind to these functions, without having to compile my 
stuff and R at the same time?

All the best,

Guillaume Yziquel
http://yziquel.homelinux.org/

Guillaume Yziquel

Mon, Nov 23, 2009 5:24 PM

Guillaume Yziquel wrote:

Just saw applyClosure,

but promiseArgs is nowhere to be seen...

Guillaume Yziquel
http://yziquel.homelinux.org/