Parsing code with newlines

4 messages · Tomas Kalibera, Mikhail Titov, Peter Dalgaard

#
Hello!

This is my first post here. I came across the very same problem.
It can be reproduced with a modified tests/Embedding/RParseEval.c.

Actually, this example has another issue: it doesn't wrap everything
in R_ToplevelExec. This is a major show stopper for newcomers, as that
function is barely mentioned anywhere, and a longjmp into the already
terminated setuploop function followed by R_suicide looks like a mystery.

Error: bad value
Fatal error: unable to initialize the JIT
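
The mechanism can be illustrated with a toy model in plain C. This is
only a sketch of the general setjmp/longjmp pattern, not R's
implementation: every name in it (toy_context, toy_error,
toy_toplevel_exec and the two callbacks) is hypothetical. It shows why
an error raised deep inside a call needs a live jump target, and how a
wrapper in the spirit of R_ToplevelExec can provide one.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, none is R's API. */

/* Innermost live jump target, or NULL when no context is active. */
static jmp_buf *toy_context = NULL;

/* Stand-in for an error raised deep inside a parser. With a live
   context it jumps there and never returns; without one it returns 0,
   which is the situation the post describes as ending in R_suicide. */
int toy_error(void)
{
    if (toy_context == NULL)
        return 0;
    longjmp(*toy_context, 1);
}

/* Stand-in for a top-level wrapper: run fun(data) with a fresh jump
   target. Returns 1 if fun completed, 0 if it bailed out via toy_error(). */
int toy_toplevel_exec(void (*fun)(void *), void *data)
{
    jmp_buf here;
    jmp_buf *saved = toy_context;   /* restored on both paths */
    int ok;

    assert(fun != NULL);
    if (setjmp(here) == 0) {
        toy_context = &here;
        fun(data);
        ok = 1;
    } else {
        ok = 0;                     /* arrived here through longjmp */
    }
    toy_context = saved;
    return ok;
}

/* Example callbacks: one "parse" that fails, one that succeeds.
   Each bumps the counter behind data before doing anything else. */
void toy_failing_parse(void *data)
{
    ++*(int *)data;
    toy_error();
    ++*(int *)data;   /* not reached while a context is live */
}

void toy_good_parse(void *data)
{
    ++*(int *)data;
}
```

Calling toy_toplevel_exec(toy_failing_parse, &n) returns 0 to the
caller instead of taking the whole process down, which is the behaviour
one would hope for from a wrapped parse call.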


That aside, here is the code with newlines that fails to parse. I hope
it will paste alright here.


#include "embeddedRCall.h"
#include <R_ext/Parse.h>

int
main(int argc, char *argv[])
{
    SEXP e, tmp;
    int hadError;
    ParseStatus status;

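    init_R(argc, argv);

    /* The "\r" in the input is what fails to parse; "\n ls()" parses fine. */
    PROTECT(tmp = mkString("\n\r ls()"));
    /* Parse at most one expression from tmp; no srcfile is supplied. */
    PROTECT(e = R_ParseVector(tmp, 1, &status, R_NilValue));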
    if (status != PARSE_OK)
    {
        printf("boo boo\n");
    }
    else
    {
        PrintValue(e);
        R_tryEval(VECTOR_ELT(e,0), R_GlobalEnv, &hadError);
    }
    UNPROTECT(2);

    end_R();
    return(0);
}


--
Mikhail
5 days later
#
On 4/5/19 8:14 AM, Mikhail Titov wrote:
Please check https://www.r-project.org/posting-guide.html and update 
your post if you still need to get help here - from your current post I 
am not sure what you did, what was the error you got and from which 
tool, why you think the error was a result of something not working 
correctly/as documented, etc. The original post with the same subject 
you are probably referring to had the same problem.

Please also note that "tests" (tests/Embedding/RParseEval.c) are not 
examples - if they do not catch R errors in some cases that is perfectly 
ok, they also may use internal API that is indeed not documented e.g. in 
Writing R Extensions. Note Writing R Extensions has a section on 
embedding R and on cleanup handlers.

Best
Tomas
#
On Wed, Apr 10, 2019 at 5:06 AM, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
The original post is linked via e-mail headers; however, it goes back a
decade. It shows up linked as a thread in Gnus, so I thought it would
be all right to jump straight to the matter.

Here is the link to the original discussion:
https://stat.ethz.ch/pipermail/r-devel/2008-August/050332.html

At this point, I would like to report two bugs in the "Writing R
Extensions" documentation. First, that document does not make clear why
line feeds (0x0A) have to be removed from the input string before it is
parsed. Second, nowhere does it mention R_ToplevelExec, which is needed
if parsing is done in the outer context: that is, not when our C
function is called from R, but when we parse R code in C directly,
outside of the main loop. These are big show stoppers for newcomers.

The barely modified test code from my previous post does not parse what
would seem to be a legitimate sample string, "\r\n ls()". However, it
parses "\n ls()" without trouble. Nowhere do the docs mention this
intolerance to line feeds. It is reproducible from the R console as well.

,----[ R console session ]
| > parse(text="\r\n ls()")
| Error in parse(text = "\r\n ls()") : <text>:1:1: unexpected input
| 1:
|     ^
| >
`----

Another problem with the aforementioned documentation concerns parsing
erroneous expressions like "deadbeef<-function(,bad){}" in the top-level
context. Instead of returning an error from the parser, R crashes
(with R_suicide) unless the call is wrapped in R_ToplevelExec.
Where would I find a good example of top-level context parsing, then? I
have no problem with skipping error checks or with using undocumented
functions. However, I would prefer to avoid major unexpected crashes.
That example does NOT use any undocumented API and is therefore
misleading. I believe it SHOULD include R_ToplevelExec, and that
function SHOULD be in the docs.
I have no problems with the rest of the document on embedding and
cleanup in general.
--
Mikhail
#
"\r" is CR, not LF. On systems that use CRLF as newline, the combination should be "\n" at the C (or R) level.

However, I suppose there is no particular reason not to treat CR as whitespace, as does happen with FF and HT.

-pd
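
Until CR is treated as whitespace by the parser, one workaround for
embedders is to normalize line endings before handing the text to
mkString and R_ParseVector. Below is a minimal sketch in plain C, with
the assumption that mapping a lone CR to LF is acceptable for the input
at hand. The helper name normalize_newlines is hypothetical and is not
part of the R API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the R API): rewrites s in place so
   that every CRLF pair and every lone CR becomes a single LF.
   Returns the length of the rewritten string. */
size_t normalize_newlines(char *s)
{
    assert(s != NULL);

    char *src = s;   /* next character to read  */
    char *dst = s;   /* next position to write; never runs ahead of src */

    while (*src != '\0') {
        if (*src == '\r') {
            *dst++ = '\n';
            src++;
            if (*src == '\n')
                src++;          /* CRLF: drop the LF half of the pair */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Applied to a writable buffer holding "\r\n ls()", the helper leaves
"\n ls()", which is the form reported earlier in the thread as parsing
correctly. The buffer must be modifiable, so a string literal has to be
copied into a char array first.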