Inconsistent behavior of the C API's R_ParseVector()?
On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
On 12/9/19 2:54 PM, Laurent Gautier wrote:
On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
On 12/7/19 10:32 PM, Laurent Gautier wrote:
Thanks for the quick response, Tomas.
The same error indeed happens when trying to have a zero-length
variable name in an environment. The surprising bit is then "why is this
happening during parsing?" (that is, why are variables being assigned to an
environment?)
The emitted R error (in the R console) is not a parse (syntax) error, but
an error emitted during parsing when the parser tries to intern a name, that
is, look it up in a symbol table. The empty string is not allowed as a symbol
name, hence the error. In the call "list(''=1)", the empty name is what
could eventually become the name of a local variable inside list(), even
though not yet during parsing.
Thanks, Tomas. I guess this has to do with R expressions being lazily evaluated, and with argument names in a call also being part of the expression. Now the puzzling part is why that is part of parsing at all: I would have expected R_ParseVector() to be restricted to parsing. Instead, it feels like R_ParseVector() performs parsing plus a first level of evaluation for expressions that "should never work" (the empty name).

Think of it as an exception in, say, Python. Some failures during parsing result in an exception (called an error in R, and implemented using a long jump). Any time you call into R you can get an error; running out of memory is also signalled as an R error.
The surprising bit for me was that I had expected the function to perform only parsing. I did not expect an exception (and a long jump smashing the stack) from a function in the C API that parses a string and uses a parameter (pointer) to report whether parsing failed or succeeded. Since you are making a comparison with Python, the distinction I am making between parsing and evaluation seems to apply there too. For example:
```python
>>> import parser
>>> parser.expr('1+')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    1+
     ^
SyntaxError: unexpected EOF while parsing
>>> p = parser.expr('list(""=1)')
>>> p
<parser.st at 0x7f360e5329f0>
>>> eval(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: eval() arg 1 must be a string, bytes or code object
>>> list(""=1)
  File "<stdin>", line 1
SyntaxError: keyword can't be an expression
```
There is probably some error in how the external code is handling R
errors (Fatal error: unable to initialize the JIT, stack smashing, etc.)
and possibly also in how R is initialized before calling ParseVector. You
would probably get the same problem when running, say, "stop('myerror')".
Please note that R errors are implemented as long jumps, so care has to be
taken when calling into R; Writing R Extensions has more details (and
section 8 specifically covers embedding R). This is unlike parse (syntax)
errors, which are signaled via the parse status that ParseVector() reports.
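To make the two error channels concrete, here is a hypothetical toy model in C (none of these names exist in R). Syntax problems are reported through a status out-parameter, while a failure while interning a name long-jumps to the innermost error context, the way an R error does. The rule that a name is interned only once its argument value is complete is an assumption chosen to reproduce the two inputs from this thread; it is a simplification, not a reading of gram.y. `toy_toplevel_parse()` plays the role of the R console's top level, which already has an error context in place.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not R's implementation. */
typedef enum { TOY_PARSE_OK, TOY_PARSE_INCOMPLETE, TOY_PARSE_ERROR } toy_status;

static jmp_buf *toy_context = NULL;      /* innermost active error context */
static const char *toy_error_msg = NULL; /* message of the last raised error */

/* Raise an "R-style" error: never returns, jumps to the active context. */
static void toy_error(const char *msg)
{
    assert(toy_context != NULL);         /* no context means no safe target */
    toy_error_msg = msg;
    longjmp(*toy_context, 1);
}

/* Toy symbol interning: only checks the name length. An empty name is an
 * error raised by long jump, not a syntax problem reported via status. */
static void toy_install(size_t name_len)
{
    if (name_len == 0)
        toy_error("attempt to use zero-length variable name");
}

/* Toy parser for inputs like "f(a=1)". Syntax problems go to *status;
 * an empty argument name ('' followed by =) raises an error instead. */
void toy_parse(const char *src, toy_status *status)
{
    size_t n = strlen(src);
    int depth = 0;

    /* A trailing operator means the last value is unfinished: report
     * INCOMPLETE before any argument name is interned. */
    if (n > 0 && strchr("+-*/=", src[n - 1]) != NULL) {
        *status = TOY_PARSE_INCOMPLETE;
        return;
    }
    for (size_t i = 0; i < n; i++) {
        if (src[i] == '(') {
            depth++;
        } else if (src[i] == ')') {
            if (--depth < 0) {
                *status = TOY_PARSE_ERROR; /* unmatched ')' is a syntax error */
                return;
            }
        } else if (strncmp(src + i, "''=", 3) == 0) {
            toy_install(0);              /* empty name: long-jumps away */
        }
    }
    *status = depth > 0 ? TOY_PARSE_INCOMPLETE : TOY_PARSE_OK;
}

/* Stand-in for the console's top level: establish an error context, then
 * parse. Returns 1 if an error was raised (see toy_error_msg), else 0. */
int toy_toplevel_parse(const char *src, toy_status *status)
{
    jmp_buf here;
    jmp_buf *saved = toy_context;

    toy_context = &here;
    if (setjmp(here) != 0) {
        toy_context = saved;             /* error path: restore outer context */
        return 1;
    }
    toy_parse(src, status);
    toy_context = saved;
    return 0;
}
```

With these definitions, `toy_toplevel_parse("list(''=1+", &st)` returns 0 with `st == TOY_PARSE_INCOMPLETE`, while `toy_toplevel_parse("list(''=123", &st)` returns 1 and leaves the zero-length-name message in `toy_error_msg`. Calling `toy_parse()` directly with no context set trips the `assert` instead of jumping to an arbitrary place.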
The issue is that the segfault (caused by stack smashing, and therefore by what I also suspected to be an uncontrolled jump) happens during the execution of R_ParseVector(). I would think that an issue with the initialization of R is less likely, because the project is otherwise used a fair bit and is well covered by automated continuous tests. After looking more into R's gram.c, I suspect that an execution context is required for R_ParseVector() to work properly (that is, to know where to jump in case of an error) when the parsing code decides to fail outside of what it considers a syntax error. If that is the case, R_ParseVector() would work well when called from, say, a C extension to an R package, but fail the way I am seeing when called from an embedded R.

Yes, contexts are used internally to handle errors. For external use, please see Writing R Extensions, section 6.12.
I have wrapped my call to R_ParseVector() in R_tryCatchError(), and this seems to help me overcome the issue. Thanks for the pointer.

Best,
Laurent
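The fix described above follows a general pattern: establish an error context around the call, so that an error raised inside unwinds to a handler you control rather than to wherever the innermost context happens to be. Below is a hypothetical C toy of that pattern. The body/handler callback shape loosely mirrors `R_tryCatchError()` from R's C API, but it uses `void *` instead of `SEXP`, and `toy_try_catch_error()`, `toy_parse_body()` and `toy_parse_handler()` are illustrative names, not R functions. For the real API, Writing R Extensions section 6.12 is the reference.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of the wrapping pattern, not R's implementation. */
static jmp_buf *toy_context = NULL;      /* innermost active error context */
static const char *toy_error_msg = NULL; /* message of the last raised error */

/* Raise an error by long-jumping to the innermost context. */
void toy_error(const char *msg)
{
    assert(toy_context != NULL);         /* no context: nowhere safe to jump */
    toy_error_msg = msg;
    longjmp(*toy_context, 1);
}

/* Run body(bdata) with an error context in place. If body raises an error,
 * unwind to here and return handler(message, hdata) instead. */
void *toy_try_catch_error(void *(*body)(void *), void *bdata,
                          void *(*handler)(const char *, void *), void *hdata)
{
    jmp_buf here;
    jmp_buf *saved = toy_context;        /* remember the outer context */
    void *result;

    toy_context = &here;
    if (setjmp(here) != 0) {
        toy_context = saved;             /* error path: restore, then handle */
        return handler(toy_error_msg, hdata);
    }
    result = body(bdata);
    toy_context = saved;                 /* normal path: restore as well */
    return result;
}

/* Example body: "parse" a string, raising an error for an empty name. */
void *toy_parse_body(void *data)
{
    const char *src = data;
    if (strstr(src, "''=") != NULL)
        toy_error("attempt to use zero-length variable name");
    return (void *)src;                  /* stand-in for a parsed result */
}

/* Example handler: hand the message back to the caller, signal failure. */
void *toy_parse_handler(const char *msg, void *data)
{
    *(const char **)data = msg;
    return NULL;
}
```

Calling `toy_try_catch_error(toy_parse_body, (void *)"list(''=123", toy_parse_handler, &err)` returns `NULL` and leaves the message in `err`, instead of jumping out of the caller. Saving and restoring the previous context is what lets such wrappers nest.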
Best,
Tomas

Best,
Laurent
Best,
Tomas

We are otherwise aware that the error does not occur in the R console, but it can be traced to a call to R_ParseVector() in R's C API (https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509). Our specific setup calls an embedded R from Python, using the cffi library. An error on our end was the first possibility considered, but the puzzling specificity of the error (as shown below, other parsing errors are handled properly) and the difficulty of tracing what is happening in R_ParseVector() made me ask whether someone on this list had a suggestion about the possible issue:
```
>>> import rpy2.rinterface as ri
>>> ri.initr()
>>> e = ri.parse("list(''=1+")
---------------------------------------------------------------------------
RParsingError                             Traceback (most recent call last)
>>> e = ri.parse("list(''=123")
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT
*** stack smashing detected ***: <unknown> terminated
```
On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
Dear Laurent,
could you please provide a complete reproducible example where parsing
results in a crash of R? Calling parse(text="list(''=123") from R works
fine for me (gives Error: attempt to use zero-length variable name).
I don't think the problem you observed could be related to the memory
leak. The leak is on the heap, not on the stack.
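The leak-versus-stack point can be illustrated with a small hypothetical C toy (not R's code). A parser that grows its stack with `malloc()`, as the gram.c comment says bison's `yyparse()` does, loses that block when an error long-jumps past its `free()`. The counter `toy_live_blocks` exists only to make the leak visible. The lost block sits on the heap, and `longjmp()` itself only discards stack frames, so the leak alone does not overwrite a stack buffer, which is what a "stack smashing detected" abort reports (a corrupted stack canary).

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical toy model, not R's implementation. */
static jmp_buf toy_context;   /* error context set up by toy_run_parse() */
long toy_live_blocks = 0;     /* heap blocks allocated but not yet freed */

static void *toy_malloc(size_t n)
{
    void *p = malloc(n);
    assert(p != NULL);
    toy_live_blocks++;
    return p;
}

static void toy_free(void *p)
{
    free(p);
    toy_live_blocks--;
}

/* Toy parser: grows its stack on the heap; if an error is raised,
 * longjmp() skips the matching toy_free() and the block is leaked. */
static void toy_yyparse(int raise_error)
{
    int *stack = toy_malloc(1024 * sizeof *stack);
    stack[0] = 0;                        /* pretend to use the parser stack */
    if (raise_error)
        longjmp(toy_context, 1);         /* R-style error: no cleanup runs */
    toy_free(stack);
}

/* Parse under an error context; returns 1 if an error was raised. */
int toy_run_parse(int raise_error)
{
    if (setjmp(toy_context) != 0)
        return 1;
    toy_yyparse(raise_error);
    return 0;
}
```

After `toy_run_parse(0)` the counter is back to 0; after `toy_run_parse(1)` it stays at 1, because the error path never reached `toy_free()`.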
Zero-length names of elements in a list are allowed. They are not the
same thing as zero-length variables in an environment. If you try to
convert "lst" from your example to an environment, you would get the
error (attempt to use zero-length variable name).
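The distinction between list element names and variable names can be sketched as a hypothetical C toy (not how R actually stores lists or environments). Element names are plain strings kept next to the values, so `""` is just another string, while turning names into variables goes through symbol interning, which rejects an empty name. `toy_install()` only performs the emptiness check; a real symbol table would also store and look up the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not R's implementation. */
#define TOY_MAX 8

typedef struct {
    const char *names[TOY_MAX];   /* element names: any string, "" allowed */
    double values[TOY_MAX];
    size_t len;
} toy_list;

/* Append a named element; the name is stored as-is, with no checks. */
int toy_list_append(toy_list *lst, const char *name, double value)
{
    if (lst->len == TOY_MAX)
        return -1;
    lst->names[lst->len] = name;
    lst->values[lst->len] = value;
    lst->len++;
    return 0;
}

/* Toy symbol interning: returns -1 for an empty name, where R would signal
 * "attempt to use zero-length variable name". */
int toy_install(const char *name)
{
    assert(name != NULL);
    return name[0] == '\0' ? -1 : 0;
}

/* Toy list-to-environment conversion: every name must become a symbol. */
int toy_list2env(const toy_list *lst)
{
    for (size_t i = 0; i < lst->len; i++)
        if (toy_install(lst->names[i]) != 0)
            return -1;
    return 0;
}
```

Appending an element named `""` succeeds, but converting that list with `toy_list2env()` fails, which mirrors the error seen when such a list is turned into an environment.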
Best
Tomas
On 11/30/19 11:55 PM, Laurent Gautier wrote:
Hi again,

Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
zero-length named elements does not seem consistent either:
```
> lst <- list()
> lst[[""]] <- 1
> names(lst)
[1] ""
> list("" = 1)
Error: attempt to use zero-length variable name
```
Should the parser be made to accept as valid what is otherwise possible when using `[[<-`?

Best,
Laurent

On Sat, Nov 30, 2019 at 17:33, Laurent Gautier <lgautier at gmail.com> wrote:
I found the following code comment in `src/main/gram.c`:
```
/* Memory leak

   yyparse(), as generated by bison, allocates extra space for the parser
   stack using malloc(). Unfortunately this means that there is a memory
   leak in case of an R error (long-jump). In principle, we could define
   yyoverflow() to relocate the parser stacks for bison and allocate say on
   the R heap, but yyoverflow() is undocumented and somewhat complicated
   (we would have to replicate some macros from the generated parser here).

   The same problem exists at least in the Rd and LaTeX parsers in tools.
*/
```
Could this be related to the issue?

On Sat, Nov 30, 2019 at 14:04, Laurent Gautier <lgautier at gmail.com> wrote:
Hi,
The behavior of
```
SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
```
defined in `src/include/R_ext/Parse.h` appears to be inconsistent
depending on the string to be parsed.
Trying to parse a string such as `"list(''=1+"` sets the
`ParseStatus` to an incomplete-parse status, but trying to parse
`"list(''=123"` results in R sending a message to the console
(followed by a crash):
```
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT
*** stack smashing detected ***: <unknown> terminated
```
Is there a reason for the difference in behavior, and is there a
workaround?
Thanks, Laurent
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel