Laurent,
the main point here is that ParseVector(), just like any other R API, has to
be called in a correct context since it can raise errors. So the issue was
that your C code has a bug: it does not set up R correctly (my guess would be
that you're not creating the initial context necessary in embedded R). There
are many different errors; yours is just one of many that can occur. Any R API
call that does allocation (and parsing obviously does) can cause errors.
Note that this is true for pretty much all R API functions.
Cheers,
Simon
On Dec 14, 2019, at 11:25 AM, Laurent Gautier <lgautier at gmail.com> wrote:
On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
On 12/9/19 2:54 PM, Laurent Gautier wrote:
On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
On 12/7/19 10:32 PM, Laurent Gautier wrote:
Thanks for the quick response, Tomas.
The same error indeed happens when trying to have a zero-length
variable name in an environment. The surprising bit is then "why is it
happening during parsing" (that is, why are variables assigned to an
environment)?
The emitted R error (in the R console) is not a parse (syntax) error, but
an error emitted during parsing when the parser tries to intern a name,
i.e. look it up in a symbol table. The empty string is not allowed as a
symbol, hence the error. In the call "list(''=1)", the empty name is what
could eventually become the name of a local variable inside list(), even
though not yet during parsing.
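To make the symbol-table step concrete, here is a toy Python model; it is not R's implementation, and `SymbolTable`, `intern` and `RError` are hypothetical names. Interning refuses the empty string, while a plain list of element names can hold `""` without complaint, which loosely mirrors why `list(''=1)` fails at parse time even though list element names may be empty.

```python
class RError(Exception):
    """Stand-in for an R-level error (implemented in C as a long jump)."""


class SymbolTable:
    """Toy symbol table: each name is interned once and then reused."""

    def __init__(self):
        self._symbols = {}  # maps name -> unique symbol object

    def intern(self, name):
        # Symbols must have a non-empty name, so "" is rejected up front.
        if name == "":
            raise RError("attempt to use zero-length variable name")
        # Return the existing symbol, creating it on first use.
        return self._symbols.setdefault(name, object())


table = SymbolTable()
x1 = table.intern("x")
x2 = table.intern("x")   # same object as x1

# Element names of a list are ordinary strings, so "" is fine there.
element_names = ["", "a"]
```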
Thanks Tomas.
I guess this has to do with R expressions being lazily evaluated, and names
of arguments in a call are also part of the expression. Now the puzzling
part is why that is part of parsing at all: I would have expected
R_ParseVector() to be restricted to parsing... Now it feels like
R_ParseVector() is performing parsing and a first level of evaluation of
expressions that "should never work" (the empty name).
Think of it as an exception in, say, Python. Some failures during parsing
result in an exception (called an error in R and implemented using a long
jump). Any time you are calling into R you can get an error; running out of
memory is also signalled as an R error.
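A rough Python analogy of the two failure channels discussed here, assuming a made-up grammar: `toy_parse`, `ParseStatus` and `RError` are hypothetical and only loosely echo `R_ParseVector()` and its `ParseStatus` out-parameter. Syntax problems come back through the returned status, while a failure raised deeper inside the parser (interning an empty name) escapes as an exception, much as an R error escapes `R_ParseVector()` as a long jump.

```python
from enum import Enum


class ParseStatus(Enum):
    # Hypothetical subset, loosely named after R's parse status values.
    OK = "ok"
    INCOMPLETE = "incomplete"
    ERROR = "error"


class RError(Exception):
    """Stand-in for an R error; in C this would be a long jump."""


def toy_parse(text):
    """Toy parser (NOT R's grammar) showing the two ways parsing can fail."""
    # A trailing operator means more input is needed: reported via status.
    if text.rstrip().endswith("+"):
        return ParseStatus.INCOMPLETE
    # A complete ''=value argument forces the empty name to be interned,
    # which fails outside the status mechanism.
    if "''=" in text:
        raise RError("attempt to use zero-length variable name")
    # Otherwise just check that parentheses balance.
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return ParseStatus.ERROR
    return ParseStatus.OK if depth == 0 else ParseStatus.INCOMPLETE
```

With this toy, `toy_parse("list(''=1+")` returns `ParseStatus.INCOMPLETE`, whereas `toy_parse("list(''=123")` raises `RError`, so a caller that only inspects the status never sees the second failure.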
The surprising bit for me was that I had expected the function to solely
perform parsing. I did not expect an exception (and a jmp smashing the stack)
when the function concerned is in the C API, is parsing a string, and
uses a parameter (pointer) to store whether parsing was a failure or a
success.
Since you are making a comparison with Python, the distinction I am making
between parsing and evaluation seems to apply there as well. For example:
```
>>> import parser
>>> parser.expr('1+')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    1+
     ^
SyntaxError: unexpected EOF while parsing
>>> p = parser.expr('list(""=1)')
>>> p
<parser.st at 0x7f360e5329f0>
>>> eval(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: eval() arg 1 must be a string, bytes or code object
>>> list(""=1)
  File "<stdin>", line 1
SyntaxError: keyword can't be an expression
```
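The `parser` module used above was deprecated in Python 3.9 and removed in 3.10. A sketch of a similar check with the standard `ast` module (the helper name `classify` is only for illustration): `ast.parse` rejects both `1+` and `list(""=1)` with `SyntaxError`, so the invalid keyword is reported through the same channel as an ordinary syntax error. The exact messages differ between Python versions, so none are shown here.

```python
import ast


def classify(source):
    """Return "ok" if `source` parses as an expression, else the SyntaxError message."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError as exc:
        return "SyntaxError: " + str(exc.msg)
    return "ok"


for src in ["1+", 'list(""=1)', "list(a=1)"]:
    print(f"{src!r:14} -> {classify(src)}")
```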
There is probably some error in how the external code is handling R
errors (Fatal error: unable to initialize the JIT, stack smashing, etc.),
and possibly also in how R is initialized before calling ParseVector.
You would likely get the same problem when running, say, "stop('myerror')".
Note that R errors are implemented as long jumps, so care has to be taken
when calling into R; Writing R Extensions has more details (and section 8
specifically about embedding R). This is unlike parse (syntax) errors,
which are signaled to the caller of ParseVector() via its parse status.
The issue is that the segfault (caused by stack smashing, and therefore by
what I also suspected to be an uncontrolled jump) is happening within the
execution of R_ParseVector(). I would think that an issue with the
initialization of R is less likely because the project is otherwise used a
fair bit and is well covered by automated continuous tests.
After looking more into R's gram.c, I suspect that an execution context is
required for R_ParseVector() to work properly (to know where to jump in
case of error) when the parsing code decides to fail outside what it
thinks is a syntax error. If that is the case, this would explain why
R_ParseVector() works well when called from, say, a C extension to an R
package, but fails the way I am seeing when called from an embedded R.
Yes, contexts are used internally to handle errors. For external use
please see Writing R Extensions, section 6.12.
I have wrapped my call to R_ParseVector() in an R_tryCatchError(), and
it seems to help me overcome the issue. Thanks for the pointer.
Best,
Laurent
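For readers following along from Python, a rough analogue of that fix. `R_tryCatchError()` (declared in `Rinternals.h`) takes a body callback with its data and a handler callback that receives the condition with its own data; the sketch below only mimics that shape with Python exceptions. `try_catch_error`, `fragile_parse` and `on_error` are hypothetical stand-ins, not R or rpy2 API; in C it is the context established by the real call that gives R's long jump a valid place to land.

```python
class RError(Exception):
    """Stand-in for an R error condition."""


def try_catch_error(body, bdata, handler, hdata):
    """Call body(bdata); if it raises RError, return handler(condition, hdata)."""
    try:
        return body(bdata)
    except RError as cond:
        return handler(cond, hdata)


def fragile_parse(text):
    # Hypothetical parser call that can fail outside its status mechanism.
    if "''=" in text:
        raise RError("attempt to use zero-length variable name")
    return "parsed: " + text


def on_error(cond, context):
    # Turn the condition into an ordinary value the caller can inspect.
    return f"{context}: {cond}"


print(try_catch_error(fragile_parse, "list(a=123)", on_error, "parse failed"))
# parsed: list(a=123)
print(try_catch_error(fragile_parse, "list(''=123)", on_error, "parse failed"))
# parse failed: attempt to use zero-length variable name
```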
Best,
Tomas
We are otherwise aware that the error is not occurring in the R
but can be traced to a call to R_ParseVector() in R's C API:(
).
Our specific setup is calling an embedded R from Python, using the cffi
library. An error on my end was the first possibility considered, but the
puzzling specificity of the error (as shown below, other parsing errors are
handled properly) and the difficulty tracing what is happening in
R_ParseVector() made me ask whether someone on this list had an idea
about the possible issue.
```
>>> import rpy2.rinterface as ri
>>> ri.initr()
>>> e = ri.parse("list(''=1+")
---------------------------------------------------------------------------
RParsingError                             Traceback (most recent call last)
>>> e = ri.parse("list(''=123")
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT
*** stack smashing detected ***: <unknown> terminated
```
On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
Dear Laurent,
could you please provide a complete reproducible example where parsing
results in a crash of R? Calling parse(text="list(''=123") from R works
fine for me (it gives Error: attempt to use zero-length variable name).
I don't think the problem you observed could be related to the memory
leak. The leak is on the heap, not stack.
Zero-length names of elements in a list are allowed. They are not the
same thing as zero-length variables in an environment. If you try to
convert "lst" from your example to an environment, you would get the
error (attempt to use zero-length variable name).
Best
Tomas
On 11/30/19 11:55 PM, Laurent Gautier wrote:
Hi again,
Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
zero-length named elements does not seem consistent either:
```
lst <- list()
lst[[""]] <- 1
names(lst)
Error: attempt to use zero-length variable name
```
Should the parser be made to accept as valid what is otherwise allowed
when using `[[<-`?
Best,
Laurent
On Sat, Nov 30, 2019 at 17:33, Laurent Gautier <lgautier at gmail.com> wrote:
I found the following code comment in `src/main/gram.c`:
```
/* Memory leak
yyparse(), as generated by bison, allocates extra space for the
stack using malloc(). Unfortunately this means that there is a
leak in case of an R error (long-jump). In principle, we could
yyoverflow() to relocate the parser stacks for bison and allocate
the R heap, but yyoverflow() is undocumented and somewhat
(we would have to replicate some macros from the generated parser
The same problem exists at least in the Rd and LaTeX parsers in
*/
```
Could this be related to the issue?
On Sat, Nov 30, 2019 at 14:04, Laurent Gautier <lgautier at gmail.com> wrote:
Hi,
The behavior of
```
SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
```
defined in `src/include/R_ext/Parse.h` appears to be inconsistent
depending on the string to be parsed.
Trying to parse a string such as `"list(''=1+"` sets the
`ParseStatus` to an incomplete-parse error, but trying to parse
`"list(''=123"` results in R writing the following to the console:
```
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT
*** stack smashing detected ***: <unknown> terminated
```
Is there a reason for the difference in behavior, and is there a