Inconsistent behavior for the C API's R_ParseVector()?

10 messages · Tomas Kalibera, Laurent Gautier +2 more

#
Dear Laurent,

could you please provide a complete reproducible example where parsing 
results in a crash of R? Calling parse(text="list(''=123") from R works 
fine for me (gives Error: attempt to use zero-length variable name).

I don't think the problem you observed could be related to the memory 
leak. The leak is on the heap, not stack.

Zero-length names of elements in a list are allowed. They are not the 
same thing as zero-length variables in an environment. If you try to 
convert "lst" from your example to an environment, you would get the 
error (attempt to use zero-length variable name).

Best
Tomas
On 11/30/19 11:55 PM, Laurent Gautier wrote:
5 days later
#
Thanks for the quick response Tomas.

The same error indeed happens when trying to use a zero-length
variable name in an environment. The surprising bit is then "why is this
happening during parsing" (that is, why are variables assigned to an
environment)?

We are otherwise aware that the error does not occur in the R console,
but it can be traced to a call to R_ParseVector() in R's C API
(https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).

Our specific setup is calling an embedded R from Python, using the cffi
library. An error on our end was the first possibility considered, but the
puzzling specificity of the error (as shown below, other parsing errors are
handled properly) and the difficulty of tracing what is happening in
R_ParseVector() made me ask whether someone on this list had a suggestion
about the possible issue:

```
R[write to console]: Fatal error: unable to initialize the JIT

*** stack smashing detected ***: <unknown> terminated
```


On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalibera at gmail.com>
wrote:

  
  
1 day later
#
On 12/7/19 10:32 PM, Laurent Gautier wrote:
The emitted R error (in the R console) is not a parse (syntax) error, 
but an error emitted during parsing when the parser tries to intern a 
name - look it up in a symbol table. Empty string is not allowed as a 
symbol name, and hence the error. In the call "list(''=1)" , the empty 
name is what could eventually become a name of a local variable inside 
list(), even though not yet during parsing.

There is probably some error in how the external code is handling R 
errors (Fatal error: unable to initialize the JIT, stack smashing, etc.) 
and possibly also in how R is initialized before calling ParseVector. 
You would probably get the same problem when running, say, 
"stop('myerror')". Please note that R errors are implemented as long-jumps, 
so care has to be taken when calling into R; Writing R Extensions has 
more details (and section 8 specifically about embedding R). This is 
unlike parse (syntax) errors, which are signaled via a return value to 
ParseVector().

Best,
Tomas

  
  
#
On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera at gmail.com>
wrote:
Thanks Tomas.

I guess this has to do with R expressions being lazily evaluated, and names of
arguments in a call are also part of the expression. Now the puzzling part
is why that is at all part of the parsing: I would have expected
R_ParseVector() to be restricted to parsing... Now it feels like
R_ParseVector() is performing parsing, and a first level of evaluation for
expressions that "should never work" (the empty name).

"There is probably some error in how the external code is handling R errors"
The issue is that the segfault (because of stack smashing, and therefore
because of what I also suspect to be an uncontrolled jump) is happening
within the execution of R_ParseVector(). I would think that an issue with
the initialization of R is less likely because the project is otherwise
used a fair bit and is well covered by automated continuous tests.

After looking more into R's gram.c I suspect that an execution context is
required for R_ParseVector() to work properly (to know where to jump
in case of error) when the parsing code decides to fail outside what it
thinks is a syntax error. If that is the case, this would make R_ParseVector()
function well when called from, say, a C extension to an R package, but fail
the way I am seeing it fail when called from an embedded R.

Best,

Laurent

  
  
#
On 12/9/19 2:54 PM, Laurent Gautier wrote:
Think of it as an exception in, say, Python. Some failures during parsing 
result in an exception (called an error in R and implemented using a long 
jump). Any time you are calling into R you can get an error; out of 
memory is also signalled as an R error.
Yes, contexts are used internally to handle errors. For external use 
please see Writing R Extensions, section 6.12.

Best
Tomas

  
  
2 days later
#
I am developing a package to improve the debugging of Rcpp (C++) and SEXP based C code in gdb
by providing convenience print, subset and other functions:

https://github.com/aryoda/R_CppDebugHelper

I also want to solve the Windows-only problem that you can break into the debugger from R
only via Rgui.exe (menu "Misc > break to debugger") by supporting breakpoints for R.exe.

I want breakpoints support in R.exe because debugging in Rgui.exe has an unwanted side effect:

https://stackoverflow.com/questions/59236579/gdb-prints-output-stdout-to-rgui-console-instead-of-gdb-console-on-windows-whe

My idea is to break into the debugger from R.exe by calling a little C(++) code that contains an INT 3 (opcode 0xCC) SIGTRAP code:

// break_to_debugger.cpp
// [[Rcpp::export]]
int break_to_debugger()
{
  int a = 3;
  asm("int $3");  // this code line shall break into the debugger
  // Idea taken from "Rgui > break into debugger":
  // https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/gnuwin32/rui.c#L431
  a++;
  return a;
}

# breakpoint.R
#' breaks the execution into the debugger
#'
#' @return
#' @export
breakpoint <- function() {
  break_to_debugger()
}

Surprisingly this works not only on Linux but also on Windows (v10, x64 architecture = 64 bit) in Rterm.exe,
but NOT for R.exe (64 bit):

- Rgui.exe:    Works
- Rscript.exe: Works
- R.exe:       Does not work: R.exe is exited with:
               [Inferior 1 (process 20704) exited with code 020000000003]

Can you please help me to understand why it works for Rgui.exe and Rscript.exe but not for R.exe?

Why is int 3 exiting R.exe?

And: How could I make it also work with R.exe?

Thanks a lot for sharing your ideas and experiences!

Jürgen

PS 1: My sessionInfo():
        R version 3.6.1 (2019-07-05)
        Platform: x86_64-w64-mingw32/x64 (64-bit)
        Running under: Windows 10 x64 (build 17134)

PS 2: My package "CppDebugHelper" was compiled with -g -O0 -std=c++11

PS 3: Here is my captured gdb output for the three test cases:

1. Rgui.exe ------------------------------------------------------------------------
Reading symbols from Rgui.exe...(no debugging symbols found)...done.
(gdb) run
Starting program: C:\R\bin\x64\Rgui.exe --silent --vanilla
[New Thread 14476.0x3710]
[New Thread 14476.0x284c]
[New Thread 14476.0x50ec]
[New Thread 14476.0x2d24]
warning: Invalid parameter passed to C runtime function.
[In RGui's R console:]
library(CppDebugHelper)
breakpoint()
[in gdb again:]
Program received signal SIGTRAP, Trace/breakpoint trap.
break_to_debugger () at break_to_debugger.cpp:33
33        a++;
(gdb) b debug_example_rcpp
Breakpoint 1 at 0x66ac6846: file debug_example_rcpp.cpp, line 13.
(gdb) continue
Continuing.
[In RGui's R console:]
debug_example_rcpp()
[in gdb again:]
Breakpoint 1, debug_example_rcpp () at debug_example_rcpp.cpp:13
13          CharacterVector cv   = CharacterVector::create("foo", "bar", NA_STRING, "hello")  ;
(gdb) next
14          NumericVector nv     = NumericVector::create(0.0, 1.0, NA_REAL, 10) ;
(gdb) n
16          DateVector dv        = DateVector::create( 14974, 14975, 15123, NA_REAL); // TODO how to use real dates instead?
(gdb) n
17          DateVector dv2       = DateVector::create(Date("2010-12-31"), Date("01.01.2011", "%d.%m.%Y"), Date(2011, 05, 29), NA_REAL);
(gdb) n
18          DatetimeVector dtv   = DatetimeVector::create(1293753600, Datetime("2011-01-01"), Datetime("2011-05-29 10:15:30"), NA_REAL);
(gdb) n
19          DataFrame df         = DataFrame::create(Named("name1") = cv, _["value1"] = nv, _["dv2"] = dv2);  // Named and _[] are the same
(gdb) n
20          CharacterVector col1 = df["name1"];          // get the first column
(gdb) call dbg_print(df)
(gdb) call dbg_str(df)
(gdb) continue
Continuing.

[Output for the dbg_* function calls is printed to Rgui's R console (NOT the gdb terminal!):]

  name1 value1        dv2
1   foo      0 2010-12-31
2   bar      1 2011-01-01
3  <NA>     NA 2011-05-29
4 hello     10       <NA>

'data.frame':   4 obs. of  3 variables:
$ name1 : Factor w/ 3 levels "bar","foo","hello": 2 1 NA 3
$ value1: num  0 1 NA 10
$ dv2   : Date, format: "2010-12-31" "2011-01-01" ...



2. R.exe ------------------------------------------------------------------------
Reading symbols from R.exe...(no debugging symbols found)...done.
(gdb) r
Starting program: C:\R\bin\x64\R.exe --silent --vanilla
[New Thread 20704.0x2b20]
[New Thread 20704.0x4c08]
[New Thread 20704.0x425c]
[New Thread 20704.0x45f8]
[Thread 20704.0x45f8 exited with code 2147483651]
[Thread 20704.0x425c exited with code 2147483651]
[Thread 20704.0x4c08 exited with code 2147483651]
[Inferior 1 (process 20704) exited with code 020000000003]
(gdb) bt
No stack.
(gdb)



3. Rterm.exe ------------------------------------------------------------------------

gdb --quiet --args Rterm.exe --silent --vanilla
Reading symbols from Rterm.exe...(no debugging symbols found)...done.
(gdb) run
Starting program: C:\R\bin\x64\Rterm.exe --silent --vanilla
[New Thread 8132.0x3ee8]
[New Thread 8132.0x3828]
[New Thread 8132.0x4f1c]
[New Thread 8132.0x4ff4]
warning: Invalid parameter passed to C runtime function.
[New Thread 8132.0x4dc8]
Program received signal SIGTRAP, Trace/breakpoint trap.
break_to_debugger () at break_to_debugger.cpp:33
33        a++;
(gdb) b debug_example_rcpp
Breakpoint 1 at 0x66ac6846: file debug_example_rcpp.cpp, line 13.
(gdb) c
Continuing.
[1] 4
Breakpoint 1, debug_example_rcpp () at debug_example_rcpp.cpp:13
13          CharacterVector cv   = CharacterVector::create("foo", "bar", NA_STRING, "hello")  ;
(gdb) n
14          NumericVector nv     = NumericVector::create(0.0, 1.0, NA_REAL, 10) ;
(gdb) n
16          DateVector dv        = DateVector::create( 14974, 14975, 15123, NA_REAL); // TODO how to use real dates instead?
(gdb) call dbg_print(nv)
[1]  0  1 NA 10
(gdb) call dbg_print(dbg_subset(nv, 1, 2))
[1]  1 NA
(gdb)
2 days later
#
On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <tomas.kalibera at gmail.com>
wrote:
The surprising bit for me was that I had expected the function to solely
perform parsing. I did not expect an exception (and a jmp smashing the stack)
when the function concerned is in the C API, is parsing a string, and is
using a parameter (pointer) to store whether parsing was a failure or a
success.

Since you are making a comparison with Python, the distinction I am making
between parsing and evaluation seems to apply there. For example:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    1+
     ^
SyntaxError: unexpected EOF while parsing
<parser.st at 0x7f360e5329f0>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: eval() arg 1 must be a string, bytes or code object
File "<stdin>", line 1
SyntaxError: keyword can't be an expression
```
I have wrapped my call to R_ParseVector() in R_tryCatchError(), and this
seems to help me overcome the issue. Thanks for the pointer.

Best,


Laurent

  
  
#
Laurent,

the main point here is that ParseVector(), just like any other R API, has to be called in a correct context since it can raise errors. So the issue was that your C code has a bug of not setting up R correctly (my guess would be that you're not creating the initial context necessary in embedded R). There are many different errors; yours is just one of many that can occur. Any R API call that does allocation (and parsing obviously does) can cause errors. Note that this is true for pretty much all R API functions.

Cheers,
Simon
#
Hi Simon,

Widespread errors would have caught my attention earlier, as that code
uses only one initialization of the embedded R, is used quite a bit, and
is covered by quite a few unit tests. This is the only situation I am aware
of in which an error occurs.

What is a "correct context", or initial context, that the code should run from?
Searching for "context" in the R-exts manual does not return much.

Best,

Laurent


On Sat, Dec 14, 2019 at 12:20, Simon Urbanek <simon.urbanek at r-project.org>
wrote:

  
  
#
Laurent,
It may or may not be "widespread" - almost all R API functions can raise errors (e.g., unable to allocate). You'll only find out once they do and that's too late ;).
It depends on which embedding API you use - see R-exts 8.1: the two options are run_Rmainloop() and R_ReplDLLinit(), which both set up the top-level context with SETJMP. If you don't use either, then you have to use one of the advanced R APIs that do it, such as R_ToplevelExec() or R_UnwindProtect(); otherwise the point to abort to on error doesn't exist. Embedding R is much more complex than many think ...

Cheers,
Simon