I am seeing a curious error in an ASAN package check which is not reproducible in the r-debug containers (https://github.com/wch/r-debug), and which I'm suspecting might be a compiler bug. Wanted to ask for a second opinion on whether this could actually be a compiler bug or a real bug in the package, and if the former, how should I reply to the email from CRAN about fixing the issues from the checks in the package.

In more detail, in the ASAN logs here:
https://cran.r-project.org/web/checks/check_results_isotree.html

It mentions detecting a global-buffer-overflow during a read, which happens in the contents of a string literal that's used as constructor for 'std::regex':
https://github.com/david-cortes/isotree/blob/1f84128a03bb6fc5eecd1de7aebf4b745b54fa1e/src/formatted_exporters.cpp#L332C13-L332C31

    std::regex_replace(s, std::regex("\""), "\\\"")

I'm not understanding how it could possibly cause an overflow from either constructing an 'std::regex' with a string literal, or from passing the result of it to 'std::regex_replace' with C++ strings - it looks like it should be an impossible situation.

The values for argument 's' (the 'std::string' where to make replacements in the 'std::regex_replace' call which receives the 'std::regex' object) which are seen during the example that gets flagged by ASAN do not have anything special - their contents are one of the following: "column_1", "column_2", "column_3" - and they are all obtained from a call to 'Rcpp::as<std::vector<std::string>>' on an R character vector, which should rule out issues with e.g. missing null termination, wrong size, and similar:
https://github.com/david-cortes/isotree/blob/1f84128a03bb6fc5eecd1de7aebf4b745b54fa1e/src/Rwrapper.cpp#L1975

Help here would be appreciated.

Best,
David Cortes
[R-pkg-devel] Non-reproducible ASAN flagged issue
4 messages · David Cortes, Tomas Kalibera, Ivan Krylov
On 12/17/24 20:26, David Cortes wrote:
I am seeing a curious error in an ASAN package check which is not reproducible in the r-debug containers (https://github.com/wch/r-debug), and which I'm suspecting might be a compiler bug. Wanted to ask for a second opinion on whether this could actually be a compiler bug or a real bug in the package, and if the former, how should I reply to the email from CRAN about fixing the issues from the checks in the package.
In principle, if you run into something you suspect to be a bug in the compiler (or perhaps rather the C++ library), the best course of action is to narrow it down to a minimal reproducible example (no R, no R package, just tiny standalone C/C++ code) and then report that to the compiler/C++ library developers. With such a minimal example, I would expect a quick response from the developers on whether it is actually a bug or the code is wrong. Also, while narrowing it down, one might figure out the problem (if it is in the package).

And if the corresponding developers confirm it as a compiler/C++ library/etc. bug, then I would let the CRAN team know.

Best
Tomas
On Tue, 17 Dec 2024 20:26:01 +0100, David Cortes <david.cortes.rivera at gmail.com> wrote:
I am seeing a curious error in an ASAN package check which is not reproducible in the r-debug containers (https://github.com/wch/r-debug), and which I'm suspecting might be a compiler bug.
r-debug differs from the gcc-ASAN special check in at least the
compiler version. The log at [1] says it's running with GCC 14.2.0,
while docker.io/wch1/r-debug uses GCC 12.3.0. Additionally, LTO was
recently enabled for R but not the packages [2].
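
Since the environments differ in at least the compiler version, it can help to have each one report what it was actually built with before comparing results. The following is a minimal sketch, assuming GCC with libstdc++. The function name toolchain_summary is hypothetical; the macros are ones that GCC and libstdc++ predefine. It cannot tell whether LTO was in effect, which still has to be read from the build flags.

```cpp
#include <string>

// Summarise the compiler, standard library and sanitizer state this
// translation unit was built with. Only GCC/libstdc++ macros are checked
// (an assumption of this sketch); other toolchains report "unknown compiler".
std::string toolchain_summary()
{
    std::string out;
#if defined(__GNUC__) && !defined(__clang__)
    out += "GCC " + std::to_string(__GNUC__) + "." +
           std::to_string(__GNUC_MINOR__) + "." +
           std::to_string(__GNUC_PATCHLEVEL__);
#else
    out += "unknown compiler";
#endif
#if defined(_GLIBCXX_RELEASE)
    // Major release of libstdc++, available once any standard header is included.
    out += ", libstdc++ " + std::to_string(_GLIBCXX_RELEASE);
#endif
#if defined(__SANITIZE_ADDRESS__)
    // GCC defines this macro when compiling with -fsanitize=address.
    out += ", ASan enabled";
#endif
    return out;
}
```

Printing the returned string from a tiny program built with the same flags in both the r-debug container and a GCC 14 environment gives a quick side-by-side of the two setups.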
The log says that the std::regex("\"") constructor somehow manages to
read a byte past the end (after the 0-terminator) of its C-style string
argument. While I wasn't able to reproduce it even after starting
with docker.io/rocker/drd and rebuilding R according to [2], with GCC
14 and LTO for R but not packages, the following much simpler example
does exhibit the same behaviour:
#include <iostream>
#include <regex>
int main() {
    std::string s{" gjdshlkhj \" lsjkhkljh "};
    const char * rx = "\"";
    std::cout
        << std::regex_replace(s, std::regex(rx), "\\\"") // <-- line 7
        << std::endl;
    // the code below is required for the problem to happen above!
    for (int i = 0; i < 100; ++i) volatile std::regex rxx(rx);
}
g++-14 -flto=10 -o foo -g -O2 -mtune=native \
-fsanitize=address,undefined,bounds-strict foo.cpp && ./foo
==648==ERROR: AddressSanitizer: global-buffer-overflow on address 0x556ed780fa02 at pc 0x556ed7731520 bp 0x7fff41781420 sp 0x7fff41781410
READ of size 1 at 0x556ed780fa02 thread T0
#0 0x556ed773151f in std::__detail::_Scanner<char>::_M_scan_normal() /usr/include/c++/14/bits/regex_scanner.tcc:98
#1 0x556ed773151f in std::__detail::_Scanner<char>::_M_advance() /usr/include/c++/14/bits/regex_scanner.tcc:79
#2 0x556ed7734416 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_match_token(std::__detail::_ScannerBase::_TokenT) /usr/include/c++/14/bits/regex_compiler.tcc:575
#3 0x556ed7748374 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom() /usr/include/c++/14/bits/regex_compiler.tcc:310
#4 0x556ed7748374 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term() /usr/include/c++/14/bits/regex_compiler.tcc:133
#5 0x556ed7748374 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() /usr/include/c++/14/bits/regex_compiler.tcc:115
#6 0x556ed7747428 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() /usr/include/c++/14/bits/regex_compiler.tcc:118
#7 0x556ed7753285 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() /usr/include/c++/14/bits/regex_compiler.tcc:91
#8 0x556ed77df36e in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) /usr/include/c++/14/bits/regex_compiler.tcc:76
#9 0x556ed77df36e in std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::_M_compile(char const*, char const*, std::regex_constants::syntax_option_type) [clone .constprop.0] /usr/include/c++/14/bits/regex.h:809
#10 0x556ed771b8cf in std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex(char const*, std::regex_constants::syntax_option_type) /usr/include/c++/14/bits/regex.h:473
#11 0x556ed771b8cf in main foo.cpp:7 // <-- see line 7 above
0x556ed780fa02 is located 0 bytes after global variable '*.LC45' defined in './foo.ltrans3.ltrans' (0x556ed780fa00) of size 2
'*.LC45' is ascii string '"'
Disabling LTO or removing that loop that constructs additional regexps
makes the error vanish.
At the time of the error, members _M_current and _M_end (which should
point at the current part of the string and past the end of the same
string, respectively) point at completely different strings with the
same content:
(gdb) p _M_current-3
$12 = 0x55ce00bb4a00 "\""
(gdb) p _M_end-1
$13 = 0x55ce00b956a0 "\""
(gdb) p _M_end - _M_current
$14 = -127842
(gdb) p _M_current-4
$15 = 0x55ce00bb49ff ""
(gdb) p _M_end-2
$16 = 0x55ce00b9569f ""
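
To see why a begin/end pair taken from two different copies of a string leads to reads outside the string, here is a simplified model. It is not libstdc++'s scanner, and the names pool and stray_reads are hypothetical. Both copies live in one array so that the demonstration itself stays well-defined.

```cpp
#include <cstddef>

// Two copies of the one-character pattern, each followed by its terminator,
// laid out in a single array so that every access below is well-defined.
constexpr char pool[] = {'"', '\0', '"', '\0'};

// Simplified model of a scanner loop: read characters starting at index
// `cur` until it reaches `end`. `text_end` is the true end of the text in
// the copy that `cur` started in. Returns how many reads landed at or
// beyond `text_end`. The `cur < sizeof(pool)` guard exists only to keep
// this model inside the array; a real scanner has no such guard.
std::size_t stray_reads(std::size_t cur, std::size_t end, std::size_t text_end)
{
    std::size_t stray = 0;
    for (; cur != end && cur < sizeof(pool); ++cur) {
        const char c = pool[cur];  // the kind of read ASAN instruments
        (void)c;
        if (cur >= text_end) ++stray;
    }
    return stray;
}
```

With a matching pair, stray_reads(0, 1, 1) returns 0. With the end taken from the second copy, stray_reads(0, 3, 1) returns 2, because the loop walks over the first copy's terminator and beyond before it meets the mismatched end.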
Did the gcc-ASAN check enable LTO for packages too, not only R itself?
For a quick workaround, I can only recommend writing your own
replace_all() function using the std::string::find and
std::string::replace methods in a loop. Thankfully, you replace plain
strings, not complicated regular expressions.
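
The workaround described above could look roughly like the following. This is a minimal sketch, not the code that went into the package; the signature of replace_all is an assumption chosen for illustration.

```cpp
#include <string>

// Replace every non-overlapping occurrence of `from` in `s` with `to`.
// Plain substring matching only; no regular-expression syntax is interpreted.
std::string replace_all(std::string s, const std::string &from, const std::string &to)
{
    if (from.empty()) return s;  // an empty pattern would otherwise never advance
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        // Skip past the inserted text so it is never rescanned. This matters
        // here because the replacement (\") itself contains the pattern (").
        pos += to.size();
    }
    return s;
}
```

The call std::regex_replace(s, std::regex("\""), "\\\"") would then become replace_all(s, "\"", "\\\""), which avoids constructing a std::regex at all.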
Thank you, that was very helpful indeed! I've filed a bug report with GCC just in case:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118113

Best regards,
David Cortes
On Wed, 2024-12-18 at 17:34 +0300, Ivan Krylov wrote:
[...]