Use of C++ in Packages
On 4/24/19 6:41 PM, Hugh Marera wrote:
Some of us are learning about development in R and use R in our work data analysis pipelines. What is the best way to identify packages that currently have these C++ problems? I would like to be able to help fix the bugs but more importantly not use these packages in critical work pipelines. Any C++ R package bug squashing events out there?
I think the best way available now is manual inspection/review of the source code of the packages you are using for your critical work. Such a review should cover more than just dangerous use of C++ - a lot of problems also exist in plain C code (using unexported API from R, violating value semantics of R, other kinds of PROTECT errors, memory leaks due to long jumps, etc). The review could be limited to the context of your pipeline: how the package is used there and whether you have a reliable external process for validating the results.

Out of the problems I've mentioned in my blog, the worst for normal use of packages is probably a PROTECT error on the fast path due to allocation in a destructor or other function run automatically. Various memory leaks or correctness problems on error paths (long jumps) may not be a complete showstopper if you restart R often and if you have a reliable way of validating results, but such issues would still make it much harder to diagnose problems.

The simple steps may include looking at CRAN check results for any errors, warnings, notes, or reports from analyzers (valgrind, asan, ubsan, rchk). The analyzers _may_ be able to spot a PROTECT error due to allocation in a destructor if one is lucky (in the case I mentioned in the blog, there was an ASAN report), but I think manual inspection is needed, and it can also reveal other problems.

Tomas
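As an illustration of the "memory leaks due to long jumps" mentioned in the reply above, here is a minimal sketch in plain C++. It is a toy model, not R code: `toy_error()`, `toy_acquire()`, `toy_release()`, `leaky()` and `run()` are hypothetical names invented for this example, and the `outstanding` counter only stands in for real resources such as malloc'd buffers or open files. The assumption is simply that an error is signalled by a long jump back to a top-level context, which skips any cleanup code placed after the failing call.

```cpp
// Toy model only: none of these names belong to R's API.
#include <cassert>
#include <csetjmp>

static std::jmp_buf toplevel;   // stands in for a top-level error context
static int outstanding = 0;     // resources acquired but not yet released

// Stand-ins for malloc/free, fopen/fclose, and similar pairs.
static void toy_acquire() { ++outstanding; }
static void toy_release() {
    assert(outstanding > 0);
    --outstanding;
}

// Stands in for an API call that signals an error via a long jump.
static void toy_error() { std::longjmp(toplevel, 1); }

// The resource is released only on the normal path; when toy_error()
// jumps away, the release below is never reached.
static void leaky(bool fail) {
    toy_acquire();
    if (fail) toy_error();
    toy_release();
}

// Runs leaky() under the toy top-level context and returns the number
// of resources that are still outstanding afterwards.
int run(bool fail) {
    outstanding = 0;
    if (setjmp(toplevel) == 0) {
        leaky(fail);
    }
    return outstanding;
}
```

Calling `run(false)` leaves nothing outstanding, while `run(true)` leaves one resource unreleased. The sketch deliberately keeps no objects with destructors in the frames being skipped, because a long jump over such objects has undefined behavior in C++, which is part of why error paths in native code deserve manual review.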
Regards
Hugh
On Mon, Apr 1, 2019 at 6:23 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
On 3/30/19 8:59 AM, Romain Francois wrote:
> tl;dr: we need better C++ tools and documentation.
>
> We collectively know more now with the rise of tools like rchk and
> improved documentation such as Tomas's post. That's a start, but it
> appears that there still is a lot of knowledge that would deserve to
> be promoted to actual documentation of best practices.
Well, there is quite a bit of knowledge in Writing R Extensions and many
problems could have been prevented had it been read more thoroughly by
package developers. The problem that C++ runs some functions
automatically (like destructors) should not be too hard to identify
based on what WRE says about the need for protection against garbage
collection.
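The point about destructors running automatically can be made concrete with a small, self-contained sketch. This is a toy model and an assumption-laden one: `ToyHeap`, its `alloc()`, `protect()` and `unprotect()` members, `AllocatesOnExit`, `buggy()` and `fixed()` are all hypothetical names written for this example and are not R's API. The toy heap collects every unprotected object on each allocation, loosely in the spirit of running under gctorture, so the ordering problem shows up deterministically.

```cpp
// Toy model only: a stand-in for a garbage-collected heap with a protect
// stack. None of these names belong to R's API.
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyHeap {
    std::vector<bool> live;          // live[i]: object i not yet collected
    std::vector<std::size_t> stack;  // protect stack of object ids

    // Every allocation first collects all objects not on the protect stack.
    std::size_t alloc() {
        for (std::size_t id = 0; id < live.size(); ++id) {
            bool is_protected = false;
            for (std::size_t p : stack) {
                if (p == id) is_protected = true;
            }
            if (!is_protected) live[id] = false;
        }
        live.push_back(true);
        return live.size() - 1;
    }
    std::size_t protect(std::size_t id) { stack.push_back(id); return id; }
    void unprotect(std::size_t n) {
        assert(n <= stack.size());
        stack.resize(stack.size() - n);
    }
};

// A helper whose destructor allocates (for example, to build a message).
struct AllocatesOnExit {
    ToyHeap& heap;
    explicit AllocatesOnExit(ToyHeap& h) : heap(h) {}
    ~AllocatesOnExit() { heap.alloc(); }
};

// Buggy: the destructor of `guard` runs after unprotect(1), so its
// allocation collects `ans` before the caller can use it.
std::size_t buggy(ToyHeap& heap) {
    std::size_t ans = heap.protect(heap.alloc());
    AllocatesOnExit guard(heap);
    heap.unprotect(1);
    return ans;
}

// One possible fix: end the helper's scope before unprotecting.
std::size_t fixed(ToyHeap& heap) {
    std::size_t ans = heap.protect(heap.alloc());
    {
        AllocatesOnExit guard(heap);
    }  // destructor runs here, while `ans` is still protected
    heap.unprotect(1);
    return ans;
}
```

In this model the object returned by `buggy()` has already been collected when the caller receives it, whereas the one returned by `fixed()` is still live. Nothing in the source of `buggy()` shows an allocation after the unprotect call, which is why this class of error tends to need manual inspection rather than a quick read of the function body.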
From my experience, one can learn most about R internals from debugging
and reading source code - when debugging PROTECT errors and other memory
errors/memory corruption, common problems caused by bugs in native C/C++
code - one needs to read and understand source code involved at all
layers, one needs to understand the documentation covering code at
different layers, and one has to think about these things, forming
hypotheses, narrowing down to smaller examples, etc.
My suggestion for package authors who write native code, who want to
learn more, and who want to be responsible (these kinds of bugs affect
other packages indirectly and can be woken up by inconsequential and
correct code changes, even in the R runtime): test and debug your code
hard - look at UBSAN/ASAN/valgrind/rchk checks from CRAN and run these
tools yourself if needed. Run with strict barrier checking and with
gctorture. Write more tests to increase the coverage. Specifically now,
if you use C++ code, try to read all of your related code and check you
do not have the problems I mentioned in my blog. Think of other related
problems and if you find out about them, tell others. Make sure you only
use the API from Writing R Extensions (and the R help system). If you
really can't find anything wrong with your package, but still want to
learn more, try to debug some bugs reported against the R runtime or
against your favorite packages (or their CRAN check reports from various
tools). In addition to learning more about R internals, by spending much
more time on debugging you may also get a different perspective on some
of the things about C++ I pointed to. Finally, it would help us with the
problem we have now - that many R packages in C++ have serious bugs.

Tomas
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel