A change to R-devel (SVN r86629 or https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250 has changed the handling of pointers to zero-length objects, leading to ASAN issues with a number of Rcpp-based packages (the commit message reads, in part, "Also define STRICT_TYPECHECK when compiling inlined.c.") I'm interested in discussion from the community. Details/diagnosis for the issues in the lme4 package are here: https://github.com/lme4/lme4/issues/794, with a bit more discussion about how zero-length objects should be handled. The short(ish) version is that r86629 enables the CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro <https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>, which returns a trivial pointer (rather than the data pointer that would be returned in the normal control flow) if an object has length 0: /* Attempts to read or write elements of a zero length vector will result in a segfault, rather than read and write random memory. Returning NULL would be more natural, but Matrix seems to assume that even zero-length vectors have non-NULL data pointers, so return (void *) 1 instead. Zero-length CHARSXP objects still have a trailing zero byte so they are not handled. */ In the Rcpp context this leads to an inconsistency, where `REAL(x)` is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn leads to ASAN warnings like runtime error: reference binding to misaligned address 0x000000000001 for type 'const double', which requires 8 byte alignment 0x000000000001: note: pointer points here I'm in over my head and hoping for insight into whether this problem should be resolved by changing R, Rcpp, or downstream Rcpp packages ... cheers Ben Bolker
changes in R-devel and zero-extent objects in Rcpp
6 messages · Kevin Ushey, Ben Bolker, iuke-tier@ey m@iii@g oii uiow@@edu +1 more
IMHO, this should be changed in both Rcpp and downstream packages: 1. Rcpp could check for out-of-bounds accesses in cases like these, and emit an R warning / error when such an access is detected; 2. The downstream packages unintentionally making these out-of-bounds accesses should be fixed to avoid doing that. That is, I think this is ultimately a bug in the affected packages, but Rcpp could do better in detecting and handling this for client packages (avoiding a segfault). Best, Kevin
On Sat, Jun 8, 2024, 3:06?PM Ben Bolker <bbolker at gmail.com> wrote:
A change to R-devel (SVN r86629 or
https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250
has changed the handling of pointers to zero-length objects, leading to
ASAN issues with a number of Rcpp-based packages (the commit message
reads, in part, "Also define STRICT_TYPECHECK when compiling inlined.c.")
I'm interested in discussion from the community.
Details/diagnosis for the issues in the lme4 package are here:
https://github.com/lme4/lme4/issues/794, with a bit more discussion
about how zero-length objects should be handled.
The short(ish) version is that r86629 enables the
CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
<
https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>,
which returns a trivial pointer (rather than the data pointer that would
be returned in the normal control flow) if an object has length 0:
/* Attempts to read or write elements of a zero length vector will
result in a segfault, rather than read and write random memory.
Returning NULL would be more natural, but Matrix seems to assume
that even zero-length vectors have non-NULL data pointers, so
return (void *) 1 instead. Zero-length CHARSXP objects still have a
trailing zero byte so they are not handled. */
In the Rcpp context this leads to an inconsistency, where `REAL(x)`
is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
leads to ASAN warnings like
runtime error: reference binding to misaligned address 0x000000000001
for type 'const double', which requires 8 byte alignment
0x000000000001: note: pointer points here
I'm in over my head and hoping for insight into whether this problem
should be resolved by changing R, Rcpp, or downstream Rcpp packages ...
cheers
Ben Bolker
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
The ASAN errors occur *even if the zero-length object is not actually
accessed*/is used in a perfectly correct manner, i.e. it's perfectly
legal in base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0,
ncol = 0)`, whereas doing the equivalent in Rcpp will (now) lead to an
ASAN error.
i.e., these are *not* previously cryptic out-of-bounds accesses that
are now being revealed, but instead sensible and previously legal
definitions of zero-length objects that are now causing problems.
I'm pretty sure I'm right about this, but it's absolutely possible
that I'm just confused at this point; I don't have a super-simple
example to show you at the moment. The closest is this example by Mikael
Jagan: https://github.com/lme4/lme4/issues/794#issuecomment-2155093049
which shows that if x is a pointer to a zero-length vector (in plain
C++ for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to
different values.
Mikael further points out that "Rcpp seems to cast a (void *)
returned by DATAPTR to (double *) when constructing a Vector<REALSXP>
from a SEXP, rather than using the (double *) returned by REAL." So
perhaps R-core doesn't want to guarantee that these operations give
identical answers, in which case Rcpp will have to change the way it
does things ...
cheers
Ben
On 2024-06-08 6:39 p.m., Kevin Ushey wrote:
IMHO, this should be changed in both Rcpp and downstream packages:
1. Rcpp could check for out-of-bounds accesses in cases like these, and
emit an R warning / error when such an access is detected;
2. The downstream packages unintentionally making these out-of-bounds
accesses should be fixed to avoid doing that.
That is, I think this is ultimately a bug in the affected packages, but
Rcpp could do better in detecting and handling this for client packages
(avoiding a segfault).
Best,
Kevin
On Sat, Jun 8, 2024, 3:06?PM Ben Bolker <bbolker at gmail.com
<mailto:bbolker at gmail.com>> wrote:
? ? A change to R-devel (SVN r86629 or
https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250 <https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250>
has changed the handling of pointers to zero-length objects, leading to
ASAN issues with a number of Rcpp-based packages (the commit message
reads, in part, "Also define STRICT_TYPECHECK when compiling
inlined.c.")
? ?I'm interested in discussion from the community.
? ?Details/diagnosis for the issues in the lme4 package are here:
https://github.com/lme4/lme4/issues/794
<https://github.com/lme4/lme4/issues/794>, with a bit more discussion
about how zero-length objects should be handled.
? ?The short(ish) version is that r86629 enables the
CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104 <https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>>,
which returns a trivial pointer (rather than the data pointer that
would
be returned in the normal control flow) if an object has length 0:
/* Attempts to read or write elements of a zero length vector will
? ? result in a segfault, rather than read and write random memory.
? ? Returning NULL would be more natural, but Matrix seems to assume
? ? that even zero-length vectors have non-NULL data pointers, so
? ? return (void *) 1 instead. Zero-length CHARSXP objects still have a
? ? trailing zero byte so they are not handled. */
? ?In the Rcpp context this leads to an inconsistency, where `REAL(x)`
is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
leads to ASAN warnings like
runtime error: reference binding to misaligned address 0x000000000001
for type 'const double', which requires 8 byte alignment
0x000000000001: note: pointer points here
? ? I'm in over my head and hoping for insight into whether this
problem
should be resolved by changing R, Rcpp, or downstream Rcpp packages ...
? ?cheers
? ? Ben Bolker
______________________________________________
R-devel at r-project.org <mailto:R-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
<https://stat.ethz.ch/mailman/listinfo/r-devel>
Dr. Benjamin Bolker Professor, Mathematics & Statistics and Biology, McMaster University Director, School of Computational Science and Engineering (Acting) Graduate chair, Mathematics & Statistics > E-mail is sent at my convenience; I don't expect replies outside of working hours.
On Sat, 8 Jun 2024, Ben Bolker wrote:
The ASAN errors occur *even if the zero-length object is not actually accessed*/is used in a perfectly correct manner, i.e. it's perfectly legal in base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0, ncol = 0)`, whereas doing the equivalent in Rcpp will (now) lead to an ASAN error. i.e., these are *not* previously cryptic out-of-bounds accesses that are now being revealed, but instead sensible and previously legal definitions of zero-length objects that are now causing problems. I'm pretty sure I'm right about this, but it's absolutely possible that I'm just confused at this point; I don't have a super-simple example to show you at the moment. The closest is this example by Mikael Jagan: https://github.com/lme4/lme4/issues/794#issuecomment-2155093049 which shows that if x is a pointer to a zero-length vector (in plain C++ for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to different values. Mikael further points out that "Rcpp seems to cast a (void *) returned by DATAPTR to (double *) when constructing a Vector<REALSXP> from a SEXP, rather than using the (double *) returned by REAL." So perhaps R-core doesn't want to guarantee that these operations give identical answers, in which case Rcpp will have to change the way it does things ...
It looks like REAL and friends should also get this check, but it's not high priority at this point, at least to me. DATAPTR has been using this check for a while in a barrier build, so you might want to test there as well. I expect we will activate more integrity checks from the barrier build on the API client side as things are tidied up. However: DATAPTR is not in the API and can't be at least in this form: It allows access to a writable pointer to STRSXP and VECSXP data and that is too dangerous for memory manager integrity. I'm not sure exactly how this will be resolve, but be prepared for changes. Best, luke
cheers Ben On 2024-06-08 6:39 p.m., Kevin Ushey wrote:
IMHO, this should be changed in both Rcpp and downstream packages:
1. Rcpp could check for out-of-bounds accesses in cases like these, and
emit an R warning / error when such an access is detected;
2. The downstream packages unintentionally making these out-of-bounds
accesses should be fixed to avoid doing that.
That is, I think this is ultimately a bug in the affected packages, but
Rcpp could do better in detecting and handling this for client packages
(avoiding a segfault).
Best,
Kevin
On Sat, Jun 8, 2024, 3:06?PM Ben Bolker <bbolker at gmail.com
<mailto:bbolker at gmail.com>> wrote:
? ? A change to R-devel (SVN r86629 or
https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250
<https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250>
has changed the handling of pointers to zero-length objects, leading to
ASAN issues with a number of Rcpp-based packages (the commit message
reads, in part, "Also define STRICT_TYPECHECK when compiling
inlined.c.")
? ?I'm interested in discussion from the community.
? ?Details/diagnosis for the issues in the lme4 package are here:
https://github.com/lme4/lme4/issues/794
<https://github.com/lme4/lme4/issues/794>,
with a bit more discussion
about how zero-length objects should be handled.
? ?The short(ish) version is that r86629 enables the
CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>>,
which returns a trivial pointer (rather than the data pointer that
would
be returned in the normal control flow) if an object has length 0:
/* Attempts to read or write elements of a zero length vector will
? ? result in a segfault, rather than read and write random memory.
? ? Returning NULL would be more natural, but Matrix seems to assume
? ? that even zero-length vectors have non-NULL data pointers, so
? ? return (void *) 1 instead. Zero-length CHARSXP objects still have
a
? ? trailing zero byte so they are not handled. */
? ?In the Rcpp context this leads to an inconsistency, where `REAL(x)`
is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
leads to ASAN warnings like
runtime error: reference binding to misaligned address 0x000000000001
for type 'const double', which requires 8 byte alignment
0x000000000001: note: pointer points here
? ? I'm in over my head and hoping for insight into whether this
problem
should be resolved by changing R, Rcpp, or downstream Rcpp packages ...
? ?cheers
? ? Ben Bolker
______________________________________________
R-devel at r-project.org <mailto:R-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
<https://stat.ethz.ch/mailman/listinfo/r-devel>
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/
Sorry to ask about a bit drifted topic, but will there be an alternative API to DATAPTR?
DATAPTR is not in the API and can't be at least in this form
I believe it's vital for ALTREP to return the pointer to the expanded version of a SEXP just like the implementation in base R does [1]. At least, VECSXP has no other measure to expose the pointer if I understand correctly. Best, Yutani [1]: https://github.com/r-devel/r-svn/blob/a3508b75d28164b0e5bcb2c87f816ce5169729a4/src/main/altclasses.c#L186 2024?6?9?(?) 10:43 luke-tierney--- via R-devel <r-devel at r-project.org>:
On Sat, 8 Jun 2024, Ben Bolker wrote:
The ASAN errors occur *even if the zero-length object is not actually accessed*/is used in a perfectly correct manner, i.e. it's perfectly
legal in
base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0, ncol = 0)`, whereas doing the equivalent in Rcpp will (now) lead to an ASAN error. i.e., these are *not* previously cryptic out-of-bounds accesses that
are
now being revealed, but instead sensible and previously legal
definitions of
zero-length objects that are now causing problems. I'm pretty sure I'm right about this, but it's absolutely possible
that
I'm just confused at this point; I don't have a super-simple example to
show
you at the moment. The closest is this example by Mikael Jagan: https://github.com/lme4/lme4/issues/794#issuecomment-2155093049 which shows that if x is a pointer to a zero-length vector (in plain
C++
for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to
different
values. Mikael further points out that "Rcpp seems to cast a (void *) returned
by
DATAPTR to (double *) when constructing a Vector<REALSXP> from a SEXP,
rather
than using the (double *) returned by REAL." So perhaps R-core doesn't
want
to guarantee that these operations give identical answers, in which case
Rcpp
will have to change the way it does things ...
It looks like REAL and friends should also get this check, but it's not high priority at this point, at least to me. DATAPTR has been using this check for a while in a barrier build, so you might want to test there as well. I expect we will activate more integrity checks from the barrier build on the API client side as things are tidied up. However: DATAPTR is not in the API and can't be at least in this form: It allows access to a writable pointer to STRSXP and VECSXP data and that is too dangerous for memory manager integrity. I'm not sure exactly how this will be resolve, but be prepared for changes. Best, luke
cheers Ben On 2024-06-08 6:39 p.m., Kevin Ushey wrote:
IMHO, this should be changed in both Rcpp and downstream packages:
1. Rcpp could check for out-of-bounds accesses in cases like these, and
emit an R warning / error when such an access is detected;
2. The downstream packages unintentionally making these out-of-bounds
accesses should be fixed to avoid doing that.
That is, I think this is ultimately a bug in the affected packages, but
Rcpp could do better in detecting and handling this for client packages
(avoiding a segfault).
Best,
Kevin
On Sat, Jun 8, 2024, 3:06?PM Ben Bolker <bbolker at gmail.com
<mailto:bbolker at gmail.com>> wrote:
A change to R-devel (SVN r86629 or
<
has changed the handling of pointers to zero-length objects,
leading to
ASAN issues with a number of Rcpp-based packages (the commit message
reads, in part, "Also define STRICT_TYPECHECK when compiling
inlined.c.")
I'm interested in discussion from the community.
Details/diagnosis for the issues in the lme4 package are here:
https://github.com/lme4/lme4/issues/794
<https://github.com/lme4/lme4/issues/794>,
with a bit more discussion
about how zero-length objects should be handled.
The short(ish) version is that r86629 enables the
CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
<
<
,
which returns a trivial pointer (rather than the data pointer that
would
be returned in the normal control flow) if an object has length 0:
/* Attempts to read or write elements of a zero length vector will
result in a segfault, rather than read and write random memory.
Returning NULL would be more natural, but Matrix seems to
assume
that even zero-length vectors have non-NULL data pointers, so
return (void *) 1 instead. Zero-length CHARSXP objects still
have
a
trailing zero byte so they are not handled. */
In the Rcpp context this leads to an inconsistency, where
`REAL(x)`
is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
leads to ASAN warnings like
runtime error: reference binding to misaligned address
0x000000000001
for type 'const double', which requires 8 byte alignment
0x000000000001: note: pointer points here
I'm in over my head and hoping for insight into whether this
problem
should be resolved by changing R, Rcpp, or downstream Rcpp packages
...
cheers
Ben Bolker
______________________________________________
R-devel at r-project.org <mailto:R-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
<https://stat.ethz.ch/mailman/listinfo/r-devel>
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
(Mainly reply to self) I found there's a new API VECTOR_PTR_RO. Thanks for adding this! https://github.com/r-devel/r-svn/commit/d499fab95b1ba23ee7842293030d4af1e69ae0fe Best, Yutani 2024?6?9?(?) 14:13 Hiroaki Yutani <yutani.ini at gmail.com>:
Sorry to ask about a bit drifted topic, but will there be an alternative API to DATAPTR?
DATAPTR is not in the API and can't be at least in this form
I believe it's vital for ALTREP to return the pointer to the expanded version of a SEXP just like the implementation in base R does [1]. At least, VECSXP has no other measure to expose the pointer if I understand correctly. Best, Yutani [1]: https://github.com/r-devel/r-svn/blob/a3508b75d28164b0e5bcb2c87f816ce5169729a4/src/main/altclasses.c#L186 2024?6?9?(?) 10:43 luke-tierney--- via R-devel <r-devel at r-project.org>:
On Sat, 8 Jun 2024, Ben Bolker wrote:
The ASAN errors occur *even if the zero-length object is not actually accessed*/is used in a perfectly correct manner, i.e. it's perfectly
legal in
base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0, ncol =
0)`,
whereas doing the equivalent in Rcpp will (now) lead to an ASAN error. i.e., these are *not* previously cryptic out-of-bounds accesses that
are
now being revealed, but instead sensible and previously legal
definitions of
zero-length objects that are now causing problems. I'm pretty sure I'm right about this, but it's absolutely possible
that
I'm just confused at this point; I don't have a super-simple example to
show
you at the moment. The closest is this example by Mikael Jagan: https://github.com/lme4/lme4/issues/794#issuecomment-2155093049 which shows that if x is a pointer to a zero-length vector (in plain
C++
for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to
different
values. Mikael further points out that "Rcpp seems to cast a (void *) returned
by
DATAPTR to (double *) when constructing a Vector<REALSXP> from a SEXP,
rather
than using the (double *) returned by REAL." So perhaps R-core doesn't
want
to guarantee that these operations give identical answers, in which
case Rcpp
will have to change the way it does things ...
It looks like REAL and friends should also get this check, but it's not high priority at this point, at least to me. DATAPTR has been using this check for a while in a barrier build, so you might want to test there as well. I expect we will activate more integrity checks from the barrier build on the API client side as things are tidied up. However: DATAPTR is not in the API and can't be at least in this form: It allows access to a writable pointer to STRSXP and VECSXP data and that is too dangerous for memory manager integrity. I'm not sure exactly how this will be resolve, but be prepared for changes. Best, luke
cheers Ben On 2024-06-08 6:39 p.m., Kevin Ushey wrote:
IMHO, this should be changed in both Rcpp and downstream packages: 1. Rcpp could check for out-of-bounds accesses in cases like these,
and
emit an R warning / error when such an access is detected; 2. The downstream packages unintentionally making these out-of-bounds accesses should be fixed to avoid doing that. That is, I think this is ultimately a bug in the affected packages,
but
Rcpp could do better in detecting and handling this for client
packages
(avoiding a segfault).
Best,
Kevin
On Sat, Jun 8, 2024, 3:06?PM Ben Bolker <bbolker at gmail.com
<mailto:bbolker at gmail.com>> wrote:
A change to R-devel (SVN r86629 or
<
has changed the handling of pointers to zero-length objects,
leading to
ASAN issues with a number of Rcpp-based packages (the commit
message
reads, in part, "Also define STRICT_TYPECHECK when compiling
inlined.c.")
I'm interested in discussion from the community.
Details/diagnosis for the issues in the lme4 package are here:
https://github.com/lme4/lme4/issues/794
<https://github.com/lme4/lme4/issues/794>,
with a bit more discussion
about how zero-length objects should be handled.
The short(ish) version is that r86629 enables the
CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
<
<
,
which returns a trivial pointer (rather than the data pointer that
would
be returned in the normal control flow) if an object has length 0:
/* Attempts to read or write elements of a zero length vector will
result in a segfault, rather than read and write random
memory.
Returning NULL would be more natural, but Matrix seems to
assume
that even zero-length vectors have non-NULL data pointers, so
return (void *) 1 instead. Zero-length CHARSXP objects still
have
a
trailing zero byte so they are not handled. */
In the Rcpp context this leads to an inconsistency, where
`REAL(x)`
is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
leads to ASAN warnings like
runtime error: reference binding to misaligned address
0x000000000001
for type 'const double', which requires 8 byte alignment
0x000000000001: note: pointer points here
I'm in over my head and hoping for insight into whether this
problem
should be resolved by changing R, Rcpp, or downstream Rcpp
packages ...
cheers
Ben Bolker
______________________________________________
R-devel at r-project.org <mailto:R-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
<https://stat.ethz.ch/mailman/listinfo/r-devel>
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel