Skip to content

[Rcpp-devel] Subsetter uses int for indexing (among other issues)?

11 messages · Qiang Kou, Kevin Ushey, William Nolan +1 more

#
Hi all,

Longtime user and lurker here.

I got "index error" thrown by Rcpp when trying to subset a matrix with
width * height == 644764 * 3776 greater than MAXINT:

Allocating 647764 x 3776 matrix...
Catchpoint 1 (exception thrown), 0x00007ffff4baa920 in __cxa_throw () from
/usr/lib64/libstdc++.so.6
#1  0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545
"index error")
    at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52          throw Rcpp::exception( tfm::format(fmt,
std::forward<Args>(args)... ).c_str() );
(gdb) down
#0  0x00007ffff4baa920 in __cxa_throw () from /usr/lib64/libstdc++.so.6
(gdb) up
#1  0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545
"index error")
    at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52          throw Rcpp::exception( tfm::format(fmt,
std::forward<Args>(args)... ).c_str() );
(gdb) up
#2  0x00007fffe7bb75a6 in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13,
true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::check_indices
(this=0x7fffffffc450, x=0x13f9920,
    n=3751, size=-1849010432) at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/:138
138                     stop("index error");
(gdb) up
#3  0x00007fffe7bb6a3c in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13,
true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::get_indices
(this=0x7fffffffc450, t=...)
    at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/Subsetter.h:149
149             check_indices(ptr, rhs_n, lhs_n);
(gdb) l
144         #endif
145
146         void get_indices( traits::identity< traits::int2type<INTSXP> >
t ) {
147             indices.reserve(rhs_n);
148             int* ptr = INTEGER(rhs);
149             check_indices(ptr, rhs_n, lhs_n);
150             for (int i=0; i < rhs_n; ++i) {
151                 indices.push_back( rhs[i] );
152             }
153             indices_n = rhs_n;

As we can see from the stack trace and below, lhs.size() is negative when
cast to int:

(gdb) p (int)(lhs.size())
$12 = -1849010432

This is all coming from the assignment (via operator []) of the subsetting
of one matrix to another matrix's subset:

(static_cast<NumericVector&>(mat))[lhsI] =
(static_cast<NumericVector&>(signals))[rhsI];

(lhsI and rhsI are IntegerVector's)

Now, setting aside whether I *should* be doing that -- what I *do* see in
Subsetter.h (including what I understand to be the most recent version,
1.0.0 from github) is the use of int for indices all over the place in this
file, including in the member variable:

std::vector<int> indices;

Is there any reason why the indices that Subsetter uses internally
shouldn't be size_t or an equivalently capable type like R_xlen_t?
For example, Subsetter's check_indices function takes int's as arguments,
while Vector's size method returns R_xlen_t.

I'll change my code to manually copy elements via operator() using the
row/column arguments for now.  Seems like Subsetter is maybe not quite
ready for prime time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20181107/f86a037f/attachment.html>
#
Hi, William,

Can you give us a small piece of code to reproduce the error? That will
ease our discussion.

Best,

KK
On Tue, Nov 6, 2018 at 10:46 PM William Nolan <will at landale.net> wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20181106/e5697465/attachment.html>
#
I agree we should be using R_xlen_t here. As always, PRs are welcome.

You might want to file this at https://github.com/RcppCore/Rcpp/issues
just so it doesn't get lost.

Thanks,
Kevin
On Tue, Nov 6, 2018 at 11:23 PM Qiang Kou <qkou at qkou.info> wrote:
#
Thanks. Will put together a small working example and file it. I might even attempt a fix in my copious free time ;-). 

Cheers,
Will
#
On 8 November 2018 at 09:19, William Nolan wrote:
| Thanks. Will put together a small working example and file it. I might even attempt a fix in my copious free time ;-). 

Awesome :)

We'll help.  Small steps and iterations work well. 

Dirk
#
Thanks, guys.

I submitted issue #919 <https://github.com/RcppCore/Rcpp/issues/919> with a
small-ish test case showing the issue.

Will take a look at what's necessary to address.  I see subsetting is in
the unit tests, but this is a funny one in that verifying the fix works
there would require a big chunk of memory (16GB, i.e. 2^31 * 8 bytes for a
Numeric) to be present on the tester's machine -- seems excessive.

Maybe it's enough for the fix to pass the existing unit tests, and pass the
testbed I put in the github issue.  Anyway, TBD
On Thu, Nov 8, 2018 at 11:30 AM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20181108/5151a2b8/attachment-0001.html>
#
Okay.  Changes are done and appear to work for *just* my test case.  Of
course I'd like to ensure I didn't break anything along the way.

I'd like to run the unit test suite on them before creating a PR, but am
unsure on what steps to take.

Given my local cloned git repo, with my edits to Subsetter.h, what's the
right way to recompile against it and run the unit tests there?

Many thanks,
Will
On Thu, Nov 8, 2018 at 2:45 PM, William Nolan <will at landale.net> wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20181108/0229f135/attachment.html>
#
Hi, William,

I think you can send the PR first. It will trigger the unit tests on github.

Best,

KK
On Wed, Nov 7, 2018 at 10:52 PM William Nolan <will at landale.net> wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20181107/e4cadfea/attachment.html>
#
On 7 November 2018 at 23:28, Qiang Kou wrote:
| Hi, William,
| 
| I think you can send the PR first. It will trigger the unit tests on github.

And running 'R CMD build ...' followed by 'R CMD check ...' on your modified
sources tests them locally.  There are equivalent GUI buttons in RStudio you
can use and devtools has support too (that I am not familiar with).

Dirk
#
Thanks.  That did the trick.  After testing w/my own test harness, I made
sure the unit tests work with a baseline cloned fork.
Then integrated my changes into the fork, re-built/checked with bumped-up
version # and tests all passed again.  Pushed fork to github.

I have just created PR #920 <https://github.com/RcppCore/Rcpp/pull/920>.
Thanks for all the help, and hopefully I didn't mess it up too badly on the
first try :-).

Regards,
Will
On Thu, Nov 8, 2018 at 9:35 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20181109/d02b99d4/attachment.html>
#
On 9 November 2018 at 14:14, William Nolan wrote:
| Thanks.  That did the trick.  After testing w/my own test harness, I made
| sure the unit tests work with a baseline cloned fork.
| Then integrated my changes into the fork, re-built/checked with bumped-up
| version # and tests all passed again.  Pushed fork to github.

We mostly get emails from people who are stuck and do not work this out, so
it is refreshing to have a positive result :)

And yes, that's the way to do it. Even better than normal by first confirming
the baseline. Well done.
 
| I have just created PR #920 <https://github.com/RcppCore/Rcpp/pull/920>.
| Thanks for all the help, and hopefully I didn't mess it up too badly on the
| first try :-).

Looks good so far as I mentioned there, will get more scrutiny. So thanks
(and congrats :) already.

Dirk