
[Rcpp-devel] Debugging Rcpp code

4 messages · Romain Francois, Hadley Wickham

#
Hi all,

I'm attempting to write a simple version of tapply in C++ (see
attached).  However, when I run tapply2(1, 1, sum) (which should
return 1), R segfaults.  If I run R with gdb, I get the following
stack trace:

#0  0x03942120 in tapply2 (x=@0xbfffda68, i=@0xbfffda58,
fun=@0xbfffda50) at tapply.cpp:22
#1  0x0394298a in Rcpp::CppFunction_WithFormals3<Rcpp::Vector<14>,
Rcpp::Vector<14>, Rcpp::Vector<13>, Rcpp::Function>::operator()
(this=0x7f71c0, args=0xbfffdabc) at Module_generated_CppFunction.h:311
#2  0x0385b8b1 in InternalFunction_invoke ()
...

where line 22 is return out;

I suspect it's something to do with my coercion between std::vector
and NumericVector when calling the function, or between the SEXP
output and NumericVector when storing the output:

  std::vector< std::vector<double> >::iterator g_it = groups.begin();
  NumericVector::iterator o_it = out.begin();
  for(; g_it != groups.end(); ++g_it, ++o_it) {
    *o_it = as<double>(fun(wrap(*g_it)));
  }

I'd appreciate any hints as to how to solve this particular problem,
as well as any general debugging strategies.
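One general strategy, offered here as a hedged suggestion rather than something proposed in the thread: while debugging, swap `operator[]` for `std::vector::at()`. Writing past the end of a vector with `operator[]` is undefined behaviour and can crash later at an unrelated line (such as `return out;`), whereas `at()` checks bounds and throws `std::out_of_range` at the faulty access. The helper name `checked_push` is hypothetical, and the 1-based group index mirrors R's convention.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: append `value` to the 1-based group `g`.
// at() throws std::out_of_range for a bad index instead of silently
// corrupting memory the way operator[] can.
bool checked_push(std::vector< std::vector<double> >& groups, int g, double value) {
  try {
    // g - 1 converts R's 1-based index to C++'s 0-based one; g <= 0 wraps
    // to a huge size_t, which at() also rejects.
    groups.at(static_cast<std::size_t>(g - 1)).push_back(value);
    return true;
  } catch (const std::out_of_range&) {
    return false;  // real code might log or rethrow here
  }
}
```

With an empty `groups`, `checked_push(groups, 1, 1.0)` returns false instead of writing into memory the vector does not own, which is the failure mode of the original loop.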

Thanks!

Hadley
#
Oops, I completely misinterpreted the std::vector API. To insert the
elements I need to do:

  for(x_it = x.begin(), i_it = i.begin(); x_it != x.end(); ++x_it, ++i_it) {
    int g = *i_it;  // 1-based group index (named g so it doesn't shadow i)
    if (static_cast<std::size_t>(g) > groups.size()) {
      groups.resize(g);  // grow so that groups[g - 1] exists
    }
    groups[g - 1].push_back(*x_it);
  }
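The corrected loop can be tried outside R as a plain C++ sketch. Here `std::vector<double>` and `std::vector<int>` stand in for Rcpp's NumericVector and IntegerVector (an assumption for portability), `group_by_index` is a hypothetical name, and the indices are assumed to be positive and 1-based, like R factor codes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ version of the corrected grouping loop.
std::vector< std::vector<double> > group_by_index(const std::vector<double>& x,
                                                  const std::vector<int>& i) {
  assert(x.size() == i.size());  // x and i must be parallel vectors
  std::vector< std::vector<double> > groups;
  for (std::size_t k = 0; k < x.size(); ++k) {
    std::size_t g = static_cast<std::size_t>(i[k]);  // 1-based group index
    if (g > groups.size()) {
      groups.resize(g);  // grow so that groups[g - 1] exists
    }
    groups[g - 1].push_back(x[k]);
  }
  return groups;
}
```

For example, `group_by_index({1.0, 2.0, 3.0}, {2, 1, 2})` yields `{{2}, {1, 3}}`, and the single-element case `group_by_index({1.0}, {1})` matches the `tapply2(1, 1, sum)` input from the first message.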

Hadley
On Fri, Nov 16, 2012 at 8:36 AM, Hadley Wickham <h.wickham at gmail.com> wrote:

  
    
#
That's the one.

You might like to do something like:

std::vector< std::vector<double> > groups( max(i) ) ;

You'll pay for the traversal to compute the max, but then you don't need to resize.
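A plain C++ sketch of this pre-sizing idea, assuming std::vector stand-ins for the Rcpp types: `std::max_element` plays the role of Rcpp sugar's `max(i)`, and `group_presized` is a hypothetical name. One extra pass finds the largest group id, after which the loop never resizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pre-sized grouping: one pass for max(i), one pass to fill.
// Assumes all ids in i are positive and 1-based.
std::vector< std::vector<double> > group_presized(const std::vector<double>& x,
                                                  const std::vector<int>& i) {
  assert(x.size() == i.size());
  if (i.empty()) return {};
  int n_groups = *std::max_element(i.begin(), i.end());  // the extra traversal
  std::vector< std::vector<double> > groups(static_cast<std::size_t>(n_groups));
  for (std::size_t k = 0; k < x.size(); ++k) {
    groups[static_cast<std::size_t>(i[k] - 1)].push_back(x[k]);  // no resize needed
  }
  return groups;
}
```

Note that ids which never occur still get an (empty) slot: `group_presized({1.0, 2.0}, {1, 3})` allocates three groups, the middle one empty. That cost grows with the largest id, which is what makes this layout expensive when i is sparse.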




Calling fun() is going to be costly too (probably what will dominate),
especially because of our internal::try_catch thing. See
Evaluator.cpp; that's a mess.

We should have something for faster evaluation, where we would just
build the call and use Rf_eval.



It will become more fun when we "apply" C++ functions. ^^

Romain


On 16/11/12 16:03, Hadley Wickham wrote:

  
    
#
I ended up going with:

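#include <Rcpp.h>
#include <map>
#include <vector>
using namespace Rcpp;

NumericVector tapply3(NumericVector x, IntegerVector i, Function fun) {
  // std::map only creates entries for group ids that actually occur in i
  std::map<int, std::vector<double> > groups;

  NumericVector::iterator x_it;
  IntegerVector::iterator i_it;

  // Assumes x and i have the same length
  for(x_it = x.begin(), i_it = i.begin(); x_it != x.end(); ++x_it, ++i_it) {
    groups[*i_it].push_back(*x_it);
  }
  NumericVector out(groups.size());

  // Groups are visited in ascending id order; fun() receives each
  // std::vector<double>, which Rcpp wraps into an R numeric vector
  std::map<int, std::vector<double> >::const_iterator g_it = groups.begin();
  NumericVector::iterator o_it = out.begin();
  for(; g_it != groups.end(); ++g_it, ++o_it) {
    NumericVector res = fun(g_it->second);
    *o_it = res[0];  // keep the first element of fun's result
  }
  return out;
}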

which I think is much easier to understand. It's slightly slower when
i is small and dense, but orders of magnitude faster when i is sparse.
That'll be nice, but here I'm more interested in showing off how to
use the STL than in highly performant code (although this special case
is ~4x faster than tapply), so it's not a big deal.
Definitely!
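The std::map approach in tapply3 can be sketched without R as well. In this version `std::function` stands in for Rcpp's `Function`, `sum_values` stands in for R's `sum`, and both `tapply_map` and `sum_values` are hypothetical names. Because the map stores only the ids that actually occur, sparse ids such as 1 and 1000000 cost two entries rather than a million-slot vector, which is consistent with the sparse-case speedup described above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Hypothetical plain-C++ analogue of tapply3.
std::vector<double> tapply_map(const std::vector<double>& x,
                               const std::vector<int>& i,
                               const std::function<double(const std::vector<double>&)>& fun) {
  assert(x.size() == i.size());
  std::map<int, std::vector<double> > groups;
  for (std::size_t k = 0; k < x.size(); ++k) {
    groups[i[k]].push_back(x[k]);  // operator[] default-constructs a new group
  }
  std::vector<double> out;
  out.reserve(groups.size());
  // std::map iterates in ascending key order, so results are sorted by group id
  for (std::map<int, std::vector<double> >::const_iterator it = groups.begin();
       it != groups.end(); ++it) {
    out.push_back(fun(it->second));
  }
  return out;
}

// Stand-in for R's sum().
double sum_values(const std::vector<double>& v) {
  double total = 0.0;
  for (std::size_t k = 0; k < v.size(); ++k) total += v[k];
  return total;
}
```

For example, `tapply_map({1.0, 2.0, 3.0}, {1000000, 1, 1000000}, sum_values)` returns `{2, 4}`: group 1 sums to 2 and group 1000000 sums to 4, in ascending id order.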

Hadley