[Rcpp-devel] Rcpp vector classes vs std::vector for custom class member variables

Jon Clayden

Tue, Oct 8, 2013 4:04 AM

Dear all,

I'm new to Rcpp and this mailing list. I did look for a previous answer to
this question, but it's hard to summarise succinctly so I may have missed
something. Apologies if so.

I'm defining a custom class, an object of which will need to survive across
various calls back and forth between R and C++, so I plan to use the XPtr
class to wrap a pointer. My question is, what are the advantages and
disadvantages of using Rcpp vector classes (vs std::vector) for member
variables? To be more concrete, I mean

class Foo
{
private:
  Rcpp::NumericVector bar;
};

vs

class Foo
{
private:
  std::vector<double> bar;
};

Are there garbage collection issues when these live inside an XPtr<Foo>?
Are there speed advantages of std::vector<double> over Rcpp::NumericVector
for general use? Any input would be welcome. Thanks in advance.

Great work on Rcpp, by the way. I've been hearing very good things for
quite some time, but wasn't sure if it was worth dusting off my slightly
rusty C++ for. Suffice to say I think it was. The API is very clean and
returning to the standard R API will be painful...!

All the best,
Jon

Dirk Eddelbuettel

Tue, Oct 8, 2013 4:34 AM

Hi Jon,

On 8 October 2013 at 12:04, Jon Clayden wrote:

| I'm new to Rcpp and this mailing list. I did look for a previous answer to this
| question, but it's hard to summarise succinctly so I may have missed something.
| Apologies if so.
| 
| I'm defining a custom class, an object of which will need to survive across
| various calls back and forth between R and C++, so I plan to use the XPtr class
| to wrap a pointer. My question is, what are the advantages and disadvantages of
| using Rcpp vector classes (vs std::vector) for member variables? To be more
| concrete, I mean
| 
| class Foo
| {
| private:
|   Rcpp::NumericVector bar;
| };
| 
| vs
| 
| class Foo
| {
| private:
|   std::vector<double> bar;
| };

Here is the first choice: std::vector<double> vs Rcpp::NumericVector.  

Generally speaking I would use the former when I know I will interface other
C++ code requiring that interface.  I use the latter if I need to 'just' pass
things back and forth and maybe use my own (locally added) routines. [ And
you can go pretty cheaply from Rcpp::NumericVector to std::vector. ]

I use a third (arma::vec and its matrices) when I do linear algebra. (And
could also use RcppEigen for different vectors).
 
| Are there garbage collection issues when these live inside an XPtr<Foo>?

XPtr means R does not touch it. There will never be a gc.  So XPtr makes less
sense with R classes -- you want to be 'away from R' already for (large)
objects so I would tend to use XPtr of std::vector.  Or maybe even XPtr of
your class Foo.  Or use Jay and Mike's bigmemory (which uses an external
pointer internally too). It all depends.

| Are there speed advantages of std::vector<double> over Rcpp::NumericVector
| for general use? Any input would be welcome. Thanks in advance.

I would profile rather than believing what a random stranger on the Internet
tells you :)  But ex ante there should not be a large difference.  Returning
from std::vector to R may involve a copy -- not sure.
 
| Great work on Rcpp, by the way. I've been hearing very good things for quite
| some time, but wasn't sure if it was worth dusting off my slightly rusty C++
| for. Suffice to say I think it was. The API is very clean and returning to the
| standard R API will be painful...!

Thanks! Glad you are finding it useful.

Dirk

Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com

Romain Francois

Tue, Oct 8, 2013 5:25 AM

On 08/10/13 13:04, Jon Clayden wrote:

No. An XPtr<Foo> will delete the object it points to, using Foo's 
destructor when it goes out of scope.

I would argue against using external pointers directly when you can use 
modules and experience more type safety than with direct external pointers.

But these (using external pointers and using modules) only make sense 
when you want to be able to hold a reference to your object at the R 
level. Do you?

This is premature optimization. What you want to ask yourself is what 
are you going to do with "bar". If bar goes back and forth between the 
C++ and the R side, then NumericVector is your best candidate.

If bar is something internal to your class, then std::vector<> is fine 
and will give you a more complete interface, will grow efficiently, etc ...
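To illustrate the "will grow efficiently" point, here is a minimal sketch in plain standard C++ (no Rcpp needed). The helper name count_reallocations is made up for this example. It counts how often a std::vector<double> moves to a larger buffer while elements are appended one at a time; because capacity grows geometrically, the count stays far below the number of elements. The exact figure depends on the standard library implementation, so no specific number is claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: count how many times a std::vector<double>
// changes capacity while n elements are appended with push_back.
std::size_t count_reallocations(std::size_t n) {
    std::vector<double> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<double>(i));
        if (v.capacity() != last_capacity) {
            // The vector moved to a larger buffer.
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);  // sanity check: every element was stored
    return reallocations;
}
```

Calling count_reallocations(100000) should report a small number of reallocations relative to the 100000 appends.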

If you really want to have the best performing class for your 
application, you need to measure it.

It is easy enough to make Foo a template and switch between the two in 
benchmarking:

// Foo is parameterised on the container type used for bar
template <typename Container>
class Foo {
private:
    Container bar;
};

Foo< std::vector<double> > f1;
Foo< Rcpp::NumericVector > f2;
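A sketch of how the templated Foo might be used for such a benchmark follows. The constructor, the sum() method and the time_sum helper are all assumptions added for illustration, not part of the original snippet. The code uses only the standard library, so it times std::vector<double> as written; instantiating it with Rcpp::NumericVector should also work inside an Rcpp build, since only a size constructor, operator[], begin() and end() are required, but that is untested here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Foo parameterised on the container type, extended with a small
// workload (fill, then sum) so that there is something to time.
template <typename Container>
class Foo {
public:
    explicit Foo(std::size_t n) : bar(n) {
        // Fill with 0, 1, ..., n-1 through operator[].
        for (std::size_t i = 0; i < n; ++i) bar[i] = static_cast<double>(i);
    }
    double sum() const {
        return std::accumulate(bar.begin(), bar.end(), 0.0);
    }
private:
    Container bar;
};

// Hypothetical helper: time `reps` calls of Foo<Container>::sum().
// Returns elapsed seconds; the last computed sum is stored in `result`.
template <typename Container>
double time_sum(std::size_t n, int reps, double& result) {
    assert(reps > 0);
    Foo<Container> f(n);
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) result = f.sum();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing time_sum< std::vector<double> >(n, reps, s) against the Rcpp::NumericVector instantiation on realistic sizes gives a measured answer rather than a guess. An optimising compiler may simplify such a trivial loop, so a real benchmark should use the actual workload.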

Great. You don't need expert knowledge of C++ for Rcpp to be useful.

Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

Jon Clayden

Tue, Oct 8, 2013 5:53 AM

Thanks Dirk and Romain for your helpful replies. To follow up briefly...

I'm defining a custom class, an object of which will need to survive

Sure. And the memory allocated to "bar" (if it's a NumericVector) will be
protected from the garbage collector until the Foo object is deleted?

I think I do... ;)  I need the object to hold state and not be deallocated
between calls into the C++ code. I also want to allow for the possibility
that multiple Foo objects exist, and are being operated on simultaneously.
So holding a handle on the R side and passing it back to each C++ function
that works with the object seems like the natural approach to me.
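The handle-passing approach described here could be sketched roughly as follows. This assumes the file is compiled with Rcpp (for example through Rcpp::sourceCpp), and the function names foo_create, foo_append and foo_values, as well as the members of Foo, are invented for the illustration. It is one possible layout, not a recommended design.

```cpp
#include <Rcpp.h>
#include <vector>

// Hypothetical class holding state that must persist between calls from R.
class Foo {
public:
    explicit Foo(const std::vector<double>& values) : bar(values) {}
    void append(double x) { bar.push_back(x); }
    const std::vector<double>& values() const { return bar; }
private:
    std::vector<double> bar;
};

// [[Rcpp::export]]
Rcpp::XPtr<Foo> foo_create(Rcpp::NumericVector x) {
    // Copy the R data into C++-owned storage. With the second argument
    // set to true, the XPtr registers a finalizer that deletes the Foo
    // once R no longer holds a reference to the external pointer.
    return Rcpp::XPtr<Foo>(new Foo(Rcpp::as< std::vector<double> >(x)), true);
}

// [[Rcpp::export]]
void foo_append(Rcpp::XPtr<Foo> ptr, double x) {
    ptr->append(x);
}

// [[Rcpp::export]]
Rcpp::NumericVector foo_values(Rcpp::XPtr<Foo> ptr) {
    // wrap() copies the std::vector into a new R numeric vector.
    return Rcpp::wrap(ptr->values());
}
```

On the R side, each call to foo_create would return an independent handle, so several Foo objects can exist at once and be passed to foo_append or foo_values in any order.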

Agreed, but it's a consideration. I fully accept that the choice depends on
the particular application, as you and Dirk both said. I was just wondering
what the baseline performance difference (if any) might be.

Sure, but I'm doing quite a bit in native code so there's little point in
dropping competent C code for badly-written C++. Yes, I could mix them, but
it's nice to be able to make the most of the tools available... :)

Regards,
Jon

Romain Francois

Tue, Oct 8, 2013 6:07 AM

On 08/10/13 14:53, Jon Clayden wrote:

yes

I'd strongly advise considering modules as the vessel for that 
sort of thing.

This way, on the R side, you have something concrete instead of 
something opaque. R does not know what is inside an external pointer, 
but with a module object you have access to its fields, methods, etc.
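For comparison, a minimal sketch of exposing such a class through an Rcpp module is given below. It assumes an Rcpp build; the module name foo_module and the members of Foo are hypothetical and chosen only to match the thread.

```cpp
#include <Rcpp.h>
#include <vector>

// Hypothetical class, named Foo only to match the discussion.
class Foo {
public:
    explicit Foo(Rcpp::NumericVector x) : bar(x.begin(), x.end()) {}
    void append(double x) { bar.push_back(x); }
    // Copies the internal std::vector into a new R numeric vector.
    Rcpp::NumericVector values() const { return Rcpp::wrap(bar); }
private:
    std::vector<double> bar;
};

// Declare the class, one constructor and two methods to R.
RCPP_MODULE(foo_module) {
    Rcpp::class_<Foo>("Foo")
        .constructor<Rcpp::NumericVector>()
        .method("append", &Foo::append)
        .method("values", &Foo::values);
}
```

Once the module is loaded, an instance would be created from R with new(Foo, x), and its methods called as f$append(1) and f$values(), so the object is no longer an opaque pointer on the R side.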

There is no such answer. It really depends on what you do with bar.

Sure. Refactoring existing C code into C++ is kind of hard, but writing 
new C++ code instead of new C code is easier. At least it is from my 
perspective.

Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

Dirk Eddelbuettel

Tue, Oct 8, 2013 6:19 AM

On 8 October 2013 at 13:53, Jon Clayden wrote:

| Sure. And the memory allocated to "bar" (if it's a NumericVector) will be
| protected from the garbage collector until the Foo object is deleted?

Yes, R objects created by Rcpp, eg Rcpp::NumericVector and all the other
Rcpp::* objects mapping to standard R types, are indistinguishable from
native R objects and behave the same at the R level as objects created by R.

That is essentially the whole point.

| I think I do... ;)  I need the object to hold state and not be deallocated
| between calls into the C++ code. I also want to allow for the possibility that
| multiple Foo objects exist, and are being operated on simultaneously. So
| holding a handle on the R side and passing it back to each C++ function that
| works with the object seems like the natural approach to me.

A really simple way to do that is to create a container class that holds an
object of this type, and to create an init function, an accessor function, ...

Rcpp modules can do that for you too just via declarations, resulting in 
Reference Class objects at the R level.  This is a little more advanced;
maybe more suitable for your next project, or now if you are willing to read
up (Rcpp modules vignette, corresponding chapter in the Rcpp book, existing
packages, ...)

| Agreed, but it's a consideration. I fully accept that the choice depends on the
| particular application, as you and Dirk both said. I was just wondering what
| the baseline performance difference (if any) might be.

Nobody knows ex ante. There is no explicit slowdown baked in. As we have
suggested several times, you need to measure it!  

| Sure, but I'm doing quite a bit in native code so there's little point in
| dropping competent C code for badly-written C++. Yes, I could mix them, but
| it's nice to be able to make the most of the tools available... :)

You can do whatever you want and how you want to do it. You can write as much
K&R C code as you like.

Rcpp is here to get your data more seamlessly from R to C++ and back again,
and a lot else.  

If you then prefer to cast back to C, go for it. Not my style, but heck,
choice is good. It even lets people do silly things like work in C ;-)

Dirk

Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com

Jon Clayden

Tue, Oct 8, 2013 6:34 AM

On 8 October 2013 14:19, Dirk Eddelbuettel <edd at debian.org> wrote:

Right, thanks. I will read the modules vignette for a start.

Yes, OK - poor phrasing on my part. I take the point.

Jon