Skip to content

[Rcpp-devel] Const correctness of NumericVector iterators

9 messages · Martyn Plummer, Dirk Eddelbuettel, Romain Francois

#
This is related to David Shih's thread about modifying input arguments
but I think it needs its own thread.

I found a problem when I tried porting some Rcpp code to run outside of
Rcpp by providing my own NumericVector implementation based on STL
vectors. In Rcpp it is possible to obtain a non-constant iterator for a
constant vector, whereas STL does not allow this.

In particular, it is possible to modify a const NumericVector. The
compiler allows this because the actual contents of the NumericVector
are in auxiliary memory. But this breaks the abstraction of the
NumericVector class.

An example is given below.  The "sumsquares" function calculates the sum
of squares of a vector. Internally, it calls the "workhorse" function,
which is declared to take a constant reference as an argument. So it
looks safe to pass the input vector to "workhorse" without copying.
However the implementation of "workhorse" does in fact alter the vector.

### BEGIN SS.cc
#include <Rcpp.h>

using Rcpp::NumericVector;

static double workhorse(NumericVector const &v);

// [[Rcpp::export]]
double sumsquares(NumericVector v)
{
    //Since workhorse takes a constant reference we think we do not need
    //to copy v...
    return workhorse(v);
}

double workhorse(NumericVector const &v)
{
    double y = 0;
    for (NumericVector::iterator i = v.begin(); i != v.end(); ++i) {
        double &x = *i;
        //... but this function does alter its argument
        x *= x;
        y += x;
    }
    return y;
}
### END SS.cc

In R we have
[1] 14
[1] 1 4 9

Martyn


-----------------------------------------------------------------------
This message and its attachments are strictly confidential. If you are
not the intended recipient of this message, please immediately notify 
the sender and delete it. Since its integrity cannot be guaranteed, 
its content cannot involve the sender's responsibility. Any misuse, 
any disclosure or publication of its content, either whole or partial, 
is prohibited, exception made of formally approved use
-----------------------------------------------------------------------
#
Hi Martyn,
On 26 November 2014 at 13:20, Martyn Plummer wrote:
| This is related to David Shih's thread about modifying input arguments
| but I think it needs its own thread.
| 
| I found a problem when I tried porting some Rcpp code to run outside of
| Rcpp by providing my own NumericVector implementation based on STL

Interesting. I have seen that use case.  Typically I have 'real' C++ code
using STL which I'd then bring to R(cpp).

| vectors. In Rcpp it is possible to obtain a non-constant iterator for a
| constant vector, whereas STL does not allow this.
| 
| In particular, it is possible to modify a const NumericVector. The

"Not on purpose" but mostly because we get a SEXP we cannot protect better?

| compiler allows this because the actual contents of the NumericVector
| are in auxiliary memory. But this breaks the abstraction of the
| NumericVector class.
| 
| An example is given below.  The "sumsquares" function calculates the sum
| of squares of a vector. Internally, it calls the "workhorse" function,
| which is declared to take a constant reference as an argument. So it
| looks safe to pass the input vector to "workhorse" without copying.
| However the implementation of "workhorse" does in fact alter the vector.

Isn't that an argument for just using STL vectors ?

Otherwise, patches/pull requests always welcome...

Dirk

| 
| ### BEGIN SS.cc
| #include <Rcpp.h>
| 
| using Rcpp::NumericVector;
| 
| static double workhorse(NumericVector const &v);
| 
| // [[Rcpp::export]]
| double sumsquares(NumericVector v)
| {
|     //Since workhorse takes a constant reference we think we do not need
|     //to copy v...
|     return workhorse(v);
| }
| 
| double workhorse(NumericVector const &v)
| {
|     double y = 0;
|     for (NumericVector::iterator i = v.begin(); i != v.end(); ++i) {
|         double &x = *i;
|         //... but this function does alter its argument
|         x *= x;
|         y += x;
|     }
|     return y;
| }
| ### END SS.cc
| 
| In R we have 
| 
| > library(Rcpp)
| > sourceCpp("SS.cc")
| > x <- c(1,2,3)
| > sumsquares(x)
| [1] 14
| > x #is modified
| [1] 1 4 9
| 
| Martyn
| 
| 
| -----------------------------------------------------------------------
| This message and its attachments are strictly confidential. If you are
| not the intended recipient of this message, please immediately notify 
| the sender and delete it. Since its integrity cannot be guaranteed, 
| its content cannot involve the sender's responsibility. Any misuse, 
| any disclosure or publication of its content, either whole or partial, 
| is prohibited, exception made of formally approved use
| -----------------------------------------------------------------------
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
#
On 26 November 2014 at 10:15, Dirk Eddelbuettel wrote:
| 
| Hi Martyn,
|
| On 26 November 2014 at 13:20, Martyn Plummer wrote:
| | This is related to David Shih's thread about modifying input arguments
| | but I think it needs its own thread.
| | 
| | I found a problem when I tried porting some Rcpp code to run outside of
| | Rcpp by providing my own NumericVector implementation based on STL
| 
| Interesting. I have seen that use case.  Typically I have 'real' C++ code

A "not" was missing there:  "I have not seen that use case".   Sorry,  Dirk

| using STL which I'd then bring to R(cpp).
| 
| | vectors. In Rcpp it is possible to obtain a non-constant iterator for a
| | constant vector, whereas STL does not allow this.
| | 
| | In particular, it is possible to modify a const NumericVector. The
| 
| "Not on purpose" but mostly because we get a SEXP we cannot protect better?
| 
| | compiler allows this because the actual contents of the NumericVector
| | are in auxiliary memory. But this breaks the abstraction of the
| | NumericVector class.
| | 
| | An example is given below.  The "sumsquares" function calculates the sum
| | of squares of a vector. Internally, it calls the "workhorse" function,
| | which is declared to take a constant reference as an argument. So it
| | looks safe to pass the input vector to "workhorse" without copying.
| | However the implementation of "workhorse" does in fact alter the vector.
| 
| Isn't that an argument for just using STL vectors ?
| 
| Otherwise, patches/pull requests always welcome...
| 
| Dirk
| 
| | 
| | ### BEGIN SS.cc
| | #include <Rcpp.h>
| | 
| | using Rcpp::NumericVector;
| | 
| | static double workhorse(NumericVector const &v);
| | 
| | // [[Rcpp::export]]
| | double sumsquares(NumericVector v)
| | {
| |     //Since workhorse takes a constant reference we think we do not need
| |     //to copy v...
| |     return workhorse(v);
| | }
| | 
| | double workhorse(NumericVector const &v)
| | {
| |     double y = 0;
| |     for (NumericVector::iterator i = v.begin(); i != v.end(); ++i) {
| |         double &x = *i;
| |         //... but this function does alter its argument
| |         x *= x;
| |         y += x;
| |     }
| |     return y;
| | }
| | ### END SS.cc
| | 
| | In R we have 
| | 
| | > library(Rcpp)
| | > sourceCpp("SS.cc")
| | > x <- c(1,2,3)
| | > sumsquares(x)
| | [1] 14
| | > x #is modified
| | [1] 1 4 9
| | 
| | Martyn
| | 
| | 
| | -----------------------------------------------------------------------
| | This message and its attachments are strictly confidential. If you are
| | not the intended recipient of this message, please immediately notify 
| | the sender and delete it. Since its integrity cannot be guaranteed, 
| | its content cannot involve the sender's responsibility. Any misuse, 
| | any disclosure or publication of its content, either whole or partial, 
| | is prohibited, exception made of formally approved use
| | -----------------------------------------------------------------------
| | _______________________________________________
| | Rcpp-devel mailing list
| | Rcpp-devel at lists.r-forge.r-project.org
| | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
| 
| -- 
| http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
#
On Wed, 2014-11-26 at 11:20 -0600, Dirk Eddelbuettel wrote:
I got that. It is an unusual use case, but not crazy. We have a complex
likelihood function that we want to expose to R's optimization methods
for maximum likelihood, but we also want to drop the same code into JAGS
for Bayesian analysis via MCMC.

The current code uses exclusively (Rcpp)Armadillo which works
transparently in both environments so there is now no need for any
emulation.
Sure.

Martyn
-----------------------------------------------------------------------
This message and its attachments are strictly confidential. If you are
not the intended recipient of this message, please immediately notify 
the sender and delete it. Since its integrity cannot be guaranteed, 
its content cannot involve the sender's responsibility. Any misuse, 
any disclosure or publication of its content, either whole or partial, 
is prohibited, exception made of formally approved use
-----------------------------------------------------------------------
#
Hi Martyn,
On 26 November 2014 at 17:31, Martyn Plummer wrote:
| I got that. It is an unusual use case, but not crazy. We have a complex
| likelihood function that we want to expose to R's optimization methods
| for maximum likelihood, but we also want to drop the same code into JAGS
| for Bayesian analysis via MCMC.

Nice. That is a good setup, and I agree that Rcpp should support it.
 
| The current code uses exclusively (Rcpp)Armadillo which works
| transparently in both environments so there is now no need for any
| emulation.

Noted -- and just for the record, I tend to do exactly that for my work: use
Armadillo classes for plain C++, and RcppArmadillo around it for R.  

That gets very good and trusted and behaviour, and thanks the updated
converters from year also proper const, const &, ... behaviour at minimal
cost.  I really find that combination quite compelling.

So ... with that, do you think it is really worth mucking with
Rcpp::NumericVector to optimize const behaviour for what be an unusual use
case for Rcpp::NumericVector (which is rarely taken out of the R context) ?

Dirk
#
This is probably related to this: 


	template <int RTYPE>
	struct r_vector_const_iterator {
		typedef typename storage_type<RTYPE>::type* type ;
	};

There should be a const there somewhere. 

FWIW, more modern implementations of R/C++ are more safe on that issue: 

$ Rcpp11Script /tmp/martyn.cpp
file33e7a74d9ba.cpp:18:33: error: cannot initialize a variable of type 'NumericVector::iterator' (aka 'double *') with an rvalue of
      type 'const_iterator' (aka 'const double *')
   for (NumericVector::iterator i = v.begin(); i != v.end(); ++i) {
                                ^   ~~~~~~~~~
1 error generated.
make: *** [file33e7a74d9ba.o] Error 1
Error in dyn.load(basename(dynlib)) :
  unable to load shared object '/private/var/folders/k8/gmp11jks4hl_nwcnk82fbr8m0000gn/T/RtmpKvVUpq/file33e7a74d9ba.so':
  dlopen(/private/var/folders/k8/gmp11jks4hl_nwcnk82fbr8m0000gn/T/RtmpKvVUpq/file33e7a74d9ba.so, 6): image not found
Calls: sourceCpp -> <Anonymous> -> dyn.load
In addition: Warning message:
running command '/Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB 'file33e7a74d9ba.cpp'' had status 1
Execution halted


If I change NumericVector::iterator to auto, I then get: 

$ Rcpp11Script /tmp/martyn.cpp
file34f617c08b2.cpp:19:16: error: binding of reference to type 'double' to a value of type 'const double' drops qualifiers
       double &x = *i;
               ^   ~~
1 error generated.
make: *** [file34f617c08b2.o] Error 1
Error in dyn.load(basename(dynlib)) :
  unable to load shared object '/private/var/folders/k8/gmp11jks4hl_nwcnk82fbr8m0000gn/T/Rtmp74ZJLp/file34f617c08b2.so':
  dlopen(/private/var/folders/k8/gmp11jks4hl_nwcnk82fbr8m0000gn/T/Rtmp74ZJLp/file34f617c08b2.so, 6): image not found
Calls: sourceCpp -> <Anonymous> -> dyn.load
In addition: Warning message:
running command '/Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB 'file34f617c08b2.cpp'' had status 1
Execution halted

Hard compiler error/GOOD THING.
2 days later
#
Promoted this to an issue on github. https://github.com/RcppCore/Rcpp/issues/209

Should not be very hard to fix, but this is indeed a bug. The code should not compile if Rcpp respected const correctness.
#
And now sent a pull request. Not sure this will be merged in with the new commandment that ? Thou shalt not break abi ?. 
See: https://github.com/RcppCore/Rcpp/pull/211

Romain
#
On 30 November 2014 at 13:41, Romain Fran?ois wrote:
| And now sent a pull request. Not sure this will be merged in with the new commandment that ? Thou shalt not break abi ?. 
| See: https://github.com/RcppCore/Rcpp/pull/211

The PR breaks a couple of packages. 

More details are at the PR #211 discussion, and the usual build log repo [1],
eg in file [2] and the other committed along with it.

Dirk

[1] https://github.com/RcppCore/rcpp-logs/
[2] https://github.com/RcppCore/rcpp-logs/blob/master/results/Rcpp-Summary-20141130.R