
[Rcpp-devel] inplace modification more affect other variables

8 messages · Chenliang Xu, Dirk Eddelbuettel, Romain Francois +1 more

#
Hello,

With the following in-place sorting example, I understand that the value of `a`
is sorted in place, but it is strange to see that the value of `b` is also
modified. This can cause hard-to-detect bugs, since the C++ function
may modify a variable defined in another scope.

It seems that Rcpp doesn't respect the `NAMED` field, which R uses
to implement copy-on-modify. I don't see an easy fix on the C++ side, since the
called C++ function has no information about variable bindings in R. A
possible fix is to add a function `inplace` to R that ensures the returned
variable has `NAMED` = 0 and so is safe to modify in place. We would then have to
call the function as `stl_sort_inplace(inplace(a))`, which seems odd but is
also informative: it shows clearly that we are breaking the pass-by-value
rule in R.

```cpp
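#include <Rcpp.h>
#include <algorithm>  // std::sort
using namespace Rcpp;

// Sorts x in place. No copy is made, so every R binding that
// shares this vector sees the change.
// [[Rcpp::export]]
void stl_sort_inplace(NumericVector x) {
    std::sort(x.begin(), x.end());
}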
```

```r
a <- seq(1, 0.1, -0.1)
b <- a
a
#  [1] 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1

stl_sort_inplace(a)

a
#  [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

b
#  [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

a <- seq(1, 0.1, -0.1)
pure_function <- function (x) {
  y <- x
  stl_sort_inplace(y)
  print(y)
}
pure_function(a)
a
#  [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

```
#
On 21 October 2014 at 20:22, Chenliang Xu wrote:
| Hello,
| 
| With the following inplace sorting example, I understand the value of `a` is
| sorted inplace, but it's strange to see the value of `b` is also modified. This
| can cause some hard to detect bug, since the cpp function may modify a variable
| defined in other scope.

Very well known issue -- maybe do a search for 'Rcpp::clone' ...

In a nutshell, SEXP objects are passed by a __pointer__ and changes do
therefore persist.  If you want distinct copies, use Rcpp::clone().
 
Dirk
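
A minimal plain C++ sketch of the aliasing described above. It assumes nothing about Rcpp internals: `Binding`, `sort_inplace` and `sort_copy` are hypothetical names, and a `std::shared_ptr` stands in for the pointer that R hands to the C++ side. In real Rcpp code the copy would be made with `Rcpp::clone()`, as suggested above.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for an R variable: a name bound to a shared buffer.
using Binding = std::shared_ptr<std::vector<double>>;

// Mirrors stl_sort_inplace: sorts whatever buffer the binding points at,
// so every other binding to the same buffer sees the new order.
void sort_inplace(Binding x) {
    std::sort(x->begin(), x->end());
}

// Mirrors the clone-first pattern: deep-copy the buffer, sort the copy,
// and return it. The caller's buffer is left untouched.
Binding sort_copy(const Binding& x) {
    Binding y = std::make_shared<std::vector<double>>(*x);
    std::sort(y->begin(), y->end());
    return y;
}
```

With `Binding b = a;`, calling `sort_inplace(a)` also reorders what `b` sees, whereas `sort_copy(a)` leaves both `a` and `b` as they were and returns a separate sorted buffer.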

#
Hi Dirk,

Thanks for your quick answer. I don't think Rcpp::clone is what I was
looking for. I know `stl_sort_inplace(a)` modifies the value of `a`, but it
surprises me to see it modify `b`. It may also modify other variables c,
d, e, f..., and it's hard to know which variables point to the same place.
On Tue, Oct 21, 2014 at 8:31 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
#
a and b are the same object:
[1] "0x7f9504534948"
[1] "0x7f9504534948"

So clone is what you need here. 

Implementing copy-on-write for that kind of example is possible, but it would require a lot of additional code, i.e. the iterator would need to handle the write operation.

An undesirable side effect is that such iterators would be considerably slower. Right now Rcpp is close to the metal and uses direct pointers as iterators when it makes sense. Everyone would have to pay that price, so it is a no-go.

Instead, the responsibility is given to the user to clone explicitly when changes will be made to the underlying object.

Romain
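
To make the trade-off concrete, here is a rough sketch of what a copy-on-write container could look like in plain C++. It is an illustration only, not how `Rcpp::NumericVector` works: `CowVector` and its members are hypothetical names, and the sharing check uses `std::shared_ptr::use_count()`, which is only meaningful in single-threaded code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write vector. Copies of a CowVector share one
// buffer until one of them is written to.
class CowVector {
    std::shared_ptr<std::vector<double>> data_;

    // Give this handle a private buffer if another handle still shares it.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<double>>(*data_);
        }
    }

public:
    explicit CowVector(std::vector<double> v)
        : data_(std::make_shared<std::vector<double>>(std::move(v))) {}

    // Reads go straight to the (possibly shared) buffer.
    double get(std::size_t i) const { return (*data_)[i]; }

    // Every write pays for a sharing check first. A raw pointer used as
    // an iterator performs no such check, which is why it is faster.
    void set(std::size_t i, double value) {
        detach();
        (*data_)[i] = value;
    }

    // True if both handles currently point at the same buffer.
    bool shares_with(const CowVector& other) const {
        return data_ == other.data_;
    }
};
```

After `CowVector b = a;` the two handles share a buffer, and the first `a.set(...)` silently copies it so that `b` is unaffected. The cost is the extra branch on every write, which is the overhead referred to above.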
#
Thanks a lot!

Does that mean we should never modify an argument passed from R to cpp?

On Wed, Oct 22, 2014 at 8:24 AM, Romain Francois <romain at r-enthusiasts.com>
wrote:
#
Pretty much. But sometimes that's what you want, and Rcpp does not get in the way. You just have to know the rules of the game.
BTW, the same rules apply when you use R's C API through `.Call`: you are in charge of making the copy when it's needed.

Romain
#
On Tue, Oct 21, 2014 at 9:22 PM, Chenliang Xu <luckyrand at gmail.com> wrote:
This is common among C++ software called by R that modifies R objects
in place. For example, below DT2 is modified:
...junk...
a b
1: 1 1
2: 2 2
3: 3 3
#
Thanks a lot!

I thought that was a bug in data.table when I was trying to learn data.table.
Obviously, I was wrong. It is a feature of data.table: all `set`
functions change their input by reference. It also provides the function `copy`
for when a copy is needed.

Based on the suggestion from Romain, I may just stay on the safe side and
not modify arguments passed to C++ from R. Users of data.table should be
aware that a data.table object is passed by reference, and call
`copy` when needed. For other R objects, it seems to cause too much trouble:
it is hard to detect variables pointing to the same place, and I don't want
to provide a copy function.

On Wed, Oct 22, 2014 at 11:40 AM, Gabor Grothendieck <
ggrothendieck at gmail.com> wrote:

            