
[Rcpp-devel] can one modify array in R memory from C++ without copying it?

18 messages · Christian Gunning, andre zege, Darren Cook +5 more

#
On Tue, Nov 1, 2011 at 9:11 PM,
<rcpp-devel-request at r-forge.wu-wien.ac.at> wrote:
3 quick points:

1:  For the NumericMatrix->arma, you can use an advanced constructor
to get the behavior that you desire.
http://arma.sourceforge.net/docs.html#Mat -- you'll want copy_aux_mem = false.

2:  Are you actually doing matrix math?  If you're just doing simple
element-by-element arithmetic, you might get just as good performance
with a simple loop or iterator.  You might try this first to
understand the R/C++ process, and *then* move to using Armadillo :)

3:  For completeness, note that "Rcpp::NumericMatrix r_m(clone(mem));"
*forces* a copy, thus restoring R's "no side-effects" semantics.
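To make points 1 to 3 concrete without R or Armadillo installed, here is a minimal C++17 sketch. It assumes a plain `std::vector<double>` stands in for the column-major storage that R allocated; `MatrixView`, `MatrixCopy` and `double_in_place` are hypothetical names for illustration, not Rcpp or Armadillo API. `MatrixView` models the non-owning behaviour of the advanced constructor with `copy_aux_mem = false`, `MatrixCopy` models the forced deep copy you get from `clone()`, and `double_in_place` is the plain element-by-element loop from point 2.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over column-major memory owned elsewhere
// (e.g. by R). Same idea as Armadillo's copy_aux_mem = false: no copy.
struct MatrixView {
    double* data;              // borrowed pointer, never freed here
    std::size_t nrow, ncol;

    double& operator()(std::size_t i, std::size_t j) {
        return data[i + j * nrow];   // column-major indexing, as in R
    }
};

// Hypothetical owning deep copy: same effect as clone(), new storage.
struct MatrixCopy {
    std::vector<double> storage;
    std::size_t nrow, ncol;

    MatrixCopy(const double* src, std::size_t nr, std::size_t nc)
        : storage(src, src + nr * nc), nrow(nr), ncol(nc) {}

    double& operator()(std::size_t i, std::size_t j) {
        return storage[i + j * nrow];
    }
};

// Point 2: a plain element-by-element loop, no matrix library needed.
void double_in_place(double* data, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {
        data[k] *= 2.0;
    }
}
```

Writes through the view land in the original buffer; writes to the copy do not. In real Rcpp/Armadillo code the no-copy hand-off would be roughly `arma::mat A(r_m.begin(), r_m.nrow(), r_m.ncol(), false);`, but treat that line as a pointer to the Armadillo documentation linked above rather than tested code.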

-Christian
#
Christian,

1. In my previous post I used exactly the constructor you are talking
about, as you can see from the code I posted.

2. I am not doing any math in this illustrative example; I am just
modifying a toy matrix and showing that the modification didn't propagate
back to R. That means I am operating on a different chunk of memory, i.e.
a copy was made somewhere. In real life I need to do some fairly involved
manipulations on several matrices, each a couple of gigabytes in size, so it
would be nice not to copy. I have Armadillo code that does the job, but I
wanted to call it from R without copying the matrices. In pure R this takes
a very long time.

3. I am not 100% sure, but this toy test seems to indicate that
NumericMatrix makes a copy regardless. I would need to read the code
for NumericMatrix to be absolutely sure, but I cannot explain the result
otherwise.
On Wed, Nov 2, 2011 at 12:31 AM, Christian Gunning <xian at unm.edu> wrote:

            
#
On Wed, Nov 2, 2011 at 4:00 AM,
<rcpp-devel-request at r-forge.wu-wien.ac.at> wrote:
Sorry about that, completely missed it...
Using what's below, this should work fine.
This is a little tricky, I almost missed it myself (again).
1 day later
#
Chris, I set up a tiny matrix,


mm<-matrix(1:10, nr=2)

then cut your code and pasted it into R.

After I run myfun(mm), printing mm shows exactly the same matrix as
before, not a doubled one.

The function myfun did return a doubled matrix, but the original matrix mm
stayed the same, so it still appears that the original R memory was not
changed from C++.

I may be missing something very obvious. I don't understand how the same
code seems to give you a different result; I am just reporting what I
observe when I go through exactly the steps you suggested.
On Wed, Nov 2, 2011 at 9:53 PM, Christian Gunning <xian at unm.edu> wrote:

            
#
I think there was a missing line in Christian's sample, which was:
   myfun(mmint)

I.e. the point of his code was to show that a matrix of doubles gets
modified, a matrix of ints does not.
(This is nothing to do with Armadillo by the way; I could run his code
without the "require(RcppArmadillo)" line.)

If you change your above code to this
   mm<-matrix(1:10*1.0, nr=2)

it then gets modified.

Darren

  
    
#
Or change the first line from NumericMatrix to:
   Rcpp::IntegerMatrix r_m(mem);

Then the behaviour is reversed. The matrix of doubles does not get
modified, but the matrix of ints does!

Dirk, Romain, this is a bug-in-waiting. Is there any way to generate a
warning when the implicit deep copy happens? Or alternatively when the
pointer is being used implicitly... but my hunch is that I want to know
when there is any implicit conversion between int and double: modern C++
style is to explicitly declare all type conversions with static_cast<>
and friends.

Darren
#
Not a bug, this is expected behaviour. 

If you pass a matrix of ints to the NumericVector ctor, Rcpp has no choice but to coerce the data to a matrix of double, which means new data, hence the original data does not get modified. 

If you pass a matrix of double, no copy is required, therefore Rcpp operates directly on the data. 

Those are features. 
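The rule Romain describes can be modelled in a few lines. This is a hypothetical C++17 sketch, not Rcpp's implementation: `RObject` is a toy stand-in for an R vector holding either integer or double storage, and `Proxy<T>` plays the role of `NumericVector`/`IntegerVector`. When the stored type already matches `T`, the proxy points at the caller's data; otherwise it coerces into storage of its own, so writes never reach the caller. Instantiating it as `Proxy<int>` instead of `Proxy<double>` reverses which input gets modified, the same reversal Darren observed with `IntegerMatrix`.

```cpp
#include <cstddef>
#include <variant>
#include <vector>

// Toy model of an R vector: integer storage or double storage.
// Hypothetical, not Rcpp's real types.
using RObject = std::variant<std::vector<int>, std::vector<double>>;

// Toy proxy: aliases the caller's storage when the element type already
// matches T, otherwise coerces into fresh storage the caller never sees.
template <typename T>
class Proxy {
public:
    explicit Proxy(RObject& obj) {
        if (auto* same = std::get_if<std::vector<T>>(&obj)) {
            data_ = same;                        // no copy: writes reach obj
        } else {
            std::visit([this](auto& other) {     // coercion: new storage
                owned_.assign(other.begin(), other.end());
            }, obj);
            data_ = &owned_;
        }
    }
    // Copying would leave data_ pointing into another object's owned_.
    Proxy(const Proxy&) = delete;
    Proxy& operator=(const Proxy&) = delete;

    T& operator[](std::size_t i) { return (*data_)[i]; }
    std::size_t size() const { return data_->size(); }
    bool aliases_input() const { return data_ != &owned_; }

private:
    std::vector<T> owned_;            // only used when coercion happened
    std::vector<T>* data_ = nullptr;  // whatever storage we operate on
};
```

Nothing in the calling code distinguishes the two paths, which is why the behaviour is easy to miss.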



On 4 Nov. 2011 at 08:01, Darren Cook <darren at dcook.org> wrote:
#
Hello Romain,
I did not mean the behaviour is a bug; I mean it is going to cause bugs.
I was wondering if there is a way to stop the implicit data coercion,
forcing the programmer to request it explicitly.

E.g. Are the implicit type conversions happening with some extra copy
constructors? If so, we could have conditional compilation to exclude
those. Something like:

  #ifndef FORCE_EXPLICIT
  IntegerMatrix(double*){...}
  #endif

Then programmers who don't like surprises would define FORCE_EXPLICIT,
and then their code would sometimes not compile and they would have to
write an explicit conversion.

(I've not thought that through, I just wanted to demonstrate what I had
in mind.)
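One way Darren's idea might look in modern C++ is to compile the coercing constructor only when the programmer has not opted out, and to offer a named conversion instead. The sketch below is hypothetical: `IntVec`, `DoubleVec`, `coerce_from` and the `FORCE_EXPLICIT` macro are illustrative names, not Rcpp API. The macro is defined inside the snippet only so it compiles standalone; in practice it would come from the compiler command line.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Defined here only for a standalone build; normally passed as -DFORCE_EXPLICIT.
#define FORCE_EXPLICIT

struct IntVec { std::vector<int> v; };   // toy integer vector

class DoubleVec {
public:
    explicit DoubleVec(std::vector<double> v) : v_(std::move(v)) {}
#ifndef FORCE_EXPLICIT
    // Silent converting constructor: copies and coerces int -> double.
    DoubleVec(const IntVec& iv) : v_(iv.v.begin(), iv.v.end()) {}
#endif
    // Named, greppable conversion that always allocates new storage.
    static DoubleVec coerce_from(const IntVec& iv) {
        return DoubleVec(std::vector<double>(iv.v.begin(), iv.v.end()));
    }
    double operator[](std::size_t i) const { return v_[i]; }

private:
    std::vector<double> v_;
};

// Compile-time guard: with FORCE_EXPLICIT the implicit path must not exist.
static_assert(!std::is_convertible<IntVec, DoubleVec>::value,
              "IntVec must not convert to DoubleVec implicitly");
```

Removing the `#define` brings the silent converting constructor back, and the `static_assert` then fails to compile, which is the kind of early, loud signal Darren is asking for.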

Darren

  
    
#
On 4 November 2011 at 08:28, romain at r-enthusiasts.com wrote:
| Not a bug, this is expected behaviour. 
| 
| If you pass a matrix of ints to the NumericVector ctor, Rcpp has no choice but to coerce the data to a matrix of double, which means new data, hence the original data does not get modified. 
| 
| If you pass a matrix of double, no copy is required, therefore Rcpp operates directly on the data. 
| 
| Those are features. 

And they are documented.

Dirk

 
| On 4 Nov. 2011 at 08:01, Darren Cook <darren at dcook.org> wrote:
| 
| >> I.e. the point of his code was to show that a matrix of doubles gets
| >> modified, a matrix of ints does not.
| > 
| > Or change the first line from NumericMatrix to:
| >   Rcpp::IntegerMatrix r_m(mem);
| > 
| > Then the behaviour is reversed. The matrix of doubles does not get
| > modified, but the matrix of ints does!
| > 
| > Dirk, Romain, this is a bug-in-waiting. Is there any way to generate a
| > warning when the implicit deep copy happens? Or alternatively when the
| > pointer is being used implicitly... but my hunch is that I want to know
| > when there is any implicit conversion between int and double: modern C++
| > style is to explicitly declare all type conversions with static_cast<>
| > and friends.
| > 
| > Darren
| > _______________________________________________
| > Rcpp-devel mailing list
| > Rcpp-devel at lists.r-forge.r-project.org
| > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
#
Chris, I am sorry, I was being dense: the extra copy happens to convert the type. That's great news (I mean that you can modify in place :), not that I didn't realize I was using the wrong type of input). Thanks for your help, guys.

On Nov 2, 2011, at 9:53 PM, Christian Gunning <xian at unm.edu> wrote:

            
#
On 4 November 2011 at 16:56, Darren Cook wrote:
| > Not a bug, this is expected behaviour.
| 
| Hello Romain,
| I did not mean the behaviour is a bug; I mean it is going to cause bugs.
| I was wondering if there is a way to stop the implicit data coercion,
| forcing the programmer to request it explicitly.

R does exactly that too

R> a <- 1
R> typeof(a)
[1] "double"
R> a <- 1L
R> typeof(a)
[1] "integer"
R> 
 
and Rcpp uses Proxy Classes so this is the Right Thing (TM) to do.  We think,
at least.

| E.g. Are the implicit type conversions happening with some extra copy
| constructors? If so, we could have conditional compilation to exclude
| those. Something like:
| 
|   #ifndef FORCE_EXPLICIT
|   IntegerMatrix(double*){...}
|   #endif
| 
| Then programmers who don't like surprises would define FORCE_EXPLICIT,
| and then their code would sometimes not compile and they would have to
| write an explicit conversion.
| 
| (I've not thought that through, I just wanted to demonstrate what I had
| in mind.)

If you feel really strongly about it you could consider a patch that makes this
non-R behaviour you suggest an option.  To most of us who use Rcpp between R
and C++ it really is a feature.

Don't get me wrong though: I like your input here and maybe the implicit
nature of things needs to be stressed even more.

Dirk

 
| Darren
| 
| > 
| > If you pass a matrix of ints to the NumericVector ctor, Rcpp has no
| > choice but to coerce the data to a matrix of double, which means new
| > data, hence the original data does not get modified.
| > 
| > If you pass a matrix of double, no copy is required, therefore Rcpp
| > operates directly on the data.
| > 
| > Those are features.
| > 
| > 
| > 
| > On 4 Nov. 2011 at 08:01, Darren Cook <darren at dcook.org> wrote:
| > 
| >>> I.e. the point of his code was to show that a matrix of doubles
| >>> gets modified, a matrix of ints does not.
| >> 
| >> Or change the first line from NumericMatrix to: Rcpp::IntegerMatrix
| >> r_m(mem);
| >> 
| >> Then the behaviour is reversed. The matrix of doubles does not get 
| >> modified, but the matrix of ints does!
| >> 
| >> Dirk, Romain, this is a bug-in-waiting. Is there any way to
| >> generate a warning when the implicit deep copy happens? Or
| >> alternatively when the pointer is being used implicitly... but my
| >> hunch is that I want to know when there is any implicit conversion
| >> between int and double: modern C++ style is to explicitly declare
| >> all type conversions with static_cast<> and friends.
| 
| 
| -- 
| Darren Cook, Software Researcher/Developer
| 
| http://dcook.org/work/ (About me and my work)
| http://dcook.org/blogs.html (My blogs and articles)
#
Hi,
On Fri, Nov 4, 2011 at 9:36 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
What if Rcpp fires a warning (I guess there's a C function that you
can use to invoke R's `warning()`) in these scenarios?

That'd alert you as to what happened and still let people who just use
the CRAN-stalled Rcpp become aware of when this happens in their code.
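A minimal sketch of Steve's suggestion, assuming a pluggable warning hook rather than any particular R or Rcpp function: `CheckedDoubleVec` and `WarningHook` are hypothetical names. The aliasing constructor stays silent, and only the coercing constructor reports that a copy was made.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hook; a real package might forward this to R's warning system.
using WarningHook = std::function<void(const std::string&)>;

class CheckedDoubleVec {
public:
    // Element type already matches: alias the caller's storage, no warning.
    CheckedDoubleVec(std::vector<double>& same, const WarningHook&)
        : data_(&same) {}

    // Element type differs: coerce into new storage and say so.
    CheckedDoubleVec(const std::vector<int>& other, const WarningHook& warn)
        : owned_(other.begin(), other.end()), data_(&owned_) {
        warn("integer input coerced to double; writes will not reach the caller");
    }

    CheckedDoubleVec(const CheckedDoubleVec&) = delete;
    CheckedDoubleVec& operator=(const CheckedDoubleVec&) = delete;

    double& operator[](std::size_t i) { return (*data_)[i]; }
    bool copied() const { return data_ == &owned_; }

private:
    std::vector<double> owned_;   // declared first so it is built before data_
    std::vector<double>* data_;
};
```

As Dirk points out in his reply, a real library performs many such conversions internally, so an unconditional warning like this one would likely be noisy.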

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
#
On 4 November 2011 at 10:13, Steve Lianoglou wrote:
| On Fri, Nov 4, 2011 at 9:36 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
| > On 4 November 2011 at 16:56, Darren Cook wrote:
| > If you feel really strongly about it you could consider a patch that makes this
| > non-R behaviour you suggest an option.  To most of us who use Rcpp between R
| > and C++ it really is a feature.
| >
| > Don't get me wrong though: I like your input here and maybe the implicit
| > nature of things needs to be stressed even more.
| 
| What if Rcpp fires a warning (I guess there's a C function that you
| can use to invoke R's `warning()`) in these scenarios?

Hah. I invite you to modify your copy and to activate a little message in each
ctor.  You will be amazed to see how many implicit conversions happen.

Plus this is templates for you.  It is not that we have an implicit

     // pseudo-code ... and a satire!

     // do this to really mess with Darren's head
     if (typeof(X)=="int") {
        doMeanConversionToNumeric(x)
     }

where we could neatly insert

        std::cout << "Steve suggested we tell you that something was converted"	
 
| That'd alert you as to what happened and still let people who just use
| the CRAN-stalled Rcpp become aware of when this happens in their code.

Very nice in principle. A lot harder in practice.

What we all see here is a side effect of what is otherwise type and
conversion "magic".  The "No Free Lunch" theorem still holds.

Dirk
 
| -steve
| 
| --
| Steve Lianoglou
| Graduate Student: Computational Systems Biology
|  | Memorial Sloan-Kettering Cancer Center
|  | Weill Medical College of Cornell University
| Contact Info: http://cbio.mskcc.org/~lianos/contact
#
On Fri, Nov 4, 2011 at 10:39 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
Well ... I have always wanted my name "in lights," so to speak ... :-)

You could keep a global registry/map, where you keep track of the
addresses of the objects you've already fired warning about and the
time of the last warning, then ... you know ... check how long ago it
was before you fire a new warning on the address of the object that is
(re)implicitly being converted ... or something.
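Steve's throttling idea is essentially a map from object address to the time of the last warning. The following is a hypothetical C++17 sketch (the class name and the fixed cool-down are assumptions, not anything Rcpp provides); it accepts the current time as a parameter so its behaviour can be checked deterministically.

```cpp
#include <chrono>
#include <unordered_map>

// Hypothetical throttle: remember, per object address, when we last warned,
// and stay quiet until the cool-down has elapsed.
class WarningThrottle {
public:
    using Clock = std::chrono::steady_clock;

    explicit WarningThrottle(Clock::duration cooldown) : cooldown_(cooldown) {}

    // Returns true if a warning should be emitted for this address now.
    bool should_warn(const void* addr, Clock::time_point now = Clock::now()) {
        auto it = last_.find(addr);
        if (it != last_.end() && now - it->second < cooldown_) {
            return false;              // warned recently: suppress
        }
        last_[addr] = now;             // record the warning we are about to emit
        return true;
    }

private:
    Clock::duration cooldown_;
    std::unordered_map<const void*, Clock::time_point> last_;
};
```

One caveat: once R frees an object, its address can be reused by a new one, so keying on addresses only approximates "the same object".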

But when you say these are templates, I guess you have no ability to
even notice the conversion happening? (This is me being lost in the
"advanced" (really anything but beginner) C++ world).

-steve
#
On Fri, Nov 4, 2011 at 8:36 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
The distinction between a C++ object that allocates its own storage
and one that uses the storage allocated by R is explicit in RcppEigen.
 The templated class Eigen::Map uses the storage from the SEXP and
throws an error if, for example, you pass integers to a double
precision vector.  In the RcppEigen-Intro vignette that Dirk and I are
writing we mention that, because of this, you really want to declare
such objects as const.

The vector structure for double precision values in Eigen is called VectorXd so

const Eigen::Map<Eigen::VectorXd> foo(as<Eigen::Map<Eigen::VectorXd> >(Foo));

is guaranteed to use R's storage for the vector or throw an error.
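The "use R's storage or throw" rule can be modelled without Eigen. In the hypothetical C++17 sketch below, `RObject` is a toy stand-in for a SEXP holding integer or double storage, and `StrictDoubleMap` mimics the behaviour described for `Eigen::Map`: it never coerces, never copies, and only hands out read-only access, in line with the advice to declare such maps `const`. It is not the RcppEigen implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <variant>
#include <vector>

// Toy stand-in for an R vector (hypothetical).
using RObject = std::variant<std::vector<int>, std::vector<double>>;

// Hypothetical "map or throw": borrow double storage, refuse anything else.
class StrictDoubleMap {
public:
    explicit StrictDoubleMap(const RObject& obj) {
        const auto* p = std::get_if<std::vector<double>>(&obj);
        if (p == nullptr) {
            throw std::invalid_argument("expected double storage; refusing to coerce");
        }
        data_ = p->data();   // points straight at the caller's memory
        size_ = p->size();
    }

    // Read-only access only.
    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }
    const double* data() const { return data_; }

private:
    const double* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The failure mode becomes an immediate exception instead of a silently modified copy.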

The topic of this thread, modifying an array in R memory from C++, is
a no-no.  There are occasions, such as optimizing models with complex
structures or MCMC, where you want to modify the state of the object
without copying a large structure every time but, in these cases, it
is better to keep the volatile storage in the C++ object and expose it
through methods and fields in a reference class object in R.  Rcpp
modules are one way of doing this.  I ended up rolling my own method
in the lme4Eigen package (only available on R-forge at present)
because of the need to allow for serialize/unserialize operations on
the reference class object.
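A rough C++ sketch of that design, with hypothetical names (`SamplerState`, `step`, `snapshot`) that are not taken from lme4Eigen: the large state vector is owned by the C++ object and updated in place by its methods, and a copy is made only when a caller explicitly asks for a snapshot. Exposing such a class to R, for example through Rcpp modules, is not shown here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical long-lived state kept on the C++ side.
class SamplerState {
public:
    explicit SamplerState(std::vector<double> init) : theta_(std::move(init)) {}

    // Update in place: the big vector never crosses the R/C++ boundary.
    void step(double delta) {
        for (double& t : theta_) {
            t += delta;
        }
        ++iterations_;
    }

    double get(std::size_t i) const { return theta_.at(i); }
    std::size_t iterations() const { return iterations_; }

    // Explicit copy when the caller really wants one (e.g. to save it).
    std::vector<double> snapshot() const { return theta_; }

private:
    std::vector<double> theta_;
    std::size_t iterations_ = 0;
};
```

Because the copy point is an explicit method call, there is no ambiguity about whether a given operation touched the live state.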
1 day later
#
Hello Dirk,
Yes, I'm used to the implicit conversions from other scripting
languages. Rcpp is at the join of languages with different philosophies:
dynamically-typed R and statically-typed C++. But more than that: in R a
very succinct script is regarded as a thing of beauty; in C++ explicitly
describing conversions is regarded as good form by the experts.

This creates some interesting challenges for Rcpp!

Darren

P.S. As a realistic example of how implicit conversions turn into bugs:
A developer tests with nice clean csv data, let's say it is stock OHLCV
data, where the values are always integers. An integer vector is passed
into Rcpp code, it modifies memory directly in some subtle way (e.g.
adjusting values down based on data age).

He tests on years' worth of data and thousands of symbols and is feeling
very confident. Backtesting on the overall system shows it is making 10%
a year.

This goes into production but one day the program giving the data
changes and now has ".000000" on the end of all the integers.

R now treats it as doubles, Rcpp does an implicit conversion, is no
longer using a pointer and the values no longer get adjusted.
The data still looks plausible, it just hasn't been modified. The next
step, machine learning, goes with this data, and makes different
purchase recommendations. The system is making only 5% a year, but
because it is still making money no-one even realizes something has
broken.

Yes, better programmers would have validated their input data better,
would have had more checks, and would have realized the potential for
error. Real programmers on the other hand are human, make mistakes, and
have deadlines.
#
Darren,
On 6 November 2011 at 16:16, Darren Cook wrote:
| > If you feel really strongly about it you could consider a patch that makes this
| > non-R behaviour you suggest an option.  To most of us who use Rcpp between R
| > and C++ it really is a feature.
| 
| Hello Dirk,
| Yes, I'm used to the implicit conversions from other scripting
| languages. Rcpp is at the join of languages with different philosophies:
| dynamically-typed R and statically-typed C++. But more than that: in R a
| very succinct script is regarded as a thing of beauty; in C++ explicitly
| describing conversions is regarded as good form by the experts.

Yes.
 
| This creates some interesting challenges for Rcpp!

Really?  

I think we support writing R code like you would otherwise, and the same for
C++ which one writes like C++ code. 

We simply make it easier to combine the two via Rcpp.
 
| Darren
| 
| P.S. As a realistic example of how implicit conversions turn into bugs:
| A developer tests with nice clean csv data, let's say it is stock OHLCV
| data, where the values are always integers. An integer vector is passed
| into Rcpp code, it modifies memory directly in some subtle way (e.g.
| adjusting values down based on data age).
| 
| He tests on years' worth of data and thousands of symbols and is feeling
| very confident. Backtesting on the overall system shows it is making 10%
| a year.
| 
| This goes into production but one day the program giving the data
| changes and now has ".000000" on the end of all the integers.
| 
| R now treats it as doubles, Rcpp does an implicit conversion, is no
| longer using a pointer and the values no longer get adjusted.
| The data still looks plausible, it just hasn't been modified. The next
| step, machine learning, goes with this data, and makes different
| purchase recommendations. The system is making only 5% a year, but
| because it is still making money no-one even realizes something has
| broken.
| 
| Yes, better programmers would have validated their input data better,
| would have had more checks, and would have realized the potential for
| error. Real programmers on the other hand are human, make mistakes, and
| have deadlines.

Nice story. Can't wait for the movie version...

And that's why code gets tested.  Relying on your input data to be
transformed in place is NOT the paradigm commonly used either as you more
often have inputs a, b, c, .... for a function f() returning x, y, z, ...

In any case, it is moot.  You still haven't shown a bug, beyond the design
mistake of assuming inputs were suitable for IntegerVector (or Matrix) when
they should have been Numeric all along.
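Dirk's "inputs in, outputs out" paradigm, applied to the price adjustment from Darren's story, might look like this hypothetical sketch (`adjust_by_age` and `age_factor` are illustrative names). Because the function returns a new vector and converts element types explicitly with `static_cast`, integer and double inputs produce the same result and the input is never mutated.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical pure function: reads its inputs, returns a fresh result.
template <typename T>
std::vector<double> adjust_by_age(const std::vector<T>& prices,
                                  const std::vector<double>& age_factor) {
    if (prices.size() != age_factor.size()) {
        throw std::invalid_argument("prices and age_factor differ in length");
    }
    std::vector<double> out(prices.size());
    for (std::size_t k = 0; k < prices.size(); ++k) {
        // Explicit conversion, so the element type cannot change the outcome.
        out[k] = static_cast<double>(prices[k]) * age_factor[k];
    }
    return out;
}
```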

This is blown way out of proportion.  As I said, the behaviour is documented,
and the corner case gets mentioned in all our talks too (see the Google Tech
Talk video for example).

Dirk
#
On November 6, 2011 07:32:44 AM Dirk Eddelbuettel wrote:
I agree with Dirk. Modifying Rcpp arguments in-place is not a beginner-level 
trick, so if you're writing code that does that kind of stuff, better check the 
type of the SEXP being passed in first and don't be surprised by implicit type 
conversions. 

As the old saying goes, "If you want to play with the big dogs, ..."

Davor