Skip to content

[Rcpp-devel] Question concerning use of Raster in C vi Rcpp plugin and inline

12 messages · Rainer M Krug, Douglas Bates, Romain Francois +1 more

#
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi

I am new to C, and my question might be basic, but I am struggling with it:

I have about 10 lines of code in a simulation, which take up about 50%
of the whole simulation, As the simulation, takes several hours to run,
I want to make the code faster. After trying different approaches in R,
with different degrees of success, I decided to try out inline and Rcpp
for this.

I did some progress (using sugar), and the net-result is so far, that it
is as fast / slow as R; but as I said, this is a first step and I am
happy so far. My problem now is, in the R code, I have the following
(the complete code is below):

s <- seeds[x:(x+dx2), y:(y+dy2)]
dispSeeds[x,y] <- sum(
                                rbinom(
                                       s,
                                       s,
                                       sd2D
                                       ),
                                na.rm = TRUE
                                )

As the rbinom part is only using the vector representation of s, I have
already translated it into C.

So my problem / question is: can I use a similar syntax for s in inline
using sugar?

As the code above sits in two large loops, I would like to translate the
whole loops into C.

Below the code (R and R / inline):

maxx, maxy, dx, dy: scalar integers
seeds: integer matrix
dispSeeds: integer matrix

The pure R code:

#+BEGIN_SRC R
  dispRparFor <- function(
                          maxx  = maxx,
                          maxy  = maxy,
                          seeds = seeds,
                          sd2D  = sd2D,
                          dx    = dx,
                          dy    = dy,
                          dispSeeds = dispSeeds # the return value
                          ) {
    dx2 <- 2*dx
    dy2 <- 2*dy
    for ( y in 1:maxy ) {
      cat(y, " ")
      for (x in 1:maxx) {
        s <- seeds[x:(x+dx2), y:(y+dy2)]
        if (all(is.na(s))) {
          dispSeeds[x,y] <- NA
        } else {
          dispSeeds[x,y] <- sum(
                                rbinom(
                                       s,
                                       s,
                                       sd2D
                                       ),
                                na.rm = TRUE
                                )
        }
      }
    }
    return(dispSeeds)
  }
#+END_SRC


How far I got with using inline and Rcpp / sugar:


#+BEGIN_SRC R
  dispRparForSugar <- function(
                               maxx  = maxx,
                               maxy  = maxy,
                               seeds = seeds,
                               sd2D  = sd2D,
                               dx    = dx,
                               dy    = dy,
                               dispSeeds = dispSeeds # the return value
                               ) {
    library(inline)
    library(Rcpp)
    dx2 <- 2*dx
    dy2 <- 2*dy


    fx2 <- cxxfunction(
                       sig = signature(N = "integer", SIZE = "integer",
PROB = "double"),
                       body = '
                                   Rcpp::IntegerVector n (N);
                                   Rcpp::IntegerVector size (SIZE);
                                   Rcpp::NumericVector prob (PROB);

                                   int res;
                                   res = 0;

                                   for( int i=0; i<n.size(); i++){
                                        if (size[i]>0) {
                                          res += rbinom( 1, size[i],
prob[i] )[0];
                                        }
                                   }

                                   return wrap( res );
                                 ',
                       plugin = "Rcpp",
                       verbose = TRUE
                       )



    for ( y in 1:maxy ) {
      cat(y, " ")
      for (x in 1:maxx) {
        s <- seeds[x:(x+dx2), y:(y+dy2)]
        if (all(is.na(s))) {
          dispSeeds[x,y] <- NA
        } else {
          dispSeeds[x,y] <- fx2(s, s, sd2D)
        }
      }
    }
    return(dispSeeds)
  }
#+END_SRC


Ultimately, I would like to have the complete dispRparFor function in C.


For the ones interested, I could send a working example with data.

Cheers,

Rainer

- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Tel:        +33 - (0)9 53 10 27 44
Cell:       +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:      Rainer at krugs.de

Skype:      RMkrug
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkziXZ0ACgkQoYgNqgF2egoL7wCfR5RapMcddSmM2f8ACBzjVG6p
GaQAni+XQJs2wzcJQNXpMxwFY/FqRa/W
=PmVZ
-----END PGP SIGNATURE-----
#
On Tue, Nov 16, 2010 at 4:31 AM, Rainer M Krug <r.m.krug at gmail.com> wrote:
I think it unlikely that you will be able to write C/C++ code that
runs much faster than that expression in R.  The sum and rbinom
functions are vectorized and other than a small interpretation
overhead not much can be saved by going to compiled code.
It is peculiar to call cxxfunction within another function if the code
being compiled is fixed.  cxxfunction is a function generator and one
usually regards a call to cxxfunction as the equivalent of entering a
function definition.

It doesn't appear that you are using n in your C++ code (other than
taking its length) and your C++ code seems to be evaluating
sum(rbinom(1, s, sd2D)) rather than sum(rbinom(s, s, sd2D)).  Is that
intentional?

And if your fx2 is always to be called as fx2(s, s, sd2D) then why not
just pass s and sd2D?
#
On 11/16/2010 03:16 PM, Douglas Bates wrote:
The main reason why I am trying to use C are the loops around the code
above. See attached code example.
I changed that - yes, it was peculiar but based on my experimentation.
Left over - changed as well. Thanks.
Done (partly).


To add some more strange things:

I attach the code in a working example as an R script file which will
show something strange (for me):

the function overwrites a variable, namely sd2D. Am I doing something
wrong or do I underestimate the dangers of using Rcpp and inline and C
in general?

In addition, the code is not doing what it is supposed to be doing, but
that is a different question at the moment.

Cheers,

Rainer
_______________________________________________
Rcpp-devel mailing list
Rcpp-devel-Z+qqJ2/841dDXCDGMXqaGq2UG9VpUWMKQH7oEaQurus at public.gmane.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

  
    
#
On Tue, Nov 16, 2010 at 9:43 AM, Rainer M Krug <Rainer at krugs.de> wrote:
Yes, it will.  You are assigning the NumericVector's s and sd2D to use
the storage of the SD2D argument so when you assign s[indS] in the
first loop you are overwriting the contents of SD2D.

I am testing a rewrite now.
#
I enclose a rewrite which compiles and executes without changing the
arguments.  I am not entirely sure that it is doing the correct
calculation.  I tried to compare the results from calls like

print(range(sd2D))
set.seed(1234)
print(system.time(dispSeedsC <- disperse(seeds.m, sd2D, dispRparForSugarOrg)))
set.seed(1234)
print(system.time(dispSeedsR <- disperse(seeds.m, sd2D, dispRparFor)))
print(range(sd2D))

and all.equal(dispSeedsC, dispSeedsR) fails, so I may have introduced
errors.  I am still somewhat confused by the logic.
On Tue, Nov 16, 2010 at 10:15 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: disp.cpp
Type: text/x-c++src
Size: 734 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20101116/cbe2df0c/attachment.cpp>
#
On 11/16/2010 05:49 PM, Douglas Bates wrote:
Hi Douglas,
That looks much better and "C like" - thanks a lot.
I will look at that - and I am optimistic that I will sort that out now.
What I want to calculate is the dispersal of seeds into each cell from
the neighboring cells, defined in sd2D. sd2D is a probability density
function, and seeds.m is the initial seed distribution of to be
dispersed seeds.

Well - the basic idea is of a mowing-window calculation over seeds.m,
where the window is defined by sd2D, and the calculation is based on the
rbinom numbers drawn from all cells around a particular cell (x,y),
covered y sd2D; the prob of rbinom is the value of sd2D, the size is the
value of the underlying cell in seeds.m, n is 1. All numbers drawn are
added up and stored in the cell (x,y). And this is repeated for all cells.

Now, I need the values for all cells, even those where sd2D does not
completely overlap with seeds.m, and this is why I buffer seeds.m with
NAs to get seeds, on which the calculations are now performed.

Hope this clarifies my problem, and my somehow strange approach (I am
thinking of skipping the buffering and to check for overlap in the loop
itself).

A procedural question:

You attached below a file named disp.cpp, which indicates to me that
your c code is in a separate file - is this the case? And if yes, how
can I do that with the inline command? Or do I have to compile
externally and use Rcpp directly?

Cheers and thanks a lot for your help,

Rainer
#
Le 17/11/10 10:17, Rainer M Krug a ?crit :
Ultimately, making a package is the best action.

Otherwise, I guess you could do something like this:

f <- cxxfunction( signature(....),
	paste( readLines( "disp.cpp" ), collapse = "\n" ),
         plugin = "Rcpp" )


I'll look at the rest of the thread and try to provide some help later.

Romain

  
    
#
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 11/17/2010 10:26 AM, Romain Francois wrote:
True - But I haven't looked at that option yet, as I was happy with
sourcing the code, but I will for my next project.
Sounds good - would there be an option of including an alternative
argument to "body", i.e. "file", of which exactly one has to be supplied?
Thanks - I'll work on it today again, and post (hopefully) some progress
later today.

Rainer
- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Tel:        +33 - (0)9 53 10 27 44
Cell:       +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:      Rainer at krugs.de

Skype:      RMkrug
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkzjpBYACgkQoYgNqgF2egrPFQCcDhoEaLbiV87gQLV+i21BSsbO
cSoAoIsUsUx6CCLZBIYVZYnzY2ry9TxC
=mnzJ
-----END PGP SIGNATURE-----
#
Le 17/11/10 10:44, Rainer M Krug a ?crit :
Perhaps. Feel free to submit a patch to the maintainer of inline.

  
    
#
On 17 November 2010 at 10:44, Rainer M Krug wrote:
| On 11/17/2010 10:26 AM, Romain Francois wrote:
| > Ultimately, making a package is the best action.
| 
| True - But I haven't looked at that option yet, as I was happy with
| sourcing the code, but I will for my next project.

Are you sure you haven't? ;-)   

You and I are co-authors of the earthmovdist package on R-Forge which is just
that: a small package with Rcpp around a single file you once for help with
in wrapping to C++.  Feel free to take it as a stanza.

https://r-forge.r-project.org/projects/earthmovdist/

Dirk
#
On 11/17/2010 01:57 PM, Dirk Eddelbuettel wrote:
Not for this project - but that package in which we are co-authors
definitely is doing that.

To clarify:
Up to now, the whole simulation (of which the the code discussed here is
s small part) was purely R, as it worked fine and was relatively easy to
maintain.
Now the simulation is growing, and the need for speed becomes more
evident. So I am looking into the coding sections in C++ and to use
inline or Rcpp. When the C++ code is working, and before the simulations
can run on a hpc cluster (I don't suppose gcc is installed on the
nodes), I have to compile it and the idea of a package, also in relation
to version control, is really appealing. But that is the next step.

Don't worry - I haven't forgotten earthmovdist and I will use the code
there as a blueprint for wrapping the C++ code.
Cheers,

Rainer
#
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 11/17/2010 10:44 AM, Rainer M Krug wrote:
As promised, here is the reworked code below.

The code is working now, and about four times as fast as the fastest R
version - brilliant.
Nevertheless, I assume that there is still some scope for improvement -
although I have no idea where.

Any ideas on how to streamline the code are welcome.

To run the code, please download the surrounding R code and required
data files from:

http://thekrugs.free.fr/rainer/examples/

Cheers and thanks,

Rainer

Here is the working code:

#+BEGIN_SRC c++ :tangle spreadSim.windDispersal.cpp
  // The input parameter
  int dx2 = as<int>(DX2); // by reference or value?
  int dy2 = as<int>(DY2);
  NumericVector sd2D (SD2D); // by reference!
  IntegerMatrix seeds (SEEDS);
  IntegerMatrix mask (MASK);

  // result vector
  IntegerMatrix dispSeeds = clone<IntegerMatrix>(mask);

  // internal variables
  IntegerVector s (sd2D.size());
  RNGScope scope;                 // N.B. Needed when calling random
number generators

  int res;
  int nc = dispSeeds.ncol();
  int nr = dispSeeds.nrow();
  // BEGIN loop over seeds grid ("moving")
  for( int y=0; y < nc; y++ ){
    for( int x=0; x < nr; x++ ){
      // BEGIN loop over sd2D ("window")
      if ( dispSeeds(x, y) >= 0 ) {
        int indS = 0;
        for( int xS=x; xS <= x + dx2; xS++ )
          for( int yS=y; yS <= y + dy2; yS++, indS++ )
            s[indS]=seeds(xS, yS);

        res = 0;
        for( int i=0; i<s.size(); i++ ){
          if (s[i]>0 && sd2D[i]>0) {
            res += (int) ::Rf_rbinom((double)(s[i]), sd2D[i]);
          }
        }
        dispSeeds(x, y) = res;
      }
    }
   }
  // END loop over seeds
  return wrap( dispSeeds );
#+END_SRC
- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Tel:        +33 - (0)9 53 10 27 44
Cell:       +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:      Rainer at krugs.de

Skype:      RMkrug
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkzlArkACgkQoYgNqgF2egq7jACfR3jzZeKkYifHBKeGyiTkEddU
4MMAnAvWSpB7panUtpt0fh6TEHcldZRx
=eY1p
-----END PGP SIGNATURE-----