Skip to content

[Rcpp-devel] mixing R's and C++'s RNGs and distributions

10 messages · Matt D., Ramon Diaz-Uriarte

#
Dear All,

Sometimes I use both R's and C++11's RNGs and distributions in the same
code base (I am not using OpenMP or similar).  Although this might not be
very elegant, I find it convenient (use C++'s or R's, depending on which
one fits my problem better ---in particular, many distributions are not
readily available from C++). In C++ I tend to use std::mt19937, often with
a seed generated in R as

seed <- as.integer(round(runif(1, min = 0, max = 2^16)))

and passed to the C++ code.

It is my understanding that similar ideas (seeding C++'s RNG from R and
combining C++ with R's RNG) have been used before in much more complex
settings (e.g.,
http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2014-April/007510.html
and
http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2014-April/007512.html),
but I wonder if there are problems I cannot think of. A silly example
follows below.




Best,

R.


This is a silly example:

////////////////////////////////////////////////////////
#include <Rcpp.h>

using std::vector;
using namespace Rcpp;

double f0(vector<double> vec, std::mt19937& ran_gen) {
  shuffle(vec.begin(), vec.end(), ran_gen);
  return vec[0];
}

double f1() {
  RNGScope scope;
  double r = ::Rf_runif(0.0, 1.0);
  return r;
}

// [[Rcpp::export]]
double f3(int seed) {
  std::mt19937 ran_gen(seed);
  
  vector<double> vec(9);
  for(auto &v: vec ) {
    v = f1();
    Rcpp::Rcout << v << " ";
  }
  Rcpp::Rcout << std::endl;
  
  return f0(vec, ran_gen);
}

/*** R
f4 <- function(seed = NULL) {
  if(is.null(seed))
      seed <- as.integer(round(runif(1, min = 0, max = 2^16)))
  return(f3(seed))	
}
*/

////////////////////////////////////////////////////////
#
On 6/18/2015 14:34, Ramon Diaz-Uriarte wrote:
Seeding the Mersenne Twister PRNG can a bit subtle: I've ran across a 
series of posts a while ago that have brought up a couple of issues I 
haven't considered before -- perhaps you'll also find them of interest. 
The discussion threads (involving the author) may also be worth a look.
(There seems to be a connectivity issue with "www.pcg-random.org" at the 
moment, so also providing the "archive.is" URLs to be on the safe side.)

http://www.pcg-random.org/posts/cpp-seeding-surprises.html
// 
http://archive.is/http://www.pcg-random.org/posts/cpp-seeding-surprises.html
Discussions:
http://www.reddit.com/r/cpp/comments/32u4m7/the_behavior_of_rng_seeding_in_c_may_surprise/
http://www.reddit.com/r/programming/comments/32uo1p/c_seeding_surprises/

https://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html
// 
http://archive.is/http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html

http://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html
http://archive.is/http://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html

http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html
// 
http://archive.is/http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html

In particular, `randutils::auto_seed_128` and `randutils::auto_seed_256` 
from the last post may be a choice worth consideration:
https://gist.github.com/imneme/540829265469e673d045#file-randutils-hpp

Discussion: 
http://www.reddit.com/r/cpp/comments/34yqxa/announcing_randutils_a_single_portable/

One other thing useful to know (in the context of, say, reproducibility) 
is that (in contrast with the PRNGs themselves) the "distributions' 
algorithms are not mandated, so implementations can vary":
http://www.reddit.com/r/cpp/comments/30w7cs/inconsistency_in_c_random/
The links seems to be referring to a parallel computing context. Given 
that, I'd actually consider using Random123:
http://www.thesalmons.org/john/random123/
https://github.com/DEShawResearch/Random123-Boost // a nice, brief 
overview of the advantages in the parallel computing context (but also 
potentially applicable elsewhere)

Best,

Matt
#
Dear Matt,
On Fri, 19-06-2015, at 00:13, Matt D. <matdzb at gmail.com> wrote:
Thanks a lot! Yes, certainly interesting, a lot. I felt embarrassed by my
way of seeding the PRNG from R.

I had seen randutils.hpp mentioned before, but had never payed any
attention. After your links below (and thanks for the archive URLs ---last
night those were the only ones I could access) I think I'll be
incorporating it into my work. Not only because of the seeding, but also
because of the additional facilities (choose, pick, etc).
Yes, they were. Thanks!
I wasn't aware of this.
Yes; I figured that if what I understood were similar approaches (mixing
R's and C++s' PRNG and seeding from R) worked in more complex scenarios,
they should work on mine.
Nice; I'll remember this for if/when I need them in a parallel computing
context.



Thanks,


R.

  
    
1 day later
#
As a follow up to the thread below, using randutils has turned out to be
very simple (and provides some additional nice features, such as pick,
etc). Directly using it in an R package makes R CMD check complain about
the usage of the Exit function (only its address is used, not the function
itself). That can be solved by changing the line

auto exit_func = hash(&_Exit);

by, say

auto getenv_func = hash(&getenv);

and making the corresponding change a little bit further below.


Best,

R.
On Fri, 19-06-2015, at 00:13, Matt D. <matdzb at gmail.com> wrote:

  
    
#
On 6/20/2015 21:13, Ramon Diaz-Uriarte wrote:
Sounds great, thanks for the update!

Best,

Matt
1 day later
#
Actually, I just noticed that things will not work if you need your package
to run on Windoze: Rtools uses gcc 4.6.3 there, and this will not work with
gcc 4.6 (neither in Linux nor Windows) with flag -std=c++0x or
-std=gnu++0x. I guess this should be fixable, but I do not know enough to
do it.

Best,


R.
On Sat, 20-06-2015, at 23:38, Matt D. <matdzb at gmail.com> wrote:

  
    
2 days later
#
On 6/22/2015 12:31, Ramon Diaz-Uriarte wrote:
Hi again!

I've just tried with the work-in-progress, _experimental_ version 
available at the following location:
https://rawgit.com/kevinushey/RToolsToolchainUpdate/master/mingwnotes.html

// In particular, I've used "Windows native compiler for 64 bit Windows 
output mingw32mingw64_gcc-4.9.2.toolchain.tar.gz".

This works better -- the only missing part is <thread> support :-( 
However, it is not exactly essential here, in that it's used solely in 
one place -- to get some extra entropy; that's all.
After temporarily removing dependence on `std::this_thread::get_id()`, 
the example -- available at 
http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html 
-- compiles and runs successfully.

Incidentally,  one can get threads support for MinGW using MSYS2, which 
gives:
Thread model: posix
gcc version 4.9.2 (Rev5, Built by MSYS2 project)

However, the MinGW that comes with Rtools uses the following:
Thread model: win32
gcc version 4.9.2 (GCC)

I presume there must be a reason for that. There are certainly 
trade-offs present: 
https://wiki.qt.io/MinGW-64-bit#GCC_Threading_model_.28posix_vs_win32.29
What hits us here is the "no C++11 <thread>, <mutex>, or <future> " 
("C11" appears to be a typo) part for the win32 choice.
// See also: 
https://stackoverflow.com/questions/17242516/mingw-w64-threads-posix-vs-win32

Note that there's nothing multithreading-specific about the library, so 
even though I've also tested it with 
https://github.com/meganz/mingw-std-threads (warning: just something 
I've ran across while searching for multithreading support for MinGW 
built w/ win32 threading model, quality and license unknown) and also 
made it compile & work after a small patch (adding the std::hash 
specialization), it's probably possible to use another source of entropy 
here.

I presume another idea would be to use a different GCC version, but 
that's rather tedious -- 
https://stackoverflow.com/questions/25455829/using-a-different-gcc-version-from-that-included-with-rtools-with-rcpp-on-window
Other than the above, I'm wondering myself what's the "official" 
recommendation for the C++11 threading support w/ Rcpp on Windows.

Best,

Matt
#
Hi Matt,

Thanks a lot for the details and the work. That is great! There is a
problem, though: in my particular case, I am uploading my package to
BioConcutor, and there the compiler for Win is 4.6.3 so I am restricted to
that. Including randutils will lead to an error during building the package
in Windows.

Best,

R.
On Wed, 24-06-2015, at 14:55, Matt D. <matdzb at gmail.com> wrote:

  
    
#
On 6/24/2015 15:07, Ramon Diaz-Uriarte wrote:
Yeah, in this case I think replacing the source of entropy with 
something else may be the compromise choice (it's a range eventually 
passed to `mix_entropy`, so I'd investigate the effects on that).
That all being in the meantime / while waiting for the toolchain to 
catch up, of course...

Best,

Matt
#
On Wed, 24-06-2015, at 15:22, Matt D. <matdzb at gmail.com> wrote:
But I wonder if it is worth the effort.