Skip to content

RNG Cycle and Duplication (PR#12540)

10 messages · shli at stat.wvu.edu, Brian Ripley, Shengqiao Li +2 more

#
This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

---559023410-851401618-1218751024=:15885
Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE


I didn't describe the problem clearly. It's about the number of distinct=20
values. So just ignore cycle issue.

My tests were:

RNGkind(kind=3D"Knuth-TAOCP");
sum(duplicated(runif(1e7))); #return 46552

RNGkind(kind=3D"Knuth-TAOCP-2002");
sum(duplicated(runif(1e7))); #return 46415

#These collision frequency suggested there were 2^30 distinct values by=20
birthday problem.


RNGkind(kind=3D"Marsaglia-Multicarry");
sum(duplicated(runif(1e7))); #return 11682

RNGkind(kind=3D"Super-Duper");
sum(duplicated(runif(1e7))); #return 11542

RNGkind(kind=3D"Mersenne-Twister");
sum(duplicated(runif(1e7))); #return 11656

#These indicated there were 2^32 distinct values, which agrees with the=20
help info.

RNGkind(kind=3D"Wichmann-Hill");
sum(duplicated(runif(1e7))); #return 0

#So for this method, there should be more than 2^32 distinct values.

You may not get the exact numbers, but they should be close. So how to=20
explain above problem?

I need generate a large sample without any ties, it seems to me=20
"Wichmann-Hill" is only choice right now.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
Shengqiao Li

The Department of Statistics
PO Box 6330
West Virginia University
Morgantown, WV 26506-6330
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
On Thu, 14 Aug 2008, Peter Dalgaard wrote:

            
=20
re=20
=20
the=20
n=20
ods=20
can=20
use=20
ing.=20
---559023410-851401618-1218751024=:15885--
#
Remember Wichmann-Hill is a composite generator.  Its composition does 
take more than 2^32 distinct values.

You still haven't identifed a problem here.  The note is to warn that 
runif() does repeat within a cycle, because people wrote code assuming 
otherwise.  It would be a poor use of runif() to rely on the low-order 
bits, and that's standard advice in the field.

For a large sample of uniforms use something like the normal inversion 
does, e.g. 2^(-30) * (runif(N, 0, 2^30) %% 2^30 + runif(N))

Please do leave R-bugs out of this: we already have 4 entries as a result 
of your misunderstandings and false claims.
On Thu, 14 Aug 2008, shli at stat.wvu.edu wrote:

            

  
    
#
shli at stat.wvu.edu wrote:
The birthday problem distribution applies to independent draws, but they 
are only pseudo-independent.  I think the only ways to know for sure if 
there are 2^30 values are to look at the code, or run through a complete 
cycle.  And you need to determine the cycle by looking at .Random.seed, 
not at the returned value.
If there are 2^30 distinct values for the two generators above, that 
also agrees with the documentation.
You haven't demonstrated what you claim, but if you look at the source, 
you'll see that in fact the man page is wrong:  Wichmann-Hill is based 
on 3 integer values, which each take on approximately 15 bits of 
different values.  So Wichmann-Hill could take nearly 2^45 different 
values (actually 30269*30307*30323).

The source is in https://svn.r-project.org/R/trunk/src/main/RNG.c if you 
want to check the others.
An alternative would be to construct a new value from two (or more) 
runif() values, but be careful that you don't mess up the distribution 
when you do that.

Duncan Murdoch
#
Thank you for your reply and for your suggestion. So the note in man page 
could be more accurate since for an end user, man page should be more 
helpful and source code is mainly for developers.

I was also adviced to use Knuth's  double version RANARRAY from
http://www-cs-faculty.stanford.edu/~knuth/programs.html instead of the 
integer versions in R. I'm a R user. So why not 
also include the double verion in R implementation?

Thanks again,

========================================
Shengqiao Li

Research Associate
The Department of Statistics
PO Box 6330
West Virginia University
Morgantown, WV 26506-6330

========================================
On Fri, 15 Aug 2008, Duncan Murdoch wrote:

            
#
Professor Ripley,

Thank you for your solution.  So the last paragraph of the Note in RNG 
help page will be updated since Wichmann-Hill is different from other 
supplied uniform generators in the number of distinct values?

Shengqiao Li
On Fri, 15 Aug 2008, Prof Brian Ripley wrote:

            
#
On Fri, 15 Aug 2008, Shengqiao Li wrote:

            
Please read and follow the R posting guide, and check the current version 
*before* posting as you were asked to do.

  
    
#
On 15/08/2008 10:28 AM, Shengqiao Li wrote:
You can try it using kind="user-supplied" if you like, but I suspect 
it's the same as "Knuth-TAOCP-2002".

Duncan Murdoch
#
Knuth's double version RNG rng-double.c dose a great job. No ties were 
observed for 10M numbers ( totally 2^52 possible different values?). 
In rng-double, double modulo mod_sum replaced the integer version mod_diff 
in the integer version rng.c that is adopted by R.

The integer version uses modulus 2^30. Therefore there are only 2^30 
distinct numbers, which is confirmed by my previous test in R.

If someday Knuth's double version is also included in R, it will be great.


Shengqiao Li
On Fri, 15 Aug 2008, Duncan Murdoch wrote:

            
#
On 16/08/2008 6:09 PM, Shengqiao Li wrote:
I don't see what the problem is -- why not just do it?  That's what 
"user-supplied" is for.

Duncan Murdoch
#
Hi,

If you want I can put RANARRAY in the 'randtoolbox' package? unless  
you want in R and not in a package?

In randtoolbox (not the version on cran), I port SFMT and WELL  
generator respectively from Matsumoto and L'Ecuyer. It could be a good  
idea to add Knuth's code?

Christophe Dutang

Le 17 ao?t 08 ? 00:09, Shengqiao Li a ?crit :