Skip to content

unexpected behaviour of rnorm()

14 messages · Thomas Lumley, Göran Broström, Ivan Frohne +5 more

#
Hello everyone.

If I do

f <- function(n){max(rnorm(n))}
plot(sapply(rep(5000,4000),f))   #[this takes my PC about 30 seconds]

then I get something quite unexpected: gaps in the distribution.  For
me, the most noticable one is at about 3.6.

Do others get this?  Is it an optical illusion?  It can't be right,
can it?  Or maybe I just don't understand the good ol' Gaussian very
well. 

anyone got an explanation?


[linux redhat 7.1; R-1.6.1]
#
Robin Hankin <r.hankin at auckland.ac.nz> writes:
This is also illuminative:

  plot(sort(sapply(rep(2000,2000),f)),type="l")

We've seen odd behaviour of max(rnorm()) with the default random generators
before. Switching to RNGkind("Wichmann-Hill") made the effect
disappear here. Same thing switching the normal generator with
RNGkind("Marsaglia-Multicarry","Box-Muller").
#
That's the maximum of 5000 normals, right?  That's pushing the accuracy of
some internal calculations too hard.

If you want to do this, you should use

RNGkind(, "Inversion")

That's not the default for back-compatibility reasons.
On Wed, 27 Nov 2002, Robin Hankin wrote:

            

  
    
#
On Wed, 27 Nov 2002, Robin Hankin wrote:

            
This is substantially the same as PR#1664.  The Marsaglia-Multicarry and
Kinderman-Ramage options don't play nicely together in the extreme tails
of the Normal distribution.

Changing the Normal generator
  RNGkind(normal="Inversion")
  RNGkind(normal="Box-Muller")
or the underlying uniform stream
  RNGkind(kind="Wichmann-Hill")
  RNGkind(kind="Super-Duper")
gives better looking results.


	-thomas

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Tue, 26 Nov 2002 ripley at stats.ox.ac.uk wrote:

            
Just of curiosity, is this a general recommendation? I.e., should I put
that in my .Rprofile and get a generally better RNG? Speed issues?
which made me wonder.

G?ran


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Wed, 27 Nov 2002, [iso-8859-1] Göran Broström wrote:

            
Yes, it is a general recommendation.  We would have changed to inversion
apart from reproducibility issues.  Inversion was implemented in a way
that is very accurate, if a little bit slower than the other normal
generators supplied.  (You can do the timings yourself.)
#
rziggurat() does not exhibit this behavior and is quite a
bit faster.
Robin Hankin wrote:

  
    
#
So, where is rziggurat()?

--Ivan Frohne
----- Original Message ----- 
From: "Bob Wheeler" <bwheeler at echip.com>
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Sorry, it is in the SuppDists package. It is one of George
Marsaglias's creations.
Ivan Frohne wrote:

  
    
#
Hi:

I am using R 1.5.1 on Windows, and the "Inversion" method for normal
generator doesn't exist in the "RNGkind" function.

I only see the following options inside the function:

c("Kinderman-Ramage", "Ahrens-Dieter", "Box-Muller",
        "user-supplied", "default")

Am I missing something?

Ravi.

----- Original Message -----
From: <ripley at stats.ox.ac.uk>
To: "Robin Hankin" <r.hankin at auckland.ac.nz>
Cc: <r-help at stat.math.ethz.ch>
Sent: Tuesday, November 26, 2002 5:04 PM
Subject: Re: [R] unexpected behaviour of rnorm()
-.-.-
http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._
-.-.-
http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Wed, 27 Nov 2002, Ravi Varadhan wrote:

            
Yes, an up-to-date version of R.
#
Ravi Varadhan <rvaradha at jhsph.edu> writes:
An upgrade. The Inversion method was added in 1.6.0.
#
Thanks to everyone who contributed to the discussion on rnorm(), both
online and off.  I learned a lot.  Several people pointed out that the
issue was reported in bug report #1664 and indeed it was.  Also, the
problem goes away with RNGkind(, "Inversion").

However, I regard trying to break a random number generator as
precisely the best way to understand random variables and hypothesis
testing (does anyone else agree? Has anyone else tried to break
rnorm() with or without success? can people post their attempts?)

The original script was
Received wisdom is that "The Marsaglia-Multicarry and Kinderman-Ramage
options don't play nicely together in the extreme tails of the Normal
distribution" (PD) but I discovered last night that
shows the effect much more directly...and viewed this way, the default
RNG would seem to be more seriously flawed.  The second script above
considers the upper (ie Z>3) tail of the Gaussian: and it has gaps.
This is borderline "extreme tails" because pnorm(3) is only about
0.998.

In contrast, Bug #1664 refers to the maximal value of a series of
rnorm() values which, as PBR points out, _is_ a rather severe test of
any RNG; the difference between f.uppertail() and #1664 would be that
#1664 simulates
which involves PDFs and CDFs; f.uppertail() is solely a rnorm() issue.
If nothing else, f.uppertail() is a simpler test for rnorm().

many thanks again to the List--it Rocks

rksh
#
Robin Hankin <r.hankin at auckland.ac.nz> writes:
[Unless you mean Probability Distribution, I'd say that that was an odd
way of abbreviating Thomas Lumley. PBR for Brian D. Ripley below has been
seen before from software that thinks his first name is "Professor"....]
Actually, it's not all that different, and it is much easier to see if
you think in terms of non-exceedance probabilities:

  P(max X < x) = P(X < x)^n

so taking max just rescales the p axis on a plot of the CDF by taking
p^n, which will naturally emphasize the upper tail. However if the
distribution of X has a gap at x so will the distribution of the
maximum.

It *is* a bit uncomfortable. Probably someone ought to go in and
investigate exactly what the nature of the interaction between the two
methods amounts to, and maybe we should consider changing the default
eventually (although as Brian said there are back-compatibility issues
that make this rather less attractive).