unexpected behaviour of rnorm() - R-help

Tue, Nov 26, 2002 12:25 PM #

Hello everyone.

If I do

f <- function(n){max(rnorm(n))}
plot(sapply(rep(5000,4000),f))   #[this takes my PC about 30 seconds]

then I get something quite unexpected: gaps in the distribution.  For
me, the most noticable one is at about 3.6.

Do others get this?  Is it an optical illusion?  It can't be right,
can it?  Or maybe I just don't understand the good ol' Gaussian very
well. 

anyone got an explanation?


[linux redhat 7.1; R-1.6.1]

Robin Hankin, Lecturer,
School of Geography and Environmental Science
Tamaki Campus
Private Bag 92019 Auckland
New Zealand

r.hankin at auckland.ac.nz
tel 0064-9-373-7599 x6820; FAX 0064-9-373-7042

as of: Wed Nov 27 09:15:00 NZDT 2002
This (linux) system up continuously for:  454 days, 14 hours, 57 minutes
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard

Tue, Nov 26, 2002 1:25 PM #

Robin Hankin <r.hankin at auckland.ac.nz> writes:

This is also illuminative:

  plot(sort(sapply(rep(2000,2000),f)),type="l")

We've seen odd behaviour of max(rnorm()) with the default random generators
before. Switching to RNGkind("Wichmann-Hill") made the effect
disappear here. Same thing switching the normal generator with
RNGkind("Marsaglia-Multicarry","Box-Muller").

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Brian Ripley

Tue, Nov 26, 2002 2:04 PM #

That's the maximum of 5000 normals, right?  That's pushing the accuracy of
some internal calculations too hard.

If you want to do this, you should use

RNGkind(, "Inversion")

That's not the default for back-compatibility reasons.

On Wed, 27 Nov 2002, Robin Hankin wrote:

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Thomas Lumley

Tue, Nov 26, 2002 2:09 PM #

On Wed, 27 Nov 2002, Robin Hankin wrote:

This is substantially the same as PR#1664.  The Marsaglia-Multicarry and
Kinderman-Ramage options don't play nicely together in the extreme tails
of the Normal distribution.

Changing the Normal generator
  RNGkind(normal="Inversion")
  RNGkind(normal="Box-Muller")
or the underlying uniform stream
  RNGkind(kind="Wichmann-Hill")
  RNGkind(kind="Super-Duper")
gives better looking results.


	-thomas

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Göran Broström

Tue, Nov 26, 2002 10:46 PM #

On Tue, 26 Nov 2002 ripley at stats.ox.ac.uk wrote:

Just of curiosity, is this a general recommendation? I.e., should I put
that in my .Rprofile and get a generally better RNG? Speed issues?

which made me wonder.

G?ran


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Brian Ripley

Tue, Nov 26, 2002 11:46 PM #

On Wed, 27 Nov 2002, [iso-8859-1] GÃ¶ran BrostrÃ¶m wrote:

Yes, it is a general recommendation.  We would have changed to inversion
apart from reproducibility issues.  Inversion was implemented in a way
that is very accurate, if a little bit slower than the other normal
generators supplied.  (You can do the timings yourself.)

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Bob Wheeler

Wed, Nov 27, 2002 5:29 AM #

rziggurat() does not exhibit this behavior and is quite a
bit faster.

Robin Hankin wrote:

Bob Wheeler --- (Reply to: bwheeler at echip.com)
        ECHIP, Inc.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Ivan Frohne

Wed, Nov 27, 2002 7:32 AM #

So, where is rziggurat()?

--Ivan Frohne
----- Original Message ----- 
From: "Bob Wheeler" <bwheeler at echip.com>

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Bob Wheeler

Wed, Nov 27, 2002 12:25 PM #

Sorry, it is in the SuppDists package. It is one of George
Marsaglias's creations.

Ivan Frohne wrote:

Bob Wheeler --- (Reply to: bwheeler at echip.com)
        ECHIP, Inc.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Ravi Varadhan

Wed, Nov 27, 2002 2:18 PM #

Hi:

I am using R 1.5.1 on Windows, and the "Inversion" method for normal
generator doesn't exist in the "RNGkind" function.

I only see the following options inside the function:

c("Kinderman-Ramage", "Ahrens-Dieter", "Box-Muller",
        "user-supplied", "default")

Am I missing something?

Ravi.

----- Original Message -----
From: <ripley at stats.ox.ac.uk>
To: "Robin Hankin" <r.hankin at auckland.ac.nz>
Cc: <r-help at stat.math.ethz.ch>
Sent: Tuesday, November 26, 2002 5:04 PM
Subject: Re: [R] unexpected behaviour of rnorm()

-.-.-

http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html

_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._

-.-.-

http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html

_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Brian Ripley

Wed, Nov 27, 2002 2:27 PM #

On Wed, 27 Nov 2002, Ravi Varadhan wrote:

Yes, an up-to-date version of R.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard

Wed, Nov 27, 2002 2:32 PM #

Ravi Varadhan <rvaradha at jhsph.edu> writes:

An upgrade. The Inversion method was added in 1.6.0.

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

robin hankin

Thu, Nov 28, 2002 1:56 PM #

Thanks to everyone who contributed to the discussion on rnorm(), both
online and off.  I learned a lot.  Several people pointed out that the
issue was reported in bug report #1664 and indeed it was.  Also, the
problem goes away with RNGkind(, "Inversion").

However, I regard trying to break a random number generator as
precisely the best way to understand random variables and hypothesis
testing (does anyone else agree? Has anyone else tried to break
rnorm() with or without success? can people post their attempts?)

The original script was

Received wisdom is that "The Marsaglia-Multicarry and Kinderman-Ramage
options don't play nicely together in the extreme tails of the Normal
distribution" (PD) but I discovered last night that

shows the effect much more directly...and viewed this way, the default
RNG would seem to be more seriously flawed.  The second script above
considers the upper (ie Z>3) tail of the Gaussian: and it has gaps.
This is borderline "extreme tails" because pnorm(3) is only about
0.998.

In contrast, Bug #1664 refers to the maximal value of a series of
rnorm() values which, as PBR points out, _is_ a rather severe test of
any RNG; the difference between f.uppertail() and #1664 would be that
#1664 simulates

which involves PDFs and CDFs; f.uppertail() is solely a rnorm() issue.
If nothing else, f.uppertail() is a simpler test for rnorm().

many thanks again to the List--it Rocks

rksh

Robin Hankin, Lecturer,
School of Geography and Environmental Science
Tamaki Campus
Private Bag 92019 Auckland
New Zealand

r.hankin at auckland.ac.nz
tel 0064-9-373-7599 x6820; FAX 0064-9-373-7042

as of: Fri Nov 29 09:27:00 NZDT 2002
This (linux) system up continuously for:  456 days, 15 hours, 09 minutes
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard

Thu, Nov 28, 2002 4:46 PM #

Robin Hankin <r.hankin at auckland.ac.nz> writes:

[Unless you mean Probability Distribution, I'd say that that was an odd
way of abbreviating Thomas Lumley. PBR for Brian D. Ripley below has been
seen before from software that thinks his first name is "Professor"....]

Actually, it's not all that different, and it is much easier to see if
you think in terms of non-exceedance probabilities:

  P(max X < x) = P(X < x)^n

so taking max just rescales the p axis on a plot of the CDF by taking
p^n, which will naturally emphasize the upper tail. However if the
distribution of X has a gap at x so will the distribution of the
maximum.

It *is* a bit uncomfortable. Probably someone ought to go in and
investigate exactly what the nature of the interaction between the two
methods amounts to, and maybe we should consider changing the default
eventually (although as Brian said there are back-compatibility issues
that make this rather less attractive).

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._