unexpected behaviour of rnorm(): summary
Robin Hankin <r.hankin at auckland.ac.nz> writes:
The original script was
f <- function(n){max(rnorm(n))}
plot(sapply(rep(5000,4000),f)) #[this takes my PC about 30 seconds]
Received wisdom is that "The Marsaglia-Multicarry and Kinderman-Ramage options don't play nicely together in the extreme tails of the Normal distribution" (PD) but I discovered last night that
[Unless you mean Probability Distribution, I'd say that that was an odd way of abbreviating Thomas Lumley. PBR for Brian D. Ripley below has been seen before from software that thinks his first name is "Professor"....]
f.uppertail <- function(n,upper=3){x <- rnorm(n);x[x>upper]}
plot(f.uppertail(1e7))
shows the effect much more directly...and viewed this way, the default RNG would seem to be more seriously flawed. The second script above considers the upper (ie Z>3) tail of the Gaussian: and it has gaps. This is borderline "extreme tails" because pnorm(3) is only about 0.998.
In contrast, Bug #1664 refers to the maximal value of a series of rnorm() values which, as PBR points out, _is_ a rather severe test of any RNG; the difference between f.uppertail() and #1664 would be that #1664 simulates
dmaxnorm <- function(x,n){n* dnorm(x)*pnorm(x)^(n-1)}
which involves PDFs and CDFs; f.uppertail() is solely a rnorm() issue. If nothing else, f.uppertail() is a simpler test for rnorm().
Actually, it's not all that different, and it is much easier to see if you think in terms of non-exceedance probabilities: P(max X < x) = P(X < x)^n so taking max just rescales the p axis on a plot of the CDF by taking p^n, which will naturally emphasize the upper tail. However if the distribution of X has a gap at x so will the distribution of the maximum. It *is* a bit uncomfortable. Probably someone ought to go in and investigate exactly what the nature of the interaction between the two methods amounts to, and maybe we should consider changing the default eventually (although as Brian said there are back-compatibility issues that make this rather less attractive).
O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._