There appears to be a mild bug, or at least a deficiency, in
rnorm. The bug becomes apparent when one looks at extremes
of the squares of the values generated by rnorm; rnorm is not
generating quite enough extreme values.
The R version that I am using is 1.4.1; I never got around to installing
1.5.0, and now since 1.5.1 is about to come out .... However, checking
the 1.5.0 release notes revealed no mention of fixing a bug in rnorm.
Version details:
platform sparc-sun-solaris2.7
arch sparc
os solaris2.7
system sparc, solaris2.7
status
major 1
minor 4.1
year 2002
month 01
day 30
language R
More detail regarding how the bug revealed itself:
==================================================
For n = 100, 200, ..., 1500
o I generated 1000 sequences of length n via rnorm(n),
o for each sequence x, I calculated m = the max of x^2
o I then calculated pval = 1 - pchisq(m,1)^n
o I then calculated s.hat.n = #{pval: pval < 0.05}/1000
I then plotted s.hat.n versus n. This ***should*** give a result
close to a horizontal straight line, at height 0.05 --- but it
didn't. For the larger values of n, the values of s.hat.n were
displaced significantly below 0.05.
After some discussion with colleagues, I replaced the calls to rnorm()
by calls to myrnorm() defined by
myrnorm <- function(n,mu=0,sigma=1){
mu + sigma*cos(2*pi*runif(n))*sqrt(-2*log(runif(n)))
}
which uses the ``(r,theta)'' method of generating random normals.
When I did so, the resulting values were indeed all ``close to'' 0.05,
as they should be.
I also tried the experiment using rchisq(n,1) instead of rnorm(n) (and
then of course taking m = max of x --- rather than max of x^2). Again
all the resulting values were close to 0.05 as ought to be the case.
(So rchisq() appears to be OK in this regard.)
Enclosed below is a script to demonstrate the bug.
cheers,
Rolf Turner
rolf@math.unb.ca
#==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===
#
# Script to demonstrate the bug in rnorm.
#
myrnorm <- function(n,mu=0,sigma=1){
mu + sigma*cos(2*pi*runif(n))*sqrt(-2*log(runif(n)))
}
# If RFUN <- rnorm we get ``wrong'' answers; if RFUN <- myrnorm,
# we get ``right'' answers.
RFUN <- rnorm
NSER <- 1000
set.seed(350734)
rslt <- list()
for(K in 1:15) {
N <- 100*K
M <- matrix(RFUN(NSER*N),N,NSER)
T2 <- apply(M,2,function(x){max(x**2)})
PV <- 1 - pchisq(T2,1)**N
SZ <- sum(PV < 0.05)/NSER
rslt[[K]] <- SZ
cat(K,"\n")
}
rslt <- unlist(rslt)
plot(100*(1:15),rslt,ylim=c(0,0.1),xlab='n',ylab='s.hat.n')
abline(h=0.05)
error.bar(100*(1:15),rslt,lower=1.96*sqrt(0.05*0.95/1000),add=TRUE)
# Clean up:
rm(RFUN,NSER,K,N,M,T2,PV,SZ)
#==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Bug in rnorm. (PR#1664)
2 messages · Rolf Turner, Peter Dalgaard
rolf@math.unb.ca writes:
There appears to be a mild bug, or at least a deficiency, in rnorm. The bug becomes apparent when one looks at extremes of the squares of the values generated by rnorm; rnorm is not generating quite enough extreme values. The R version that I am using is 1.4.1; I never got around to installing 1.5.0, and now since 1.5.1 is about to come out .... However, checking the 1.5.0 release notes revealed no mention of fixing a bug in rnorm.
...and I see the effect too with an r-patched from a few days back. [snip]
After some discussion with colleagues, I replaced the calls to rnorm()
by calls to myrnorm() defined by
myrnorm <- function(n,mu=0,sigma=1){
mu + sigma*cos(2*pi*runif(n))*sqrt(-2*log(runif(n)))
}
which uses the ``(r,theta)'' method of generating random normals.
When I did so, the resulting values were indeed all ``close to'' 0.05,
as they should be.
I also tried the experiment using rchisq(n,1) instead of rnorm(n) (and
then of course taking m = max of x --- rather than max of x^2). Again
all the resulting values were close to 0.05 as ought to be the case.
(So rchisq() appears to be OK in this regard.)
Also qnorm(runif(n)) seems to be closer to the target.
O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._