Skip to content

Using R to illustrate the Central Limit Theorem

4 messages · Bill Venables, (Ted Harding), Matthias Kohl +1 more

#
Here's a bit of a refinement on Ted's first suggestion.

 
 N <- 10000
 graphics.off()
 par(mfrow = c(1,2), pty = "s")
 for(k in 1:20) {
    m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k)
    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
    pu <- par("usr")[1:2]
    x <- seq(pu[1], pu[2], len = 500)
    lines(x, dnorm(x), col = "red")
    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
    abline(0, 1, col = "red")
    Sys.sleep(1)
  }



-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
Ted.Harding at nessie.mcc.ac.uk
Sent: Friday, 22 April 2005 4:48 AM
To: Paul Smith
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] Using R to illustrate the Central Limit Theorem
On 21-Apr-05 Paul Smith wrote:
Similar to Francisco's suggestion:

  m<-numeric(10000);
  for(k in (1:20)){
    for(i in(1:10000)){m[i]<-(mean(runif(k))-0.5)*sqrt(12*k)}
    hist(m,breaks=0.3*(-15:15),xlim=c(-4,4),main=sprintf("%d",k))
  }

(On my slowish laptop, this ticks over at a satidfactory rate,
about 1 plot per second. If your mahine is much faster, then
simply increase 10000 to a larger number.)

The real problem with demos like this, starting with the
uniform distribution, is that the result is, to the eye,
already approximately normal when k=3, and it's only out
in the tails that the improvement shows for larger values
of k.

This was in fact the way we used to simulate a normal
distribution in the old days: look up 3 numbers in
Kendall & Babington-Smith's "Tables of Random Sampling
Numbers", which are in effect pages full of integers
uniform on 00-99, and take their mean.

It's the one book I ever encountered which contained
absolutely no information -- at least, none that I ever
spotted.

A more dramatic illustration of the CLT effect might be
obtained if, instead of runif(k), you used rbinom(k,1,p)
for p > 0.5, say:

  m<-numeric(10000);
  p<-0.75; for(j in (1:50)){ k<-j*j
    for(i in(1:10000)){m[i]<-(mean(rbinom(k,1,p))-p)/sqrt(p*(1-p)/k)}
    hist(m,breaks=41,xlim=c(-4,4),main=sprintf("%d",k))
  }

Cheers,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Apr-05                                       Time: 19:48:05
------------------------------ XFMail ------------------------------

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
#
On 21-Apr-05 Bill.Venables at csiro.au wrote:
Very nice! (I can better keep up with it mentally, though, with
Sys.sleep(2) or Sys.sleep(3), which moght be better for classroom
demo).

One thing occurred to me, watching it: people might say "Yes,
we can see how the distribution -> Normal, nice and smooth,
especially in the tails and side-arms; but the peaks always look
a bit rough."

Which could be the cue for introducing "SD(ni) = sqrt(E[ni])",
and the following hack of the above code seems to show this OK
in the "rootograms":

N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:20) {
   m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
   hm <- hist(m, breaks = "FD", xlim = c(-4,4), main = k, plot=FALSE,
           prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
   hm$counts<-sqrt(hm$counts) ; 
   plot(hm,xlim = c(-4,4),main = k,ylab="sqrt(Frequency)")
   pu <- par("usr")[1:2]
   x <- seq(pu[1], pu[2], len = 500)
   lines(x, sqrt(N*dnorm(x)*(hm$breaks[2]-hm$breaks[1])), col = "red")
   qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
   abline(0, 1, col = "red")
   Sys.sleep(2)
}

(and also shows clearly how the tails of the sample move outwards
into the tails of the Normal, as in fact you expect from the finite
range of mean(runif(k)), especially initially: very visible for
k up to about 5, and not really settled down for k<10).

Next stop: Hanging rootograms!

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Apr-05                                       Time: 13:10:19
------------------------------ XFMail ------------------------------
#
(Ted Harding) wrote:

            
Hello,

here is another idea to illustrate the Central Limit Theorem which is 
based on our package "distr".

# Illustration of the Central Limit Theorem
# using package "distr"
require(distr)
CLT <- function(Distr, n, sleep = 1){
# Distr: object of class "AbscontDistribution"
# n: iterations
# sleep: time to sleep
    graphics.off()
    par(mfrow = c(1,2))
   
    # expectation of Distr
    fun1 <- function(x, Distr){x*d(Distr)(x)}
    E <- try(integrate(fun1, lower = q(Distr)(0), upper = q(Distr)(1), 
Distr = Distr)$value,
             silent = TRUE)
    if(!is.numeric(E))
        E <- try(integrate(fun1, lower = q(Distr)(distr::TruncQuantile),
                           upper = q(Distr)(1-distr::TruncQuantile), 
Distr = Distr)$value,
                 silent = TRUE)
    # standard deviation of Distr
    fun2 <- function(x, Distr){x^2*d(Distr)(x)}
    E2 <- try(integrate(fun2, lower = q(Distr)(0), upper = q(Distr)(1), 
Distr = Distr)$value,
              silent = TRUE)
    if(!is.numeric(E2))
        E2 <- try(integrate(fun2, lower = q(Distr)(distr::TruncQuantile),
                            upper = q(Distr)(1-distr::TruncQuantile), 
Distr = Distr)$value,
                  silent = TRUE)
    std <- sqrt(E2 - E^2)
           
    Sn <- 0
    N <- Norm()
    for(k in 1:n) {
        Sn <- Sn + Distr
        Tn <- (Sn - k*E)/(std*sqrt(k))

        x <- seq(-5,5,0.01)
        dTn <- d(Tn)(x)
        ymax <- max(1/sqrt(2*pi), dTn)
        plot(x, d(Tn)(x), ylim = c(0, ymax), type = "l", ylab = 
"densities", main = k, lwd = 4)
        lines(x, d(N)(x), col = "orange", lwd = 2)
        plot(x, p(Tn)(x), ylim = c(0, 1), type = "l", ylab = "cdfs", 
main = k, lwd = 4)
        lines(x, p(N)(x), col = "orange", lwd = 2)
        Sys.sleep(sleep)
    }
}
   
# some examples
distroptions("DefaultNrFFTGridPointsExponent", 13)
CLT(Distr = Unif(), n = 20, sleep = 0)
CLT(Distr = Exp(), n = 20, sleep = 0)
CLT(Distr = Chisq(), n = 20, sleep = 0)
CLT(Distr = Td(df = 5), n = 20, sleep = 0)
CLT(Distr = Beta(), n = 20, sleep = 0)
distroptions("DefaultNrFFTGridPointsExponent", 14)
CLT(Distr = Lnorm(), n = 20, sleep = 0)
#
On 4/23/05, Matthias Kohl <Matthias.Kohl at uni-bayreuth.de> wrote:
Thanks to all for your numerous suggestions. Since I am just beginning
with R, it will take a while before I can fully digest your help.

Have a nice weekend!

Paul