Skip to content
Prev 69638 / 398525 Next

Using R to illustrate the Central Limit Theorem

This thread was very timely for me since I will be teaching an introductory
biostats course in the fall, including a session on CLT. I have shamelessly
taken Dr. Venables' slick piece of code and wrapped it in a function so that
I can vary the maximum sample size at will. I then created functions based
on the first to sample from the exponential and the chi-squaredistributions.
Lastly, I created a function to sample from a pdf with a parabolic shape
(confirming in the process just how rusty my calculus is :-) ). My code is
below for all to do with as they please.

Now, at the risk of asking a really obvious question ...

In my final function, I used the inverse probability integral transformation
to sample from my parabolic distribution. I create a local function rparab
shown here:

rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}

It seems that in my version of R (2.0.1) on Linux, that calculating the cube
root of a negative number using ^(1/3) returns NaN. I looked at the help in
the arithmetic operators and did help.search("cube root"), 
help.search("root")
and help.search("cube") and recognised no alternatives. So I used an 
ifelse() to
deal with the negatives. Have I missed something really elementary?

Thanks

clt.unif <- function(n=20) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Uniform(0,1), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.exp <- function(n=20,theta=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rexp(N*k,1/theta), N, k)) - theta)/sqrt(theta^2/k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("exp(",theta,"), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.chisq <- function(n=20,nu=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rchisq(N*k,nu), N, k)) - nu)/sqrt(2*nu/k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Chi-Square(",nu,"), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.parab <- function(n=20) {
rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rparab(N*k), N, k)))/sqrt(3/(5*k))
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}
Bill.Venables at csiro.au wrote: