Skip to content
Prev 359036 / 398503 Next

Ruofei Mo - How can I generate correlated data with non-normal distribution?

Ruofei,

Ben's suggestion is simple and gets you close:

require(MASS)
nsim <- 1000000
rho <- -.9
Z <- mvrnorm(nsim, mu=c(0,0),Sigma = cbind(c(1,rho),c(rho, 1)))
U <- pnorm(Z);
a <- Z[,1]
b <- qunif(U[,2])
cor(a,b)

Pearson correlation characterizes the linear relationship between normal
r.v.'s, but there's always a question about what Pearson correlation means
for non-normal r.v.'s....the "close" approach that Ben gave gives a pair of
r.v.'s with a non-linear relationship (because of the pnorm non-linear
transformation), which can be seen if you plot the means of a in bins
across the range of b.

plot(0,0,type='n',xlim=range(a), ylim=c(-.2,1.1))
for (i in seq(0,.99,by=0.01)){
  ind<-which(b>min(b)+i*diff(range(b)) & b<=min(b)+(i+.01)*diff(range(b)))
  points(mean(a[ind]),i-.005,pch=3) # means of a in 100 bins that span the
range of b
}

# so Spearman rank correlation is often used for "correlation" between
non-normal r.v.'s. Exact Spearman correlations can be attained using the
same approach, but adjusting the target correlation.

# to get Spearman correlation of rho = -0.90, use 2*sin(rho*pi/6) in place
of rho in the mvrnorm simulation:
nsim <- 1000000
rho <- -.9
Z <- mvrnorm(nsim, mu=c(0,0),Sigma =
cbind(c(1,2*sin(rho*pi/6)),c(2*sin(rho*pi/6), 1)))
a<-Z[,1]; b<-qunif(pnorm(Z[,2]))
cor(rank(a),rank(b))

# this gives r.v.'s with exact Spearman correlations as desired, while
aiming at Pearson correlation tends to be off by some amount

TargetCorrelation<-seq(-.99, -.1, by=.01)
SimulatedCorrelationP<-numeric(length(TargetCorrelation))
SimulatedCorrelationS<-numeric(length(TargetCorrelation))

nsim<-100000
for (i in 1:length(TargetCorrelation)){
  rho<-TargetCorrelation[i]
  Z <- mvrnorm(nsim, mu=c(0,0),Sigma = cbind(c(1,rho),c(rho, 1)))
  U <- pnorm(Z);
  a <- Z[,1]
  b <- qunif(U[,2])
  SimulatedCorrelationP[i]<-cor(a,b)

  Z <- mvrnorm(nsim, mu=c(0,0),Sigma =
cbind(c(1,2*sin(rho*pi/6)),c(2*sin(rho*pi/6), 1)))
  U <- pnorm(Z);
  a <- Z[,1]
  b <- qunif(U[,2])
  SimulatedCorrelationS[i]<-cor(rank(a),rank(b))
}
plot(TargetCorrelation,SimulatedCorrelationP)
points(TargetCorrelation,SimulatedCorrelationS,pch=20)
lines(c(-1,0),c(-1,0),col=2)
plot(TargetCorrelation,SimulatedCorrelationP)
points(TargetCorrelation,SimulatedCorrelationS,pch=20)
lines(c(-1,0),c(-1,0),col=2)


-Dan
On Wed, Mar 2, 2016 at 3:46 PM, Ben Bolker <bbolker at gmail.com> wrote: