Skip to content

Correlated Multivariate Distribution Generator

5 messages · Yue Yu, Enrico Schumann, Doran, Harold

#
Dear R User,

I am wondering if there is a way to generate correlated multivariate
non-normal distribution?

For example, I want to generate four correlated negative binomial
series with parameters r=10, p=0.2, based on the correlation
coefficient matrix
|   1   0.9  0.8  0.8 |
| 0.9     1  0.8  0.8 |
| 0.8  0.8     1  0.9 |
| 0.8  0.8  0.9     1 |

Thank a lot.

Best,

Yue Yu
#
Hi Yue Yu,

similar questions have been discussed several times on this list; you may
want to search the archives.

Do you mean linear correlation, or rank correlation? In general, a given
linear correlation will not always be attainable (though in your case, it
probably will). If _rank_ correlation is fine, you could do something like
this:

(1) create Gaussians X with a given linear correlation (which is almost
identical to rank correlation in the Gaussian case, but you can even correct
this*)

(2) put X into the distribution function of the normal: you get uniforms U.
Since the distribution function is monotone, the rank correlation will
remain unaltered. Hence the U are (rank-)correlated just as the X.

(3) put U into the inverse of the desired distribution function. The inverse
is mononote as well, so the rank correlation remains the same. 

[This approach will only be practical if you can compute the inverse
reasonably fast.]


Regards,
Enrico

* see, eg, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1681917
#
Do you mean something like this?
[,1]        [,2]        [,3]       [,4]
[1,] 1.00000000  0.04227103  0.00358846 0.02860722
[2,] 0.04227103  1.00000000 -0.03122457 0.03070221
[3,] 0.00358846 -0.03122457  1.00000000 0.04621639
[4,] 0.02860722  0.03070221  0.04621639 1.00000000
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.9021538 0.8183199 0.8158886
[2,] 0.9021538 1.0000000 0.8114415 0.8152286
[3,] 0.8183199 0.8114415 1.0000000 0.9145778
[4,] 0.8158886 0.8152286 0.9145778 1.0000000
#
Thanks, Enrico and Harold.

I have searched the archives for this topic, but all I can find is
about the multivariate normal or uniform generator, which doesn't help
in my case.

Harold's transformation is good for getting the correlation, but the
negative binomial distribution will be changed after transformation,
won't keep the desired parameters r and p.

Enrico's method is brilliant for rank correlation, is there any
algorithm for linear correlation? Even some references for
multivariate negative binomial generator will help.

Your help are really appreciated

Best,

Yue Yu
On Wed, Jul 27, 2011 at 11:31, Doran, Harold <HDoran at air.org> wrote:
#
Just a pointer, not a solution: if you use the method I described, it all
comes down to the initial correlation matrix that you specify for the
Gaussian case. If you had the "correct" initial correlation matrix, it would
lead to the desired linear correlation in your binomials. (But it is not
guaranteed that such a "correct" matrix even exists.)

You could now try to adjust the initial matrix such that the obtained linear
correlation in your final variates is as desired. There are a number of
techniques, but I cannot judge for I have never used any of them. There is,
eg, an approach called NORTA (stands for "normals to anything", or something
like that); you way want to look for that.

But to use your specific example:

N <- 200000
C <- c( ? 1, ? 0.9, ?0.8, ?0.8,
        0.9, ? ? 1, ?0.8, ?0.8 ,
        0.8, ? 0.8, ?? 1, ?0.9 ,
        0.8, ? 0.8, ?0.9, ? ? 1 )
dim(C) <- c(4,4)
C <- 2*sin(C*pi/6)  # a small correction
X <- rnorm(N*4); dim(X) <- c(N,4)
X <- X %*% chol(C)
U <- pnorm(X)
B <- qnbinom(U, size = 10, prob = 0.2)
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.9061185 0.8096429 0.8083266
[2,] 0.9061185 1.0000000 0.8098284 0.8087700
[3,] 0.8096429 0.8098284 1.0000000 0.9058436
[4,] 0.8083266 0.8087700 0.9058436 1.0000000
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.8993782 0.7988458 0.7983378
[2,] 0.8993782 1.0000000 0.7988966 0.7985780
[3,] 0.7988458 0.7988966 1.0000000 0.8993137
[4,] 0.7983378 0.7985780 0.8993137 1.0000000

Given the sample size here, I dare say one would never "spot the difference"
in smaller sample (for these parameters).

Regards,
Enrico