Dear R User, I am wondering if there is a way to generate correlated multivariate non-normal distribution? For example, I want to generate four correlated negative binomial series with parameters r=10, p=0.2, based on the correlation coefficient matrix | 1 0.9 0.8 0.8 | | 0.9 1 0.8 0.8 | | 0.8 0.8 1 0.9 | | 0.8 0.8 0.9 1 | Thank a lot. Best, Yue Yu
Correlated Multivariate Distribution Generator
5 messages · Yue Yu, Enrico Schumann, Doran, Harold
Hi Yue Yu, similar questions have been discussed several times on this list; you may want to search the archives. Do you mean linear correlation, or rank correlation? In general, a given linear correlation will not always be attainable (though in your case, it probably will). If _rank_ correlation is fine, you could do something like this: (1) create Gaussians X with a given linear correlation (which is almost identical to rank correlation in the Gaussian case, but you can even correct this*) (2) put X into the distribution function of the normal: you get uniforms U. Since the distribution function is monotone, the rank correlation will remain unaltered. Hence the U are (rank-)correlated just as the X. (3) put U into the inverse of the desired distribution function. The inverse is mononote as well, so the rank correlation remains the same. [This approach will only be practical if you can compute the inverse reasonably fast.] Regards, Enrico * see, eg, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1681917
-----Urspr?ngliche Nachricht----- Von: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Im Auftrag von Yue Yu Gesendet: Mittwoch, 27. Juli 2011 05:01 An: r-help at r-project.org Betreff: [R] Correlated Multivariate Distribution Generator Dear R User, I am wondering if there is a way to generate correlated multivariate non-normal distribution? For example, I want to generate four correlated negative binomial series with parameters r=10, p=0.2, based on the correlation coefficient matrix | 1 0.9 0.8 0.8 | | 0.9 1 0.8 0.8 | | 0.8 0.8 1 0.9 | | 0.8 0.8 0.9 1 | Thank a lot. Best, Yue Yu
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Do you mean something like this?
cors <- matrix(c(1, .9, .8, .8, .9, 1, .8, .8, .8, .8, 1, .9, .8, .8, .9, 1), 4) L <- chol(cors) N <- 1000 dat <- cbind(rnbinom(N, mu = 4, size = 1), rnbinom(N, mu = 4, size = 1), rnbinom(N, mu = 4, size = 1), rnbinom(N, mu = 4, size = 1)) result <- dat %*% L cor(dat)
[,1] [,2] [,3] [,4] [1,] 1.00000000 0.04227103 0.00358846 0.02860722 [2,] 0.04227103 1.00000000 -0.03122457 0.03070221 [3,] 0.00358846 -0.03122457 1.00000000 0.04621639 [4,] 0.02860722 0.03070221 0.04621639 1.00000000
cor(result)
[,1] [,2] [,3] [,4] [1,] 1.0000000 0.9021538 0.8183199 0.8158886 [2,] 0.9021538 1.0000000 0.8114415 0.8152286 [3,] 0.8183199 0.8114415 1.0000000 0.9145778 [4,] 0.8158886 0.8152286 0.9145778 1.0000000
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Yue Yu Sent: Tuesday, July 26, 2011 11:01 PM To: r-help at r-project.org Subject: [R] Correlated Multivariate Distribution Generator Dear R User, I am wondering if there is a way to generate correlated multivariate non-normal distribution? For example, I want to generate four correlated negative binomial series with parameters r=10, p=0.2, based on the correlation coefficient matrix | 1 0.9 0.8 0.8 | | 0.9 1 0.8 0.8 | | 0.8 0.8 1 0.9 | | 0.8 0.8 0.9 1 | Thank a lot. Best, Yue Yu
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks, Enrico and Harold. I have searched the archives for this topic, but all I can find is about the multivariate normal or uniform generator, which doesn't help in my case. Harold's transformation is good for getting the correlation, but the negative binomial distribution will be changed after transformation, won't keep the desired parameters r and p. Enrico's method is brilliant for rank correlation, is there any algorithm for linear correlation? Even some references for multivariate negative binomial generator will help. Your help are really appreciated Best, Yue Yu
On Wed, Jul 27, 2011 at 11:31, Doran, Harold <HDoran at air.org> wrote:
Do you mean something like this?
cors <- matrix(c(1, .9, .8, .8, .9, 1, .8, .8, .8, .8, 1, .9, .8, .8, .9, 1), 4) L <- chol(cors) N <- 1000 dat <- cbind(rnbinom(N, mu = 4, size = 1), rnbinom(N, mu = 4, size = 1), rnbinom(N, mu = 4, size = 1), rnbinom(N, mu = 4, size = 1)) result <- dat %*% L cor(dat)
? ? ? ? ? [,1] ? ? ? ?[,2] ? ? ? ?[,3] ? ? ? [,4] [1,] 1.00000000 ?0.04227103 ?0.00358846 0.02860722 [2,] 0.04227103 ?1.00000000 -0.03122457 0.03070221 [3,] 0.00358846 -0.03122457 ?1.00000000 0.04621639 [4,] 0.02860722 ?0.03070221 ?0.04621639 1.00000000
cor(result)
? ? ? ? ?[,1] ? ? ?[,2] ? ? ?[,3] ? ? ?[,4] [1,] 1.0000000 0.9021538 0.8183199 0.8158886 [2,] 0.9021538 1.0000000 0.8114415 0.8152286 [3,] 0.8183199 0.8114415 1.0000000 0.9145778 [4,] 0.8158886 0.8152286 0.9145778 1.0000000
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Yue Yu Sent: Tuesday, July 26, 2011 11:01 PM To: r-help at r-project.org Subject: [R] Correlated Multivariate Distribution Generator Dear R User, I am wondering if there is a way to generate correlated multivariate non-normal distribution? For example, I want to generate four correlated negative binomial series with parameters r=10, p=0.2, based on the correlation coefficient matrix | ? 1 ? 0.9 ?0.8 ?0.8 | | 0.9 ? ? 1 ?0.8 ?0.8 | | 0.8 ?0.8 ? ? 1 ?0.9 | | 0.8 ?0.8 ?0.9 ? ? 1 | Thank a lot. Best, Yue Yu
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Just a pointer, not a solution: if you use the method I described, it all
comes down to the initial correlation matrix that you specify for the
Gaussian case. If you had the "correct" initial correlation matrix, it would
lead to the desired linear correlation in your binomials. (But it is not
guaranteed that such a "correct" matrix even exists.)
You could now try to adjust the initial matrix such that the obtained linear
correlation in your final variates is as desired. There are a number of
techniques, but I cannot judge for I have never used any of them. There is,
eg, an approach called NORTA (stands for "normals to anything", or something
like that); you way want to look for that.
But to use your specific example:
N <- 200000
C <- c( ? 1, ? 0.9, ?0.8, ?0.8,
0.9, ? ? 1, ?0.8, ?0.8 ,
0.8, ? 0.8, ?? 1, ?0.9 ,
0.8, ? 0.8, ?0.9, ? ? 1 )
dim(C) <- c(4,4)
C <- 2*sin(C*pi/6) # a small correction
X <- rnorm(N*4); dim(X) <- c(N,4)
X <- X %*% chol(C)
U <- pnorm(X)
B <- qnbinom(U, size = 10, prob = 0.2)
cor(B)
[,1] [,2] [,3] [,4] [1,] 1.0000000 0.9061185 0.8096429 0.8083266 [2,] 0.9061185 1.0000000 0.8098284 0.8087700 [3,] 0.8096429 0.8098284 1.0000000 0.9058436 [4,] 0.8083266 0.8087700 0.9058436 1.0000000
cor(B, method = "spearman")
[,1] [,2] [,3] [,4] [1,] 1.0000000 0.8993782 0.7988458 0.7983378 [2,] 0.8993782 1.0000000 0.7988966 0.7985780 [3,] 0.7988458 0.7988966 1.0000000 0.8993137 [4,] 0.7983378 0.7985780 0.8993137 1.0000000 Given the sample size here, I dare say one would never "spot the difference" in smaller sample (for these parameters). Regards, Enrico
-----Urspr?ngliche Nachricht----- Von: Yue Yu [mailto:parn.yy at gmail.com] Gesendet: Mittwoch, 27. Juli 2011 21:18 An: enricoschumann at yahoo.de; Doran, Harold Cc: r-help at r-project.org Betreff: Re: [R] Correlated Multivariate Distribution Generator Thanks, Enrico and Harold. I have searched the archives for this topic, but all I can find is about the multivariate normal or uniform generator, which doesn't help in my case. Harold's transformation is good for getting the correlation, but the negative binomial distribution will be changed after transformation, won't keep the desired parameters r and p. Enrico's method is brilliant for rank correlation, is there any algorithm for linear correlation? Even some references for multivariate negative binomial generator will help. Your help are really appreciated Best, Yue Yu On Wed, Jul 27, 2011 at 11:31, Doran, Harold <HDoran at air.org> wrote:
Do you mean something like this?
cors <- matrix(c(1, .9, .8, .8, .9, 1, .8, .8, .8, .8, 1,
.9, .8, .8,
.9, 1), 4) L <- chol(cors) N <- 1000 dat <-
cbind(rnbinom(N, mu = 4,
size = 1), rnbinom(N, mu = 4, size = 1), rnbinom(N, mu = 4, size = 1), rnbinom(N, mu = 4, size = 1)) result <- dat %*% L cor(dat)
? ? ? ? ? [,1] ? ? ? ?[,2] ? ? ? ?[,3] ? ? ? [,4] [1,] 1.00000000 ? 0.04227103 ?0.00358846 0.02860722 [2,] 0.04227103 ?1.00000000 -0.03122457 0.03070221 [3,] 0.00358846 -0.03122457 ?1.00000000 0.04621639 [4,] 0.02860722 ?0.03070221 ?0.04621639 1.00000000
cor(result)
? ? ? ? ?[,1] ? ? ?[,2] ? ? ?[,3] ? ? ?[,4] [1,] 1.0000000
0.9021538
0.8183199 0.8158886 [2,] 0.9021538 1.0000000 0.8114415
0.8152286 [3,]
0.8183199 0.8114415 1.0000000 0.9145778 [4,] 0.8158886 0.8152286 0.9145778 1.0000000
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Yue Yu Sent: Tuesday, July 26, 2011 11:01 PM To: r-help at r-project.org Subject: [R] Correlated Multivariate Distribution Generator Dear R User, I am wondering if there is a way to generate correlated
multivariate
non-normal distribution? For example, I want to generate four correlated negative binomial series with parameters r=10, p=0.2, based on the correlation coefficient matrix | ? 1 ? 0.9 ?0.8 ?0.8 | | 0.9 ? ? 1 ?0.8 ?0.8 | | 0.8 ?0.8 ? ? 1 ?0.9 | | 0.8 ?0.8 ?0.9 ? ? 1 | Thank a lot. Best, Yue Yu
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.