Problem when creating matrix of values based on covariance matrix
Sampling error? Do you realize how large a sample size you would need to precisely estimate an 8000 x 8000 covariance matrix? Probably exceeds the number of stars in our galaxy... Numerical issues may also play a role, but I am too ignorant on this aspect to offer advice. Finally, this is really not an R question, so you would probably do better to post on a stats site like stats.stackexchange.com rather than here. -- Bert
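To put a number on the sampling error for a single entry: for multivariate normal data, the sample covariance s_ij (divisor n - 1) has variance (sigma_ij^2 + sigma_ii * sigma_jj) / (n - 1). The base-R sketch below uses a hypothetical helper, `cov_entry_se`, that is not part of any package. It evaluates the formula with the CEUcovar[1:2, 1:2] values quoted in the question and n = 20351, then checks it against a small Monte Carlo run on a made-up 2 x 2 matrix. It only covers individual entries. Properties of the full 8368 x 8368 matrix, such as its eigenvalues, are a separate question.

```r
# Approximate standard error of one sample covariance entry s_ij
# (divisor n - 1) for multivariate normal data:
#   Var(s_ij) = (sigma_ij^2 + sigma_ii * sigma_jj) / (n - 1)
# cov_entry_se() is a hypothetical helper written for this sketch.
cov_entry_se <- function(sigma_ii, sigma_jj, sigma_ij, n) {
  sqrt((sigma_ij^2 + sigma_ii * sigma_jj) / (n - 1))
}

# Entries of CEUcovar[1:2, 1:2] from the question, n = 20351 simulated rows
se12 <- cov_entry_se(0.1873402987, 0.188665853, 0.001837229, n = 20351)
print(se12)

# Monte Carlo check of the formula on a small, made-up 2 x 2 example
set.seed(1)
Sigma2 <- matrix(c(1,   0.3,
                   0.3, 2), nrow = 2, byrow = TRUE)
n_mc <- 200
R2 <- chol(Sigma2)                           # Sigma2 = t(R2) %*% R2
s12 <- replicate(2000, {
  Z <- matrix(rnorm(n_mc * 2), nrow = n_mc)  # n_mc x 2, iid N(0, 1)
  X <- Z %*% R2                              # rows ~ N(0, Sigma2)
  cov(X)[1, 2]                               # Pearson sample covariance
})
c(empirical = sd(s12), formula = cov_entry_se(1, 2, 0.3, n_mc))
```

With these inputs the standard error is roughly 0.0013. At this n, sampling noise on any single entry is therefore on the order of a thousandth.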
On Sat, Aug 11, 2012 at 7:17 AM, Boel Brynedal <brynedal at gmail.com> wrote:
Hi, I want to simulate a data set with a covariance structure similar to that of my observed data, and I have calculated a covariance matrix (dimensions 8368 x 8368). So far I've tried two approaches to simulating data: rmvnorm from the mvtnorm package, and the Cholesky decomposition (http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/). The problem is that the covariance structure of my simulated data is very different from the original covariance matrix I supplied. Let's look at some of the values:
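The Cholesky approach mentioned above can be sketched in base R on a small example. The 3 x 3 Sigma is an assumption, loosely based on the upper-left block of CEUcovar shown below. If Sigma = t(R) %*% R with R = chol(Sigma), then t(R) %*% z has covariance Sigma for z ~ N(0, I). The sketch checks this with the default (Pearson) cov().

```r
# Sketch: simulate n draws from N(0, Sigma) via the Cholesky factor.
set.seed(42)
Sigma <- matrix(c(0.187,  0.0018, 0.0009,
                  0.0018, 0.189,  0.0124,
                  0.0009, 0.0124, 0.187), nrow = 3, byrow = TRUE)
n <- 20000
p <- nrow(Sigma)
R <- chol(Sigma)                           # upper triangular, Sigma = t(R) %*% R
Z <- matrix(rnorm(p * n), nrow = p)        # p x n, iid N(0, 1)
X <- t(R) %*% Z                            # p x n; each column ~ N(0, Sigma)
sim <- t(X)                                # n x p: rows are observations
est <- cov(sim)                            # default Pearson covariance
print(round(est, 4))
max(abs(est - Sigma))                      # small: only sampling noise remains
```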
cov8[1:4,1:4] # covariance of simulated data
           X1          X2         X3         X4
X1 34515296.00    99956.69   369538.1  1749086.6
X2    99956.69 34515296.00  2145289.9  -624961.1
X3   369538.08  2145289.93 34515296.0  -163716.5
X4  1749086.62  -624961.09  -163716.5 34515296.0
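Every diagonal entry of cov8 is the same number, 34515296. Further down, cov8 is computed with method='spearman', which works on ranks. For a column without ties, the ranks are a permutation of 1..n, and the sample variance (divisor n - 1) of 1..n is n(n + 1)/12. The arithmetic check below uses n = 20351, the row count reported by dim(sample8) in the question.

```r
n <- 20351                     # number of rows in sample8
rank_var <- n * (n + 1) / 12   # variance of the ranks 1..n (divisor n - 1)
print(rank_var)
# Same quantity computed directly from the ranks 1..n:
var(seq_len(n))
```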
CEUcovar[1:4,1:4]
             [,1]         [,2]          [,3]         [,4]
[1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
[2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
[3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
[4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488

So the observed covariances lie in a far narrower range than the simulated ones. None of the eigenvalues of the observed covariance matrix are negative, and it appears to be positive definite. Here is what I did to create the simulated data:

Chol <- chol(CEUcovar)                  # upper-triangular Cholesky factor
Z <- matrix(rnorm(20351 * 8368), 8368)  # 8368 x 20351 matrix of iid N(0, 1)
X <- t(Chol) %*% Z                      # each column ~ N(0, CEUcovar)
sample8 <- data.frame(as.matrix(t(X)))  # 20351 rows (samples) x 8368 columns
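The positive-definiteness check and the orientation of the Cholesky factor described above can both be verified directly. The sketch below uses a hypothetical helper, `is_pos_def`, that looks at the smallest eigenvalue with a relative tolerance. The 3 x 3 S is rounded from CEUcovar[1:3, 1:3]. Because chol() returns the upper factor R with S = t(R) %*% R, multiplying t(R) by iid N(0, 1) columns, as in the code above, targets S.

```r
# Hypothetical helper: a symmetric matrix is treated as positive definite
# if its smallest eigenvalue exceeds tol times the largest in magnitude.
is_pos_def <- function(S, tol = 1e-8) {
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  min(ev) > tol * max(abs(ev))
}

# Rounded from CEUcovar[1:3, 1:3] in the question
S <- matrix(c(0.1873, 0.0018, 0.0009,
              0.0018, 0.1887, 0.0124,
              0.0009, 0.0124, 0.1868), nrow = 3, byrow = TRUE)
stopifnot(isSymmetric(S))
is_pos_def(S)

R <- chol(S)                      # chol() returns the UPPER factor R
max(abs(t(R) %*% R - S))          # ~ 0: S = t(R) %*% R
```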
dim(sample8)
[1] 20351  8368

cov8 = cov(sample8, method='spearman')

[Earlier I also tried sample8 <- rmvnorm(1000, mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen"), with equally 'bad' results: much larger covariance values in the simulated data.]

Any ideas WHY the simulated data have such a different covariance? Any experience with similar issues? I would be happy to supply the covariance matrix if anyone wants to give it a try. Any suggestions? Is there anything obvious that I left out or neglected? Any advice would be highly appreciated.

Best,
Bo
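R's help page for cor() and cov() notes that with method = "spearman", cov() computes the covariance of the ranks of each column, not of the values themselves. The sketch below uses a made-up 2 x 2 Sigma, not CEUcovar, and compares the default (Pearson) cov() with method = "spearman" on the same simulated data. It uses n = 1000 to mirror the rmvnorm attempt, although the bracketed note does not say how that result was summarised.

```r
set.seed(7)
Sigma <- matrix(c(0.19, 0.05,
                  0.05, 0.19), nrow = 2)       # made-up example covariance
n <- 1000
X <- matrix(rnorm(n * 2), nrow = n) %*% chol(Sigma)  # rows ~ N(0, Sigma)

cov_pearson  <- cov(X)                         # estimates Sigma
cov_spearman <- cov(X, method = "spearman")    # covariance of the ranks
cov_ranks    <- cov(apply(X, 2, rank))         # same thing, computed by hand

print(round(cov_pearson, 3))
print(cov_spearman)
all.equal(cov_spearman, cov_ranks)
n * (n + 1) / 12                               # diagonal of cov_spearman (no ties)
```

The rank-based version has diagonal entries of n(n + 1)/12 whatever the scale of the data. The Pearson version, by contrast, is on the scale of Sigma.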
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm