An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120218/1e6a1ccb/attachment.pl>
R help
9 messages · Li Li, Peter Dalgaard, David Winsemius +3 more
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
Dear all,
I need to generate numbers from multivariate normal with large dimensions
(5,000,000).
Below is my code and the error I got from R. Sigma in the code is the
covariance
matrix. Can anyone give some idea on how to take care of this error. Thank
you.
Hannah
m <- 5000000 m1 <- 0.5*m rho <- 0.5 Sigma <- rho* matrix(1, m, m)+diag(1-rho, m)
Error in matrix(1, m, m) : too many elements specified
Hi. The matrix of dimension m times m does not fit into memory, since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB. Generating a multivariate normal with a covariance matrix with 1 on the diagonal and rho outside of the diagonal may be done also as follows. m <- 10 # can be 5000000 rho <- 0.5 # single vector x <- rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) # several vectors a <- t(replicate(10000, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)))) # check the sample covariance matrix if m is not too large sigma <- cov(a) range(diag(sigma)) # elements on the diagonal [1] 0.9963445 1.0196015 diag(sigma) <- NA range(sigma, na.rm=TRUE) # elements outside of the diagonal [1] 0.4935129 0.5162836 Generating several vectors using replicate() may not be efficient. The following can be used instead. n <- 10000 a <- matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho)) Note that the size of "a" is n times m and it should fit into the memory. Hope this helps. Petr Savicky.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120219/dff1779d/attachment.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120219/b35e3937/attachment.pl>
On Feb 19, 2012, at 16:53 , li li wrote:
Petr, Thanks for the help. That certainly makes sense and solves my current problem. But, in general, for arbitrary covariance matrix (instead of the special equi-correlation case), it there a way to generate numbers from multivariate normal when the dimension is very high?
In the general case, there is really no alternative to an algorithm on the order of p^2 in space and p^2 per replication in time. The covariance matrix is of size p*(p+1)/2 and you will have to multiply by its square root for each replicate. If you run out of space, you are out of luck. Any potential speedups will have to rely on special structure of the covariance.
Thanks.
Hannah
?? 2012??2??19?? ????5:01??Petr Savicky <savicky at cs.cas.cz>??????
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
Dear all, I need to generate numbers from multivariate normal with large
dimensions
(5,000,000). Below is my code and the error I got from R. Sigma in the code is the covariance matrix. Can anyone give some idea on how to take care of this error.
Thank
you.
Hannah
m <- 5000000 m1 <- 0.5*m rho <- 0.5 Sigma <- rho* matrix(1, m, m)+diag(1-rho, m)
Error in matrix(1, m, m) : too many elements specified
Hi. The matrix of dimension m times m does not fit into memory, since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB. Generating a multivariate normal with a covariance matrix with 1 on the diagonal and rho outside of the diagonal may be done also as follows. m <- 10 # can be 5000000 rho <- 0.5 # single vector x <- rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) # several vectors a <- t(replicate(10000, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)))) # check the sample covariance matrix if m is not too large sigma <- cov(a) range(diag(sigma)) # elements on the diagonal [1] 0.9963445 1.0196015 diag(sigma) <- NA range(sigma, na.rm=TRUE) # elements outside of the diagonal [1] 0.4935129 0.5162836 Generating several vectors using replicate() may not be efficient. The following can be used instead. n <- 10000 a <- matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho)) Note that the size of "a" is n times m and it should fit into the memory. Hope this helps. Petr Savicky.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
On Feb 19, 2012, at 11:46 AM, li li wrote:
Actually I still get an error for the case of equal correlation.
No. You need to read the error message for meaning.
Below is the code
m <- 100000 n <- 500 m1 <- 0.5*m mu <- c(rep(2, m1), rep(0, m-m1)) rho <- 0.5 x.mod1 <- matrix(rnorm(n*m, sd=sqrt(1-rho)), nrow=n, ncol=m)+rnorm(n,
sd=sqrt(rho))+t(replicate(n,mu)) Error: cannot allocate vector of size 381.5 Mb
You don't have enough space in your machine (whatever it might be). -- David.
R(173,0xa0a7b540) malloc: *** mmap(size=400003072) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug R(173,0xa0a7b540) malloc: *** mmap(size=400003072) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug R(173,0xa0a7b540) malloc: *** mmap(size=400003072) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug ?? 2012??2??19?? ????10:53??li li <hannah.hlx at gmail.com>??????
Petr,
Thanks for the help. That certainly makes sense and solves my
current
problem. But, in general, for arbitrary covariance matrix (instead
of the
special equi-correlation case), it there a way to generate numbers
from
multivariate normal when the dimension is very high?
Thanks.
Hannah
?? 2012??2??19?? ????5:01??Petr Savicky <savicky at cs.cas.cz>??????
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
Dear all, I need to generate numbers from multivariate normal with large
dimensions
(5,000,000). Below is my code and the error I got from R. Sigma in the code is the covariance matrix. Can anyone give some idea on how to take care of this error.
Thank
you.
Hannah
m <- 5000000 m1 <- 0.5*m rho <- 0.5 Sigma <- rho* matrix(1, m, m)+diag(1-rho, m)
Error in matrix(1, m, m) : too many elements specified
Hi. The matrix of dimension m times m does not fit into memory, since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB. Generating a multivariate normal with a covariance matrix with 1 on the diagonal and rho outside of the diagonal may be done also as follows. m <- 10 # can be 5000000 rho <- 0.5 # single vector x <- rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) # several vectors a <- t(replicate(10000, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)))) # check the sample covariance matrix if m is not too large sigma <- cov(a) range(diag(sigma)) # elements on the diagonal [1] 0.9963445 1.0196015 diag(sigma) <- NA range(sigma, na.rm=TRUE) # elements outside of the diagonal [1] 0.4935129 0.5162836 Generating several vectors using replicate() may not be efficient. The following can be used instead. n <- 10000 a <- matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho)) Note that the size of "a" is n times m and it should fit into the memory. Hope this helps. Petr Savicky.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD West Hartford, CT
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000).
Hi.
I am replying to your first email, since the other did not arrive
to my folder, possibly filtered out by a spam filter. I see them
at the web interface.
1. Error: cannot allocate vector of size 381.5 Mb
The error message makes sense. The matrix requires m*n*8/2^20 MB,
which is in your case
m <- 100000
n <- 500
m*n*8/2^20
[1] 381.4697
May be, you already have other large objects in the memory. Try to
minimize the number and size of objects, which you need simultaneously
in an R session.
2. Generating a multivariate normal distribution.
As Peter Dalgaard pointed out, a speed up is possible only
for special types of the covariance matrix Sigma. A general
way is to find a matrix A such that A A^t = Sigma. Then,
the vector A X, where X is from N(0,I) and I is an identity
matrix of an appropriate dimension, has covariance Sigma.
This is also the way, how mvtnorm package works.
A speed up is possible, if computing the product A X does not
require to have matrix A explicitly represented in memory.
The matrix A need not be a square matrix. In particular, the
previous case may be understood as using the matrix A, which
for a small m is as follows.
m <- 5
rho <- 0.5
A <- cbind(sqrt(rho), sqrt(1 - rho)*diag(m))
A
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.7071068 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000
[2,] 0.7071068 0.0000000 0.7071068 0.0000000 0.0000000 0.0000000
[3,] 0.7071068 0.0000000 0.0000000 0.7071068 0.0000000 0.0000000
[4,] 0.7071068 0.0000000 0.0000000 0.0000000 0.7071068 0.0000000
[5,] 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 0.7071068
A %*% t(A)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0 0.5 0.5 0.5 0.5
[2,] 0.5 1.0 0.5 0.5 0.5
[3,] 0.5 0.5 1.0 0.5 0.5
[4,] 0.5 0.5 0.5 1.0 0.5
[5,] 0.5 0.5 0.5 0.5 1.0
This construction is conceptually possible also for a large m because
the structure of A allows to compute A X by simpler operations with
the vector X than an explicit matrix product. Namely, the expression
rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))
or, more clearly,
sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m)
is equivalent to the required A X, where X consists of rnorm(1)
and rnorm(m) together.
If you have a specific Sigma, describe it and we can discuss,
whether an appropriate A can be found.
Hope this helps.
Petr Savicky.
Folks: Perhaps I am naive, ignorant, or foolish, but this whole discussion seems rather ridiculous. What possible relation to reality could a multivariate normal of the size requested have? Even for simulation, it seems like nonsense. Cheers, Bert
On Sun, Feb 19, 2012 at 11:35 AM, Petr Savicky <savicky at cs.cas.cz> wrote:
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
Dear all, ? I need to generate numbers from multivariate normal with large dimensions (5,000,000).
Hi. I am replying to your first email, since the other did not arrive to my folder, possibly filtered out by a spam filter. I see them at the web interface. 1. Error: cannot allocate vector of size 381.5 Mb The error message makes sense. The matrix requires m*n*8/2^20 MB, which is in your case ?m <- 100000 ?n <- 500 ?m*n*8/2^20 ?[1] 381.4697 May be, you already have other large objects in the memory. Try to minimize the number and size of objects, which you need simultaneously in an R session. 2. Generating a multivariate normal distribution. As Peter Dalgaard pointed out, a speed up is possible only for special types of the covariance matrix Sigma. A general way is to find a matrix A such that A A^t = Sigma. Then, the vector A X, where X is from N(0,I) and I is an identity matrix of an appropriate dimension, has covariance Sigma. This is also the way, how mvtnorm package works. A speed up is possible, if computing the product A X does not require to have matrix A explicitly represented in memory. The matrix A need not be a square matrix. In particular, the previous case may be understood as using the matrix A, which for a small m is as follows. ?m <- 5 ?rho <- 0.5 ?A <- cbind(sqrt(rho), sqrt(1 - rho)*diag(m)) ?A ? ? ? ? ? ?[,1] ? ? ?[,2] ? ? ?[,3] ? ? ?[,4] ? ? ?[,5] ? ? ?[,6] ?[1,] 0.7071068 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 ?[2,] 0.7071068 0.0000000 0.7071068 0.0000000 0.0000000 0.0000000 ?[3,] 0.7071068 0.0000000 0.0000000 0.7071068 0.0000000 0.0000000 ?[4,] 0.7071068 0.0000000 0.0000000 0.0000000 0.7071068 0.0000000 ?[5,] 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 0.7071068 ?A %*% t(A) ? ? ? [,1] [,2] [,3] [,4] [,5] ?[1,] ?1.0 ?0.5 ?0.5 ?0.5 ?0.5 ?[2,] ?0.5 ?1.0 ?0.5 ?0.5 ?0.5 ?[3,] ?0.5 ?0.5 ?1.0 ?0.5 ?0.5 ?[4,] ?0.5 ?0.5 ?0.5 ?1.0 ?0.5 ?[5,] ?0.5 ?0.5 ?0.5 ?0.5 ?1.0 This construction is conceptually possible also for a large m because the structure of A allows to compute A X by simpler operations with the vector X than an explicit matrix product. Namely, the expression ?rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) or, more clearly, ?sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m) is equivalent to the required A X, where X consists of rnorm(1) and rnorm(m) together. If you have a specific Sigma, describe it and we can discuss, whether an appropriate A can be found. Hope this helps. Petr Savicky.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
On 20/02/12 09:55, Bert Gunter wrote:
Folks: Perhaps I am naive, ignorant, or foolish, but this whole discussion seems rather ridiculous. What possible relation to reality could a multivariate normal of the size requested have? Even for simulation, it seems like nonsense.
Right on, bro'!!!
cheers,
Rolf