Skip to content

R help

9 messages · Li Li, Peter Dalgaard, David Winsemius +3 more

#
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
Hi.

The matrix of dimension m times m does not fit into memory,
since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB.

Generating a multivariate normal with a covariance matrix
with 1 on the diagonal and rho outside of the diagonal may
be done also as follows.

  m <- 10 # can be 5000000
  rho <- 0.5
  # single vector
  x <- rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))

  # several vectors
  a <- t(replicate(10000, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))))

  # check the sample covariance matrix if m is not too large
  sigma <- cov(a)
  range(diag(sigma)) # elements on the diagonal

  [1] 0.9963445 1.0196015

  diag(sigma) <- NA 
  range(sigma, na.rm=TRUE) # elements outside of the diagonal

  [1] 0.4935129 0.5162836

Generating several vectors using replicate() may not be efficient.
The following can be used instead.

  n <- 10000
  a <- matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho))

Note that the size of "a" is n times m and it should fit into the memory.

Hope this helps.

Petr Savicky.
#
On Feb 19, 2012, at 16:53 , li li wrote:

            
In the general case, there is really no alternative to an algorithm on the order of p^2 in space and p^2 per replication in time. The covariance matrix is of size p*(p+1)/2 and you will have to multiply by its square root for each replicate. If you run out of space, you are out of luck. Any potential speedups will have to rely on special structure of the covariance.

  
    
#
On Feb 19, 2012, at 11:46 AM, li li wrote:

            
No. You need to read the error message for meaning.
You don't have enough space in your machine (whatever it might be).

--  
David.
David Winsemius, MD
West Hartford, CT
#
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
Hi.

I am replying to your first email, since the other did not arrive
to my folder, possibly filtered out by a spam filter. I see them
at the web interface.

1. Error: cannot allocate vector of size 381.5 Mb

The error message makes sense. The matrix requires m*n*8/2^20 MB,
which is in your case

  m <- 100000
  n <- 500
  m*n*8/2^20

  [1] 381.4697

May be, you already have other large objects in the memory. Try to
minimize the number and size of objects, which you need simultaneously
in an R session.

2. Generating a multivariate normal distribution.

As Peter Dalgaard pointed out, a speed up is possible only
for special types of the covariance matrix Sigma. A general
way is to find a matrix A such that A A^t = Sigma. Then,
the vector A X, where X is from N(0,I) and I is an identity
matrix of an appropriate dimension, has covariance Sigma.
This is also the way, how mvtnorm package works.

A speed up is possible, if computing the product A X does not
require to have matrix A explicitly represented in memory.

The matrix A need not be a square matrix. In particular, the
previous case may be understood as using the matrix A, which
for a small m is as follows.

  m <- 5
  rho <- 0.5
  A <- cbind(sqrt(rho), sqrt(1 - rho)*diag(m))
  A


            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
  [1,] 0.7071068 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000
  [2,] 0.7071068 0.0000000 0.7071068 0.0000000 0.0000000 0.0000000
  [3,] 0.7071068 0.0000000 0.0000000 0.7071068 0.0000000 0.0000000
  [4,] 0.7071068 0.0000000 0.0000000 0.0000000 0.7071068 0.0000000
  [5,] 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 0.7071068

  A %*% t(A)

       [,1] [,2] [,3] [,4] [,5]
  [1,]  1.0  0.5  0.5  0.5  0.5
  [2,]  0.5  1.0  0.5  0.5  0.5
  [3,]  0.5  0.5  1.0  0.5  0.5
  [4,]  0.5  0.5  0.5  1.0  0.5
  [5,]  0.5  0.5  0.5  0.5  1.0

This construction is conceptually possible also for a large m because
the structure of A allows to compute A X by simpler operations with
the vector X than an explicit matrix product. Namely, the expression

  rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))

or, more clearly,

  sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m)

is equivalent to the required A X, where X consists of rnorm(1)
and rnorm(m) together.

If you have a specific Sigma, describe it and we can discuss,
whether an appropriate A can be found.

Hope this helps.

Petr Savicky.
#
Folks:

Perhaps I am naive, ignorant, or foolish, but this whole discussion
seems rather ridiculous. What possible relation to reality could a
multivariate normal of the size requested have? Even for simulation,
it seems like nonsense.

Cheers,
Bert
On Sun, Feb 19, 2012 at 11:35 AM, Petr Savicky <savicky at cs.cas.cz> wrote:

  
    
#
On 20/02/12 09:55, Bert Gunter wrote:
Right on, bro'!!!

     cheers,

         Rolf