
dealing with large objects -- memory wasting ?

2 messages · Martin Maechler, Martyn Plummer

Consider the following:

    > gcinfo(TRUE)
    [1] FALSE
    > rm(list=ls()); gc()
             free  total
    Ncells 135837 250000
    Vcells 747306 786432
    ------
    > n <- 10000; p <- 20; X <- matrix(rnorm(n*p), n,p); gc()
    Garbage collection [nr. 23]...
    135839 cons cells free (54%)
    4275 Kbytes of heap free (69%)
             free  total
    Ncells 135829 250000
    Vcells 547299 786432
    ------

The number of free Vcells is roughly 200000 less than before, which
confirms that a Vcell is 8 bytes: 1 Vcell = 1 double or 2 integers.

  Since we have 747 thousand of them free, constructing an X of twice
  the size (400,000 Vcells) shouldn't be a problem ...
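A quick back-of-the-envelope check of that arithmetic (on the
assumption above that 1 Vcell = 8 bytes = 1 double):

```r
## Vcell accounting for the first example
n <- 10000; p <- 20
n * p              # 200000 Vcells consumed by X
747306 - 200000    # 547306 -- close to the 547299 free Vcells
                   # reported by gc() after constructing X
```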

    > rm(X); n <- 20000; p <- 20; X <- matrix(rnorm(n*p), n,p); gc()
    Garbage collection [nr. 25]...
    135823 cons cells free (54%)
    2713 Kbytes of heap free (44%)

    Error: heap memory (6144 Kb) exhausted [needed 3125 Kb more]
	   See "help(Memory)" on how to increase the heap size.

but it *is* a problem, and it is matrix()'s fault: matrix() constructs
a local copy x, i.e. it effectively doubles its argument.
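A sketch of the pattern presumably responsible (the exact body of the
current matrix() may differ; the point is only the extra local
binding, and matrix_wasteful is a made-up name):

```r
## Hypothetical sketch: binding the result to a local name before
## returning it means that, for a moment, both the freshly built
## result and its named copy count against the heap.
matrix_wasteful <- function(data = NA, nrow = 1, ncol = 1, byrow = FALSE)
{
    if (missing(nrow))
        nrow <- ceiling(length(data)/ncol)
    else if (missing(ncol))
        ncol <- ceiling(length(data)/nrow)
    x <- .Internal(matrix(data, nrow, ncol, byrow))  # extra copy lives here
    x
}
```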

----

There seem to be worse problems when one uses

      var(x)

and x is one of those huge  n x p  matrices...

--------
Of course, there are problems like these to be optimized ``everywhere''.
Any proposals for a general approach?

Martin
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
[Long example of matrix() wasting memory
 by Martin Maechler (MM) snipped]

A minor point perhaps, but  I think there is an error
in your calculations.

If I understand correctly, the problem is that matrix()
assigns a local copy of its answer before returning it.
So  a version which does not do this ...

function (data = NA, nrow = 1, ncol = 1, byrow = FALSE) 
{
    if (missing(nrow)) 
        nrow <- ceiling(length(data)/ncol)
    else if (missing(ncol)) 
        ncol <- ceiling(length(data)/nrow)
    .Internal(matrix(data, nrow, ncol, byrow))
}

should do better.

Using commands like

rm(X); n <- ... ; p <- 20; X <- matrix(rnorm(n*p), n,p); gc()

the largest value of n I could use successfully was about
18000, which is still less than what you suggest,

MM> Since we have 747 thousands of them , 
MM> constructing X the double size (400'000) shouldn't be a problem ...

and only 50% greater than what you can do with the standard matrix()
function (n ~ 12000).

I think the answer is that your calculations did not take into
account the argument to matrix() -- rnorm(n*p) -- which also
temporarily takes up as much memory as the final matrix.
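The revised accounting, on the same assumption that 1 Vcell holds
1 double:

```r
## Peak Vcell usage while matrix(rnorm(n*p), n, p) runs
n <- 20000; p <- 20
n * p        # 400000 Vcells held by the rnorm(n*p) temporary
2 * n * p    # 800000 Vcells at the peak (temporary + result) --
             # more than the 747306 free Vcells reported above
```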

With trivial data you can do better:

rm(X); n <- ... ; p <- 20; X <- matrix(0, n,p); gc()

You can assign up to n ~ 37000 with the standard matrix()
function and n ~ 74000 with the modified version, which
is the expected 100% improvement.

MM>There seem to be worse problems when use 
MM>
MM>     var(x)
MM>
MM>and x is one of those huge  n x p  matrices...

I couldn't assign a matrix that was big enough to crash var().
Is there a problem here? The fact that the default value of 
y is x is not a problem because of lazy evaluation.
If you assigned y in the body of the function ...

function (x, y, na.rm = FALSE, use) 
{
    if (missing(y)) 
        y <- x
    if (missing(use)) 
        use <- if (na.rm) 
            "complete.obs"
        else "all.obs"
    cov(x, y, use = use)
}

then you would have problems, but this isn't the case.
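The lazy-evaluation point can be seen with a toy function (f and its
default expression are made up here for illustration):

```r
## The default for 'y' is a promise; it is not evaluated -- and hence
## no copy of x is made -- unless the body actually uses y.
f <- function(x, y = { cat("forcing y\n"); x }) {
    length(x)    # y is never touched, so its default never runs
}
f(1:10)          # prints nothing: the promise for y was never forced
```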
What am I missing?

Martyn