Consider the following:
> gcinfo(TRUE)
[1] FALSE
> rm(list=ls()); gc()
free total
Ncells 135837 250000
Vcells 747306 786432
------
> n <- 10000; p <- 20; X <- matrix(rnorm(n*p), n,p); gc()
Garbage collection [nr. 23]...
135839 cons cells free (54%)
4275 Kbytes of heap free (69%)
free total
Ncells 135829 250000
Vcells 547299 786432
------
which shows roughly 200,000 fewer free Vcells than before;
i.e. Vcells are 8 bytes each,
1 Vcell = 1 double (or 2 integers).
Since we started with about 747,000 free Vcells,
constructing an X of double the size (400,000 Vcells) shouldn't be a problem ...
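As a quick sanity check of that arithmetic (a sketch in Python; the 8-bytes-per-Vcell figure is taken from the reasoning above, not measured):

```python
# Check the Vcell arithmetic for the attempted 20000 x 20 matrix.
# 1 Vcell = 1 double = 8 bytes (figure from the discussion above).
VCELL_BYTES = 8

n, p = 20000, 20
vcells_needed = n * p                            # one Vcell per double
kbytes_needed = vcells_needed * VCELL_BYTES // 1024

print(vcells_needed)   # 400000 Vcells
print(kbytes_needed)   # 3125 Kb
```

Note that 3125 Kb is exactly the shortfall reported in the error message below.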
> rm(X); n <- 20000; p <- 20; X <- matrix(rnorm(n*p), n,p); gc()
Garbage collection [nr. 25]...
135823 cons cells free (54%)
2713 Kbytes of heap free (44%)
Error: heap memory (6144 Kb) exhausted [needed 3125 Kb more]
See "help(Memory)" on how to increase the heap size.
but it *is* a problem, and it is matrix()'s
fault: it assigns a local copy x of its result, i.e. it effectively doubles its argument.
----
There seem to be worse problems when one uses
var(x)
and x is one of those huge n x p matrices...
--------
Of course there are problems like these to be optimized -- ``everywhere''.
Any proposals for a general approach?
Martin
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
dealing with large objects -- memory wasting ?
2 messages · Martin Maechler, Martyn Plummer
[Long example of matrix() wasting memory
by Martin Maechler (MM) snipped]
A minor point perhaps, but I think there is an error
in your calculations.
If I understand correctly, the problem is that matrix()
assigns a local copy of its answer before returning it.
So a version which does not do this ...
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE)
{
    if (missing(nrow))
        nrow <- ceiling(length(data)/ncol)
    else if (missing(ncol))
        ncol <- ceiling(length(data)/nrow)
    .Internal(matrix(data, nrow, ncol, byrow))
}
should do better.
Using commands like
rm(X); n <- ... ; p <- 20; X <- matrix(rnorm(n*p), n,p); gc()
the largest value of n I could use successfully was about
18000, which is still less than what you suggest,
MM> Since we have 747 thousands of them ,
MM> constructing X the double size (400'000) shouldn't be a problem ...
and only 50% greater than what you can do with the standard matrix()
function (n ~ 12000).
I think the answer is that your calculations did not take
into account the argument to matrix() - rnorm(n*p) - which
also temporarily takes up as much memory as the final matrix.
With trivial data you can do better:
rm(X); n <- ... ; p <- 20; X <- matrix(0, n,p); gc()
You can assign up to n ~ 37000 with the standard matrix()
function and n ~ 74000 with the modified version, which
is the expected 100% improvement.
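That 100% improvement can be sanity-checked with a back-of-the-envelope allocation model (a sketch in Python; FREE_VCELLS is a hypothetical value inferred from the reported n limits, so the heap in this session was evidently larger than the 786432-Vcell default shown earlier):

```python
# Copy-counting model behind the observed limits for matrix(0, n, p).
# FREE_VCELLS is inferred from the reported limits (an assumption),
# counting 1 Vcell per double as in MM's message.
FREE_VCELLS = 1_480_000   # assumed free heap in this session
P = 20                    # number of columns

def max_n(copies):
    """Largest n for which `copies` simultaneous n*P double arrays fit."""
    return FREE_VCELLS // (P * copies)

# The standard matrix() keeps a local copy of its result, so two n*P
# arrays coexist at the peak; the modified version needs only one.
print(max_n(2))   # 37000  (standard matrix(),  n ~ 37000 observed)
print(max_n(1))   # 74000  (modified matrix(),  n ~ 74000 observed)
```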
MM>There seem to be worse problems when use
MM>
MM> var(x)
MM>
MM>and x is one of those huge n x p matrices...
I couldn't assign a matrix that was big enough to crash var().
Is there a problem here? The fact that the default value of
y is x is not a problem because of lazy evaluation.
If you assigned y in the body of the function ...
function (x, y, na.rm = FALSE, use)
{
    if (missing(y))
        y <- x
    if (missing(use))
        use <- if (na.rm) "complete.obs" else "all.obs"
    cov(x, y, use = use)
}
then you would have problems, but this isn't the case.
What am I missing?
Martyn