allocVector bug ?
On Wed, 1 Nov 2006, Vladimir Dergachev wrote:
Hi all,
I was looking at the following piece of code in src/main/memory.c, function
allocVector :
    if (size <= NodeClassSize[1]) {
        node_class = 1;
        alloc_size = NodeClassSize[1];
    }
    else {
        node_class = LARGE_NODE_CLASS;
        alloc_size = size;
        for (i = 2; i < NUM_SMALL_NODE_CLASSES; i++) {
            if (size <= NodeClassSize[i]) {
                node_class = i;
                alloc_size = NodeClassSize[i];
                break;
            }
        }
    }
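To make the behaviour of this excerpt easier to check, here is a minimal, self-contained sketch of the same selection logic. The table values and the constants NUM_SMALL_NODE_CLASSES and LARGE_NODE_CLASS below are assumptions chosen for illustration, and choose_node_class is a hypothetical wrapper, not a function in memory.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; the real table lives in memory.c. */
#define NUM_SMALL_NODE_CLASSES 7
#define LARGE_NODE_CLASS 7

static const size_t NodeClassSize[NUM_SMALL_NODE_CLASSES] = {0, 1, 2, 4, 6, 8, 16};

/* Hypothetical wrapper around the excerpt: picks the smallest class that
   fits `size`, or the large class with alloc_size equal to the request. */
static void choose_node_class(size_t size, int *node_class, size_t *alloc_size)
{
    assert(node_class != NULL && alloc_size != NULL);

    if (size <= NodeClassSize[1]) {
        *node_class = 1;
        *alloc_size = NodeClassSize[1];
    }
    else {
        /* Default to the large class; overridden if a small class fits. */
        *node_class = LARGE_NODE_CLASS;
        *alloc_size = size;
        for (int i = 2; i < NUM_SMALL_NODE_CLASSES; i++) {
            if (size <= NodeClassSize[i]) {
                *node_class = i;
                *alloc_size = NodeClassSize[i];
                break;
            }
        }
    }
}
```

With the assumed table, a request of 5 units lands in class 4 with alloc_size 6, while a request of 17 units falls through to the large class with alloc_size 17, i.e. the full request size.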
It appears that for LARGE_NODE_CLASS the variable alloc_size should not be
size, but something far smaller, as we are not using the vector heap but rather
calling malloc directly in the code below (and from discussions I read on
this mailing list I think that these two are different - please let me know
if I am wrong).
So when we allocate a large vector, the garbage collector goes nuts trying to
find all that space, which is not going to be needed after all.
This is as intended, not a bug. The garbage collector does not "go nuts"; it is doing a garbage collection that may release memory in advance of making a large allocation. The size of the current allocation request is used as part of the process of deciding when to satisfy an allocation by malloc (of a single large node or a page) and when to first do a gc. It is essential to do this for large allocations as well, to keep the memory footprint down and help reduce fragmentation.

The strategy for deciding when to allocate and when to gc is by necessity heuristic. It tries to keep the overall memory footprint low but at the same time tries to adapt to usage so that gc happens less often once a pattern of using larger amounts of memory emerges. The current strategy seems quite robust across a large range of architectures, memory configurations, and applications.

That said, when I wrote the manager I kept in mind that we might eventually want to try more sophisticated schemes and/or allow some user control over the schemes used. It may be time to revisit this soon.

luke
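The trade-off described in this reply can be illustrated with a toy model. This is only a sketch under stated assumptions: every name here (toy_heap, toy_gc, toy_alloc, toy_release, toy_churn) is hypothetical, the growth rule and numbers are made up, and none of it is the actual logic in memory.c. The model has a single gc trigger, the counted request size, so it shows what counting or not counting large requests does to footprint in isolation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, made-up policy and numbers. */
typedef struct {
    size_t limit;      /* soft limit on memory before a gc is attempted */
    size_t live;       /* memory held by reachable objects */
    size_t garbage;    /* unreachable memory not yet collected */
    size_t peak;       /* largest footprint (live + garbage) seen so far */
    unsigned gc_count; /* number of collections performed */
} toy_heap;

/* Collect all garbage; if the heap is still crowded, raise the limit so
   that later collections happen less often (the "adapt to usage" part). */
static void toy_gc(toy_heap *h, size_t request)
{
    h->garbage = 0;
    h->gc_count++;
    size_t needed = h->live + request;
    if (needed > h->limit / 2)
        h->limit = 2 * needed;
}

/* Allocate `request` units. `counted` is the size that enters the gc
   decision: the full request, or 0 to mimic ignoring large requests. */
static void toy_alloc(toy_heap *h, size_t request, size_t counted)
{
    size_t used = h->live + h->garbage;
    size_t free_space = used < h->limit ? h->limit - used : 0;

    assert(counted <= request);
    if (free_space < counted)
        toy_gc(h, counted);

    h->live += request;
    if (h->live + h->garbage > h->peak)
        h->peak = h->live + h->garbage;
}

/* Mark `n` units as unreachable; they stay in the footprint until a gc. */
static void toy_release(toy_heap *h, size_t n)
{
    assert(n <= h->live);
    h->live -= n;
    h->garbage += n;
}

/* Repeatedly allocate and drop one large object. */
static void toy_churn(toy_heap *h, int rounds, size_t size, int count_request)
{
    for (int i = 0; i < rounds; i++) {
        toy_alloc(h, size, count_request ? size : 0);
        toy_release(h, size);
    }
}
```

In this model, starting from a limit of 100 and churning a 60-unit object ten times, counting the request triggers five collections and keeps the peak footprint at 120 units. Counting it as 0 triggers none and lets the footprint climb to 600 units. The real collector has other triggers as well, so the effect in R itself would not necessarily be this stark.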
I made an experiment and replaced the line alloc_size=size with alloc_size=0.
R compiled fine (both 2.4.0 and 2.3.1) and passed make check with no issues
(it all printed OK).
Furthermore, all allocVector calls completed in no time and my test case ran
very fast (22 seconds, as opposed to minutes).
In addition, attach() was instantaneous which was wonderful.
Could anyone with deeper knowledge of R internals comment on whether this
makes any sense ?
thank you very much !
Vladimir Dergachev
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu