
.Call and reclaiming the memory allocated by allocVector

5 messages · Yongchao Ge, Brian Ripley, Seth Falcon

#
Hi,

I am not sure if this is a bug, and I apologize if it is something I
did not read carefully in the R extension manual. My initial search of the
R-help and R-devel list archives didn't find useful information.

I am using .Call (as described in the R extension manual) for my C code
and have found that .Call did not release the memory allocated by
allocVector. Even after removing the R object created by the .Call
function and running gc(), the memory was still not returned to the
operating system.

Here is an example, modified from the convolve2 example in the R
extension manual. Now I am computing the cross product of a and b, which
returns a vector of size length(a)*length(b).

The C code is at the end of this message, with the modifications commented.
The R code is here:
----------------------------
dyn.load("crossprod2.so")
cp <- function(a, b) .Call("crossprod2", a, b)
gctorture()
a <- 1:10000
b <- 1:1000
gc() # i

c <- cp(a, b)
rm(c)
gc() # ii
----------------------------

When I run the above code in a freshly started R (version 2.5.0),
the gc() information is below. I report the last column ("max
used (Mb)") here, which agrees with the Linux command "ps aux". Apparently,
even after I remove the object "c", we still have about 70 MB of
unreclaimed memory, which is approximately the size of the object "c".

If I run the command "c<-cp(a,b)" three or four times and then remove the
object "c" and run gc(), the unreclaimed memory can reach 150 MB.
I tried gc(reset=TRUE), and it doesn't seem to make a difference.

Can someone suggest what causes this problem and what the solution might
be? When you reply, please cc me, as I am not on the help list.

Thanks,

Yongchao

------------------------------------------------
gc() # i
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  173527  4.7     467875 12.5   350000  9.4
Vcells  108850  0.9     786432  6.0   398019  3.1

gc() # ii
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  233998  6.3     467875 12.5   350000  9.4
Vcells  108866  0.9   12089861 92.3 10119856 77.3
------------------------------------------------

--------------------------------------------
/* crossprod2.c -- modified from the convolve2 example in the R extension
   manual; compile with: R CMD SHLIB crossprod2.c */
#include <R.h>
#include <Rinternals.h>

SEXP crossprod2(SEXP a, SEXP b)
{
    R_len_t i, j, na, nb, nab;
    double *xa, *xb, *xab;
    SEXP ab;

    /* coerce the arguments to double and protect them from the GC */
    PROTECT(a = coerceVector(a, REALSXP));
    PROTECT(b = coerceVector(b, REALSXP));
    na = length(a); nb = length(b);

    nab = na * nb;   /* cross product, not convolution (na + nb - 1) */
    PROTECT(ab = allocVector(REALSXP, nab));
    xa = REAL(a); xb = REAL(b);
    xab = REAL(ab);
    for(i = 0; i < nab; i++) xab[i] = 0.0;
    for(i = 0; i < na; i++)
        for(j = 0; j < nb; j++)
            xab[i*nb + j] += xa[i] * xb[j];   /* outer-product entry */
    UNPROTECT(3);
    return ab;
}
#
Please do not post to multiple lists! I've removed R-help.

You have not told us your OS ('linux', perhaps, but what CPU), nor how you
know 'the memory was still not reclaimed back to the operating system'.
But that is how many OSes work: their malloc maintains a pool of memory
pages, and free() does not return the memory to the OS kernel, just to the
process's pool. It depends on what you meant by 'the operating system'.

Why does this bother you?  150Mb of virtual memory is nothing these days.
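A minimal sketch of the behaviour described above, assuming Linux (where "ps -o rss=" reports the resident set size in kilobytes): R frees the memory internally, but the process size reported by ps need not shrink, because the freed pages stay pooled inside the process for reuse.

----------------------------
## Allocate many medium-sized vectors (~160 MB total), then free them.
## The C allocator typically keeps the freed heap pages in its pool
## rather than returning them to the kernel, so RSS usually stays high.
x <- lapply(1:20000, function(i) numeric(1000))   # 20000 x ~8 KB
system(paste("ps -o rss= -p", Sys.getpid()))      # RSS: large
rm(x)
invisible(gc())                                   # R itself has freed the memory
system(paste("ps -o rss= -p", Sys.getpid()))      # RSS: typically still large
----------------------------

"Freed inside the process" and "returned to the kernel" are different events; which one ps measures depends on the allocator, not on R.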
On Thu, 23 Aug 2007, Yongchao Ge wrote:

Exactly this topic was thrashed to death under the misleading title of 
'Suspected memory leak' earlier this month in a thread that started on 
R-help and moved to R-devel. See e.g.

https://stat.ethz.ch/pipermail/r-devel/2007-August/046669.html

from the author of the R memory allocator.

#
Dear Prof. Ripley

I am using 32-bit Ubuntu 7.04 on a dual-core Intel Xeon 5140 processor. I do
not think the OS has any problem reclaiming memory released by free(),
as the Calloc() and Free() pair works perfectly well in my C program.
I'm assuming that the free() in your post does not mean the standard C
library function but the Free() in the R API, which the R extension
manual recommends for releasing memory back to the OS.

It was not the 150 MB that bothered me; I used the toy example to
isolate the problem. My actual program needs to allocate around 660 MB
(maybe more, depending on the actual dataset) for a return value from .Call.
This return object is stored in R and will be used by many other
functions, which also use .Call to wrap C code. I found that my
program reaches the memory limit (3 GB) very quickly, even though at most
1.8 GB of data should be in memory in the C and R code combined
(potentially two copies of the same R object and one copy in the C
program). The memory problem in .Call means that my program can run once
or twice but fails the third time, and I need to run the same program
more than twice.

Why am I storing a large dataset in R? My program consists of two
parts. The first part computes intermediate results, which takes a lot
of time. The second part contains many different functions that
manipulate those intermediate results.

My current solution is to save the intermediate results in a temporary
file, but my final goal is to save them as an R object. The "memory leak"
in .Call stops me from doing this, and I'd like to know whether there is
a clean solution for the R package I am writing.
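A minimal sketch of the compute-once, reuse-many-times pattern he describes, with hypothetical object and file names (cp() is the wrapper defined earlier in the thread):

----------------------------
## First part: the expensive computation, done once.
a <- 1:10000; b <- 1:1000
intermediate <- cp(a, b)          # ~660 MB in the real application

## Fallback mentioned above: park the result on disk, reload it later
## instead of recomputing.
save(intermediate, file = "intermediate.rda")
rm(intermediate); invisible(gc())
load("intermediate.rda")          # restores 'intermediate' by name
----------------------------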

Yongchao
On Fri, 24 Aug 2007, Prof Brian Ripley wrote:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Yongchao Ge                                  Yongchao.Ge at mssm.edu
Mount Sinai School of Medicine               office: 212-241-3536
Department of Neurology
One Gustave L. Levy Place, Box 1137     New York, NY, 10029, USA
web url: www.mssm.edu/faculty/yongchao-ge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 day later
#
Hi Yongchao,

Yongchao Ge <Yongchao.Ge at mssm.edu> writes:
There are many examples of packages that use .Call to create large
objects.  I don't think there is a "memory leak".

One thing that may be tripping you up is that, because of R's
pass-by-value semantics, you may be ending up with multiple copies of
the object on the R side during some of your operations. I would
recommend recompiling R with --enable-memory-profiling and using
tracemem() to see if you can identify places where copies of your
large object occur. You can also take a look at
Rprof(memory.profiling=TRUE).
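A sketch of that workflow, assuming an R built with --enable-memory-profiling (object and file names are hypothetical):

----------------------------
## tracemem() prints a message each time the traced object is duplicated,
## exposing where copy-on-modify semantics make hidden copies.
x <- numeric(1e6)
tracemem(x)        # returns the object's address tag
y <- x             # no copy yet: x and y share the same memory
y[1] <- 0          # the duplication is reported here
untracemem(x)

## Rprof() can record memory use alongside the time profile.
Rprof("profile.out", memory.profiling = TRUE)
z <- outer(1:2000, 1:2000)        # stand-in for the real workload
Rprof(NULL)
summaryRprof("profile.out", memory = "both")
----------------------------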

+ seth
3 days later
#
Hi Seth,

Thank you for the suggestion. Because both parts of my program use .Call
(which does not copy its arguments), tracemem() shows no extra copies.
In any case, the information shown by gc() is very misleading, as
Prof. Ripley noted, especially after creating and removing a couple of
large R datasets and running gc() a couple of times.

As shown by "ps aux", there is no "memory leak" from .Call, which is a
big relief to me. Mysteriously, my program now works for storing the
intermediate results as a 660 MB R object, and I can run the same
function as often as I want. The maximum space taken by the program has
never exceeded the 1.8 GB I expected. The disappearance of the excessive
memory use by .Call may be due to recompiling my C code, restarting
Linux, or a fresh mind after the weekend.

Thank you and Prof. Ripley for the suggestions; they helped me stay
focused.


Yongchao
On Sat, 25 Aug 2007, Seth Falcon wrote: