growing process size in simulation

10 messages · Achim Zeileis, Peter Dalgaard, Luke Tierney +3 more

#
I came across this in a simulation I ran under 1.6.0: If I do something
like

R> x <- rnorm(10)
R> rval <- NULL
R> for(i in 1:100000) rval <- t.test(x)$p.value

then the process size remains at about 14M under 1.5.1, but it seems to
be almost linearly growing up to more than 100M under 1.6.0.

I know that the above simulation is nonsense, but it was the simplest I
could come up with to reproduce the behaviour. It doesn't depend on
t.test, if I use wilcox.test(x)$p.value the same happens...

I could reproduce this behaviour under Linux and Solaris, the exact
versions are given below.
Z

---
the problem exists on

platform i686-pc-linux-gnu
arch     i686             
os       linux-gnu        
system   i686, linux-gnu  
status                    
major    1                
minor    6.0              
year     2002             
month    10               
day      01               
language R                

and

platform sparc-sun-solaris2.7
arch     sparc               
os       solaris2.7          
system   sparc, solaris2.7   
status                       
major    1                   
minor    6.0                 
year     2002                
month    10                  
day      01                  
language R                   

but not on

platform i386-pc-linux-gnu
arch     i386             
os       linux-gnu        
system   i386, linux-gnu  
status                    
major    1                
minor    5.1              
year     2002             
month    06               
day      17               
language R
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Achim Zeileis <zeileis@ci.tuwien.ac.at> writes:
Argh. Confirmed. One interesting clue is that R itself doesn't seem to
know about this:

for(i in 1:50000) {
   rval <- t.test(x)$p.value 
   if (i %% 10000 == 0) print(gc())
}

         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0
         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0
         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0
         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0
         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0

..but the memory footprint is still growing.
#
On 11 Oct 2002, Peter Dalgaard BSA wrote:

            
It looks to me like something in deparse (which gets called in
t.test.default) may be the culprit:

   for(i in 1:100000) rval <- deparse("x")

exhibits the same behavior.

In running under gdb it looks like deparse("x") results in the call
sequence

deparse1WithCutoff->deparse2->deparse2buff->print2buff->R_AllocStringBuffer

and malloc is called in R_AllocStringBuffer to create the buffer.  The
allocation is stored into a local structure variable in
deparse1WithCutoff and I think is not being free'd before
deparse1WithCutoff exits.

luke
#
Luke Tierney <luke@stat.uiowa.edu> writes:
Thanks for looking into this, Luke. Yes, I agree. Specifically, the
issue seems to be that we used to work off a static variable "buff"
and use realloc on that inside AllocBuffer if it was non-null. Now
we've put the variable inside a structure "buf->data" and are still trying
to use the realloc'ing construct, but since "buf" (aka localData) is a
local variable, it gets NULL'ed every time deparse1WithCutoff() is
called ---> deparse1WithCutoff needs to clean up on the way out.
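
To make the pattern above concrete, here is a minimal standalone sketch in C. It is not the code from deparse.c: DemoStringBuffer, demo_alloc_buffer, demo_free_buffer, leaky_deparse, fixed_deparse and the live_buffers counter are all hypothetical names made up for illustration. The assumption is only what the message describes: a realloc-based grower that used to act on a static pointer is now handed a local structure that starts out NULL on every call.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the buffer structure discussed in the thread. */
typedef struct {
    char  *data;     /* heap storage, NULL while nothing is allocated */
    size_t bufsize;  /* current capacity in bytes */
} DemoStringBuffer;

/* Demo-only bookkeeping: number of buffers allocated and not yet freed. */
static long live_buffers = 0;

/* Grow buf->data to at least `needed` bytes, reusing storage via realloc.
   Returns 0 on success, -1 if the allocation fails. */
static int demo_alloc_buffer(DemoStringBuffer *buf, size_t needed)
{
    int was_empty = (buf->data == NULL);
    char *p;

    if (needed <= buf->bufsize)
        return 0;
    p = realloc(buf->data, needed);   /* realloc(NULL, n) behaves like malloc(n) */
    if (p == NULL)
        return -1;
    if (was_empty)
        live_buffers++;
    buf->data = p;
    buf->bufsize = needed;
    return 0;
}

/* Release the storage and reset the structure to its empty state. */
static void demo_free_buffer(DemoStringBuffer *buf)
{
    if (buf->data != NULL) {
        free(buf->data);
        live_buffers--;
    }
    buf->data = NULL;
    buf->bufsize = 0;
}

/* The leaking shape: the structure is local and starts out NULL on every
   call, so realloc never sees the old block and nobody frees the new one. */
static size_t leaky_deparse(const char *text)
{
    DemoStringBuffer local = { NULL, 0 };
    size_t len = strlen(text);

    if (demo_alloc_buffer(&local, len + 1) != 0)
        return 0;
    memcpy(local.data, text, len + 1);
    return len;                       /* local.data is lost here */
}

/* The repaired shape: clean up on the way out. */
static size_t fixed_deparse(const char *text)
{
    DemoStringBuffer local = { NULL, 0 };
    size_t len = strlen(text);

    if (demo_alloc_buffer(&local, len + 1) != 0)
        return 0;
    memcpy(local.data, text, len + 1);
    demo_free_buffer(&local);
    return len;
}
```

Calling leaky_deparse("x") in a loop raises live_buffers by one per iteration, which is the linear growth seen in the original report; fixed_deparse leaves the counter unchanged.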
#
On 12 Oct 2002, Peter Dalgaard BSA wrote:

            
Right.  Unfortunately the issue goes beyond deparse.c.
R_AllocStringBuffer is also called in printutils.c, saveload.c,
scan.c.  I'm not sure about printutils.c, but the others look like
they have the same issue.  R_LoadFromFile in saveload.c looks like a
particular pain to fix; perhaps moving the buffer allocation up one
call level would work.  I can try to fix this later next week unless
someone else gets there first.

I wonder if there is something we could add to the QA tools that could
have picked this up.

luke
#
Luke Tierney <luke@stat.uiowa.edu> writes:
I've fixed the one in deparse.c now. I had a suspicion that there
might be other cases, so I tried to abstract it in the same spirit as
R_AllocStringBuffer. There's now an R_FreeStringBuffer, and I think a
straightforward rule for invoking it: The function that allocates the
pointer for the storage must call R_FreeStringBuffer before returning.

R_LoadFromFile has more than a dozen exit points, so you get to say
"R_FreeStringBuffer(data.buffer);" quite a few times, but I don't
think the pain extends beyond that.
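
For a function with many exit points, one common C idiom that keeps to the rule above is to route every exit through a single cleanup label instead of repeating the free call before each return. The sketch below only illustrates that idiom; it is not what was done in saveload.c, and LoadBuffer, load_buffer_alloc, load_buffer_free, load_record, the "DEMO:" record format and the outstanding counter are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type, standing in for the one in the thread. */
typedef struct {
    char  *data;
    size_t size;
} LoadBuffer;

/* Demo-only bookkeeping: buffers allocated and not yet freed. */
static long outstanding = 0;

/* Allocate n bytes into an empty buffer; returns 0 on success, -1 on failure. */
static int load_buffer_alloc(LoadBuffer *b, size_t n)
{
    char *p = malloc(n);
    if (p == NULL)
        return -1;
    b->data = p;
    b->size = n;
    outstanding++;
    return 0;
}

/* Safe to call on an empty buffer, so every exit path can use it. */
static void load_buffer_free(LoadBuffer *b)
{
    if (b->data != NULL) {
        free(b->data);
        outstanding--;
    }
    b->data = NULL;
    b->size = 0;
}

/* Hypothetical loader for records of the form "DEMO:payload".
   Returns the payload length, or -1 on any error. The function that
   allocates the buffer is also the one that frees it, on every path. */
static int load_record(const char *input)
{
    LoadBuffer buf = { NULL, 0 };
    int result = -1;                  /* pessimistic default */
    const char *colon;

    if (input == NULL)
        goto cleanup;                 /* exit: no input */
    if (load_buffer_alloc(&buf, strlen(input) + 1) != 0)
        goto cleanup;                 /* exit: out of memory */
    strcpy(buf.data, input);

    colon = strchr(buf.data, ':');
    if (colon == NULL)
        goto cleanup;                 /* exit: malformed record */
    if (colon - buf.data != 4 || strncmp(buf.data, "DEMO", 4) != 0)
        goto cleanup;                 /* exit: wrong magic string */

    result = (int) strlen(colon + 1); /* success */

cleanup:
    load_buffer_free(&buf);
    return result;
}
```

Whether a single cleanup label or a free call before each return reads better is a matter of taste; either way the counter should be back at zero after every call, whatever path was taken.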
1 day later
#
On Sat, 12 Oct 2002, Luke Tierney wrote:

            
valgrind will do this (under Linux only).  I just tried it and got

==6195== 151040 bytes in 295 blocks are definitely lost in loss record 49 of 50
==6195==    at 0x400434EB: malloc (vg_clientfuncs.c:100)
==6195==    by 0x8091BB0: R_AllocStringBuffer (deparse.c:150)
==6195==    by 0x8093501: print2buff (deparse.c:940)
==6195==    by 0x8092760: deparse2buff (deparse.c:560)

but
==6195== 12341268 bytes in 6183 blocks are still reachable in loss record 50 of 50
==6195==    at 0x400434EB: malloc (vg_clientfuncs.c:100)
==6195==    by 0x80D0131: GetNewPage (memory.c:552)
==6195==    by 0x80D3188: Rf_allocVector (memory.c:1731)
==6195==    by 0x80D2ECB: Rf_allocString (memory.c:1622)
==6195==

showing that it can distinguish memory leaks from memory that just hasn't
been freed.

	-thomas
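
The difference between the two categories in the report above can be reproduced with a very small C sketch. The names session_cache, fill_cache and leak_scratch are hypothetical, and the sizes are arbitrary; the functions are meant to be called from a main() of your own.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A file-scope pointer: whatever it points to is still referenced when
   the program exits, so a leak checker can still reach the block. */
static char *session_cache = NULL;

/* Allocate the cache once and keep it for the life of the process.
   Returns the cache size, or 0 if the allocation failed. */
static size_t fill_cache(void)
{
    if (session_cache == NULL) {
        session_cache = malloc(1024);
        if (session_cache == NULL)
            return 0;
        memset(session_cache, 0, 1024);
    }
    return 1024;
}

/* Allocate a scratch block and return without freeing it. The only
   pointer to the block is the local variable, which disappears on
   return, so the block can never be freed afterwards. */
static size_t leak_scratch(void)
{
    char *scratch = malloc(512);
    if (scratch == NULL)
        return 0;
    memset(scratch, 0, 512);
    return 512;
}
```

If a program calls both functions and is then run under something like "valgrind --leak-check=full --show-leak-kinds=all", the block from leak_scratch would be expected to show up as definitely lost and the one from fill_cache as still reachable. The exact wording of the report depends on the valgrind version, and an optimising compiler may remove the scratch allocation altogether, so building without optimisation is the safer assumption here.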

#
Reference to valgrind, please?
On Mon, 14 Oct 2002, Thomas Lumley wrote:

            

  
    
#
http://developer.kde.org/~sewardj/

It is an x86-to-x86 emulator, for x86/Linux only, that can do memory checking,
cache profiling, etc.


David Bauer

#
On Mon, 14 Oct 2002 ripley@stats.ox.ac.uk wrote:

            
http://developer.kde.org/~sewardj/

It's a memory-management debugger -- it tracks all memory allocations,
reads and writes to find accesses to uninitialised or invalid memory and
memory leaks.

Valgrind works only under Linux and of course imposes a huge speed
penalty, but it seems to work quite well.

	-thomas

