I came across this in a simulation I ran under 1.6.0: If I do something like

R> x <- rnorm(10)
R> rval <- NULL
R> for(i in 1:100000) rval <- t.test(x)$p.value

then the process size remains at about 14M under 1.5.1, but it seems to be almost linearly growing up to more than 100M under 1.6.0. I know that the above simulation is nonsense, but it was the simplest I could come up with to reproduce the behaviour. It doesn't depend on t.test, if I use wilcox.test(x)$p.value the same happens...

I could reproduce this behaviour under Linux and Solaris, the exact versions are given below.
Z

--- the problem exists on

platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    6.0
year     2002
month    10
day      01
language R

and

platform sparc-sun-solaris2.7
arch     sparc
os       solaris2.7
system   sparc, solaris2.7
status
major    1
minor    6.0
year     2002
month    10
day      01
language R

but not on

platform i386-pc-linux-gnu
arch     i386
os       linux-gnu
system   i386, linux-gnu
status
major    1
minor    5.1
year     2002
month    06
day      17
language R

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
growing process size in simulation
10 messages · Achim Zeileis, Peter Dalgaard, Luke Tierney +3 more
Achim Zeileis <zeileis@ci.tuwien.ac.at> writes:
> I came across this in a simulation I ran under 1.6.0: If I do
> something like
>
> R> x <- rnorm(10)
> R> rval <- NULL
> R> for(i in 1:100000) rval <- t.test(x)$p.value
>
> then the process size remains at about 14M under 1.5.1, but it seems
> to be almost linearly growing up to more than 100M under 1.6.0. I know
> that the above simulation is nonsense, but it was the simplest I could
> come up with to reproduce the behaviour. It doesn't depend on t.test,
> if I use wilcox.test(x)$p.value the same happens...
Argh. Confirmed. One interesting clue is that R itself doesn't seem to
know about this:
for(i in 1:50000) {
    rval <- t.test(x)$p.value
    if (i %% 10000 == 0) print(gc())
}
         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0
         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0
         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0
         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0
         used (Mb) gc trigger (Mb)
Ncells 208343  5.6     407500 10.9
Vcells  64656  0.5     786432  6.0
...but the memory footprint is still growing.
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
On 11 Oct 2002, Peter Dalgaard BSA wrote:
> Argh. Confirmed. One interesting clue is that R itself doesn't seem to
> know about this:
> [...]
> ...but the memory footprint is still growing.
It looks to me like something in deparse (which gets called in
t.test.default) may be the culprit:
for(i in 1:100000) rval <- deparse("x")
exhibits the same behavior.
In running under gdb it looks like deparse("x") results in the call
sequence
deparse1WithCutoff->deparse2->deparse2buff->print2buff->R_AllocStringBuffer
and malloc is called in R_AllocStringBuffer to create the buffer. The
allocation is stored into a local structure variable in
deparse1WithCutoff and I think is not being free'd before
deparse1WithCutoff exits.
luke
Luke Tierney
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      luke@stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
Luke Tierney <luke@stat.uiowa.edu> writes:
> It looks to me like something in deparse (which gets called in
> t.test.default) may be the culprit:
> [...]
> and malloc is called in R_AllocStringBuffer to create the buffer. The
> allocation is stored into a local structure variable in
> deparse1WithCutoff and I think is not being free'd before
> deparse1WithCutoff exits.
Thanks for looking into this, Luke. Yes, I agree. Specifically, the issue seems to be that we used to work off a static variable "buff" and use realloc on that inside AllocBuffer if it was non-null. Now we've put the variable inside a structure ("buf->data") and are still trying to use the realloc'ing construct, but since "buf" (aka localData) is a local variable, it gets NULL'ed every time deparse1WithCutoff() is called. So each call allocates a fresh block, and deparse1WithCutoff needs to clean up on the way out.
Peter Dalgaard (p.dalgaard@biostat.ku.dk)
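The diagnosis in the message above can be illustrated with a minimal C sketch. It is not the R source: `StringBuffer`, `alloc_buffer`, `free_buffer`, the three `deparse_*` functions and the `live_blocks` counter are hypothetical names made up for this illustration, assuming only what the thread describes (a buffer whose `data` pointer is grown with realloc when it is non-NULL). It contrasts a static buffer, a local buffer that is never freed, and a local buffer that is released before returning.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the buffer structure discussed in the thread. */
typedef struct {
    char  *data;     /* heap storage; NULL means nothing allocated yet */
    size_t bufsize;  /* number of bytes currently allocated */
} StringBuffer;

/* Illustration only: heap blocks allocated and not yet freed. */
long live_blocks = 0;

/* Grow buf to at least `need` bytes, reusing the existing block if any. */
void alloc_buffer(StringBuffer *buf, size_t need)
{
    assert(buf != NULL);
    if (need <= buf->bufsize)
        return;                          /* already big enough */
    int fresh = (buf->data == NULL);
    char *p = realloc(buf->data, need);  /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL)
        abort();                         /* out of memory: give up in this sketch */
    if (fresh)
        live_blocks++;
    buf->data = p;
    buf->bufsize = need;
}

/* Release the storage and reset the buffer to its empty state. */
void free_buffer(StringBuffer *buf)
{
    assert(buf != NULL);
    if (buf->data != NULL) {
        free(buf->data);
        live_blocks--;
    }
    buf->data = NULL;
    buf->bufsize = 0;
}

/* Old style: one static buffer survives between calls, so it is reused. */
size_t deparse_static(const char *s)
{
    static StringBuffer buf = { NULL, 0 };
    alloc_buffer(&buf, strlen(s) + 1);
    strcpy(buf.data, s);
    return strlen(buf.data);
}

/* New style without cleanup: buf.data starts as NULL on every call, so
   every call allocates a new block and loses the pointer on return. */
size_t deparse_local_leaky(const char *s)
{
    StringBuffer buf = { NULL, 0 };
    alloc_buffer(&buf, strlen(s) + 1);
    strcpy(buf.data, s);
    return strlen(buf.data);             /* buf.data is dropped here */
}

/* New style with cleanup on the way out. */
size_t deparse_local_fixed(const char *s)
{
    StringBuffer buf = { NULL, 0 };
    alloc_buffer(&buf, strlen(s) + 1);
    strcpy(buf.data, s);
    size_t n = strlen(buf.data);
    free_buffer(&buf);
    return n;
}
```

Calling `deparse_local_leaky("x")` in a loop makes `live_blocks` grow by one per iteration, which is the kind of steady growth reported at the start of the thread, while the static and fixed variants keep it constant.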
On 12 Oct 2002, Peter Dalgaard BSA wrote:
> [...] since "buf" (aka localData) is a local variable, it gets NULL'ed
> every time deparse1WithCutoff() is called ---> deparse1WithCutoff needs
> to clean up on the way out.
Right. Unfortunately the issue goes beyond deparse.c. R_AllocStringBuffer is also called in printutils.c, saveload.c, scan.c. I'm not sure about printutils.c, but the others look like they have the same issue. R_LoadFromFile in saveload.c looks like a particular pain to fix; perhaps moving the buffer allocation up one call level would work.

I can try to fix this later next week unless someone else gets there first.

I wonder if there is something we could add to the QA tools that could have picked this up.

luke
Luke Tierney <luke@stat.uiowa.edu> writes:
> Right. Unfortunately the issue goes beyond deparse.c.
> R_AllocStringBuffer is also called in printutils.c, saveload.c, scan.c.
> [...] R_LoadFromFile in saveload.c looks like a particular pain to fix;
> perhaps moving the buffer allocation up one call level would work.
I've fixed the one in deparse.c now. I had a suspicion that there might be other cases, so I tried to abstract it in the same spirit as R_AllocStringBuffer. There's now an R_FreeStringBuffer, and I think a straightforward rule for invoking it: The function that allocates the pointer for the storage must call R_FreeStringBuffer before returning. R_LoadFromFile has more than a dozen exit points, so you get to say "R_FreeStringBuffer(data.buffer);" quite a few times, but I don't think the pain extends beyond that.
Peter Dalgaard (p.dalgaard@biostat.ku.dk)
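The rule stated in the message above (the function that owns the buffer frees it before every return) can be sketched as follows. This is an illustration under assumptions, not the code in saveload.c: `StringBuffer`, `buffer_alloc`, `buffer_free`, `load_record`, its return codes and the `live_buffers` counter are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type; the real R structure may differ. */
typedef struct {
    char  *data;
    size_t bufsize;
} StringBuffer;

/* Illustration only: buffers allocated and not yet freed. */
long live_buffers = 0;

/* Allocate or grow the buffer to at least `need` bytes. */
void buffer_alloc(StringBuffer *buf, size_t need)
{
    assert(buf != NULL);
    if (need <= buf->bufsize)
        return;
    int fresh = (buf->data == NULL);
    char *p = realloc(buf->data, need);
    if (p == NULL)
        abort();                      /* out of memory: give up in this sketch */
    if (fresh)
        live_buffers++;
    buf->data = p;
    buf->bufsize = need;
}

/* Safe to call on an empty buffer, so it can precede every return. */
void buffer_free(StringBuffer *buf)
{
    assert(buf != NULL);
    if (buf->data != NULL) {
        free(buf->data);
        live_buffers--;
    }
    buf->data = NULL;
    buf->bufsize = 0;
}

/* Hypothetical loader with several exit points. It declares `buf`, so it
   is responsible for calling buffer_free(&buf) before each return. */
int load_record(const char *input, char *out, size_t outsize)
{
    StringBuffer buf = { NULL, 0 };
    size_t len;

    if (input == NULL) {              /* exit 1: bad argument */
        buffer_free(&buf);
        return -1;
    }
    len = strlen(input);
    buffer_alloc(&buf, len + 1);
    memcpy(buf.data, input, len + 1);

    if (len == 0) {                   /* exit 2: empty record */
        buffer_free(&buf);
        return -2;
    }
    if (len + 1 > outsize) {          /* exit 3: caller's buffer too small */
        buffer_free(&buf);
        return -3;
    }
    memcpy(out, buf.data, len + 1);
    buffer_free(&buf);                /* exit 4: success */
    return 0;
}
```

Because `buffer_free` is harmless on a buffer that was never allocated, the call can be repeated mechanically at each exit without tracking whether an allocation has happened yet.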
1 day later
On Sat, 12 Oct 2002, Luke Tierney wrote:
> I wonder if there is something we could add to the QA tools that could
> have picked this up.
valgrind will do this (under Linux only). I just tried it and got

==6195== 151040 bytes in 295 blocks are definitely lost in loss record 49 of 50
==6195==    at 0x400434EB: malloc (vg_clientfuncs.c:100)
==6195==    by 0x8091BB0: R_AllocStringBuffer (deparse.c:150)
==6195==    by 0x8093501: print2buff (deparse.c:940)
==6195==    by 0x8092760: deparse2buff (deparse.c:560)

but

==6195== 12341268 bytes in 6183 blocks are still reachable in loss record 50 of 50
==6195==    at 0x400434EB: malloc (vg_clientfuncs.c:100)
==6195==    by 0x80D0131: GetNewPage (memory.c:552)
==6195==    by 0x80D3188: Rf_allocVector (memory.c:1731)
==6195==    by 0x80D2ECB: Rf_allocString (memory.c:1622)
==6195==

showing that it can distinguish memory leaks from memory that just hasn't been freed.

	-thomas
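The distinction drawn in the message above, between blocks that are "definitely lost" and blocks that are "still reachable", can be reproduced in miniature. The sketch below is an assumption-laden illustration, not R code: `lose_block`, `keep_block` and the global `kept` are hypothetical names. The first function drops the only pointer to its block; the second stores the pointer in a global and simply never frees it.

```c
#include <stdlib.h>
#include <string.h>

/* A block stored here is never freed, but a pointer to it still exists
   when the program exits. */
char *kept = NULL;

/* Allocate a block and drop the only pointer to it. After the return
   nothing in the program can find or free the block: a genuine leak. */
size_t lose_block(size_t n)
{
    char *p = malloc(n);
    if (p == NULL)
        return 0;
    memset(p, 0, n);
    return n;
}

/* Allocate (or resize) a block and remember it in the global. It is not
   freed either, but it remains reachable through `kept`. */
size_t keep_block(size_t n)
{
    char *p = realloc(kept, n);
    if (p == NULL)
        return 0;
    memset(p, 0, n);
    kept = p;
    return n;
}
```

If a small `main` called `lose_block(512)` a few hundred times and `keep_block(4096)` once, and the program were built without optimisation and run under valgrind with leak checking enabled, one would expect the first group to show up as lost and the second as still reachable, mirroring the two records quoted above.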
Reference to valgrind, please?
On Mon, 14 Oct 2002, Thomas Lumley wrote:
> valgrind will do this (under Linux only). [...]
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> Reference to valgrind, please?

http://developer.kde.org/~sewardj/

It is an x86-to-x86 emulator, for x86/Linux only, which can do memory checking, cache profiling, etc.

David Bauer
On Mon, 14 Oct 2002 ripley@stats.ox.ac.uk wrote:
> Reference to valgrind, please?

http://developer.kde.org/~sewardj/

It's a memory-management debugger -- it tracks all memory allocations, reads and writes to find accesses to uninitialised or invalid memory and memory leaks. Valgrind works only under Linux and of course imposes a huge speed penalty, but it seems to work quite well.

	-thomas