Memory leak with tons of closed connections

On Fri, Nov 11, 2016 at 12:08 PM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
Thank you very much, this was very useful!

I did some more research on this, as Gabor Csardi also suspected
that the memory growth might be due to the writer being faster than
the reader, so that data simply accumulates in the input buffer of
the reader. I double-checked this via:

    Rscript --vanilla -e "i<-1;while(TRUE){cat(runif(1),'\n');i<-i+1;if(i==1e6){Sys.sleep(15);i<-1}}" |
      Rscript --vanilla -e "cat(Sys.getpid(),'\n');i<-0;while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);a<-gc();i<-i+1;if(i%%1e3==1){cat('i=',i,'\n');print(a)}}"

So the writer generates roughly a million lines, then sleeps for 15
seconds so that the reader can catch up. Monitoring the memory
footprint of the reader process (by the way, gc() reported no memory
increase in the reader, just like in Martin's output) shows that the
memory grows while the writer sends data and stays constant while the
writer is sleeping, but it never decreases: http://imgur.com/r7T02pK
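
For anyone who wants to repeat the measurement, here is a minimal sketch of how the footprint of the reader could be sampled from a second terminal, using the PID that the reader prints on startup. It assumes Linux, where /proc/<pid>/status contains a "VmRSS:" line in kB. The helper names rss_kb and watch_rss are made up for this example, and this is not necessarily how the plot above was produced.

```shell
# Sketch: sample the resident set size (RSS) of a process at a fixed interval.
# Assumption: Linux, where /proc/<pid>/status has a "VmRSS:" line in kB.

# Print the current RSS of the given PID in kB (prints nothing if unavailable).
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Print "<seconds since start> <RSS in kB>" until the process goes away.
# Usage: watch_rss PID [INTERVAL_SECONDS]
# Note: uses plain (global) shell variables to stay POSIX sh compatible.
watch_rss() {
    pid=$1
    interval=${2:-1}
    start=$(date +%s)
    while [ -r "/proc/$pid/status" ]; do
        rss=$(rss_kb "$pid")
        # Stop when there is no RSS to report (process exited or is a zombie).
        [ -n "$rss" ] || break
        echo "$(( $(date +%s) - start )) $rss"
        sleep "$interval"
    done
}
```

Running something like `watch_rss 12345 1 > rss.log` (with the PID printed by the reader in place of 12345) gives a two-column log that can be plotted against the writer's sleep periods.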

You are absolutely right that, based on this, it may be more of an
OS-specific question, but I was not able to reproduce the same memory
issue in plain bash via:

    while :;do echo '1';done | bash -c "while :;do read;done"

But I'm not sure if this does exactly the same as the original R
script, so this is rather just a guess.
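
One difference is that the bash loop above reads from the stdin it inherited and never opens anything per iteration. A slightly closer (but still only approximate) analogue would redirect from /dev/stdin on every read, so that the stream is reopened for each line. The sketch below assumes a system that provides /dev/stdin. The function name count_lines_reopening is made up for this example, and whether this really mirrors what file('stdin') and close(con) do inside R is an open assumption.

```shell
# Sketch: read a stream line by line, redirecting from /dev/stdin on every
# iteration, then print the number of lines read.
# Assumption: /dev/stdin exists; this only loosely mimics the R reader.
count_lines_reopening() {
    n=0
    # The redirection is set up again for each read, then torn down.
    while read -r line < /dev/stdin; do
        n=$((n + 1))
    done
    echo "$n"
}
```

For example, `seq 1 1000 | count_lines_reopening` counts a finite stream. For the endless case, the same loop body can be fed by `while :;do echo '1';done` and its memory footprint monitored in the same way as for the R reader.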

On the other hand, I also modified the original minimal R script in
other ways to see which part might cause the strange memory growth.
It seems that opening the connection only once, while keeping the
rest of the script unchanged (so still generating and reading tons
of lines without any sleep), does not show any memory leak:

    Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" |
      Rscript --vanilla -e "cat(Sys.getpid(),'\n');con<-file('stdin',open='r',blocking=TRUE);while(TRUE){line<-scan(con,what=character(0),nlines=1,quiet=TRUE);};close(con)"

Based on this, I think I can (should) modify my R application to open
stdin only once and read from that connection in the infinite loop,
but I'm still interested in understanding what's causing the extra
memory usage when opening and closing many connections (if my above
findings are correct).

Thank you very much again, and I'm still looking for any suggestion or
advice on how to debug this further.