
Help to create bugzilla account

12 messages · Martin Maechler, Iñaki Ucar, Dmitriy Selivanov +4 more

#
> Hi mailing list and R-core. Could someone from R-core please help me
    > create an account in Bugzilla? I would like to submit an issue related
    > to gc() to the wishlist.

I will create one.

Your previous e-mails left me pretty clueless about what the
problem is that you want to solve ... but maybe others
understand better what you mean.

Note that in the case of such a relatively sophisticated wish
without a clear sign of a problem (in my view)
chances are not high that anything will change, unless someone
provides a (small footprint) patch towards the (R-devel aka
"trunk") sources *and* reproducible R code that depicts the
problem.

Still: Thank you for trying to make R better by contributing
careful bug reports!

Best,
Martin


    > Related context is here -
    > https://stat.ethz.ch/pipermail/r-devel/2017-July/074715.html


    > -- 
    > Regards
    > Dmitriy Selivanov


#
Thanks Martin, I've received invitation and will create ticket soon.

Regarding the issue: basically the problem is that on operating systems which
use glibc, memory is not returned to the OS (R frees it, but the allocator
doesn't trim it). Setting the corresponding environment variable
(MALLOC_TRIM_THRESHOLD_) doesn't help, but a manual call to `malloc_trim` does.
The implication is that for long-running jobs the Linux OOM killer can kill
the R process, because highly fragmented pieces of memory are never returned
to the OS. A very simple example:

   1. I have 2 GB of RAM on the machine.
   2. I create a large list of small objects which occupies 1.5 GB.
   3. I remove it (I can even call gc() manually) - "top" still shows 1.5 GB
   occupied. If I decide to recreate a similar 1.5 GB list, the RAM is reused.
   4. However, if after removing the list I decide to create a normal
   contiguous integer/double vector of 500+ MB, R reports that it
   can't allocate a vector of that size.

Hope this helped.

2017-08-11 18:00 GMT+04:00 Martin Maechler <maechler at stat.math.ethz.ch>:

  
    
#
2017-08-11 16:00 GMT+02:00 Martin Maechler <maechler at stat.math.ethz.ch>:
How to reproduce it:

a <- replicate(2e6, new.env()) # ~ 1.4 GB of memory
rm(a)
gc() # the R process still has the memory assigned

Iñaki
#
Right, but that's unavoidable because of the way Linux allocates memory - see FAQ 7.42.
The memory is free; Linux just keeps it for future allocations.

Running malloc_trim doesn't help, because the issue is fragmentation due to the linear design of the pool: you will likely have another object on top, so in most practical cases malloc_trim() doesn't actually do anything. You can always call malloc_trim() yourself if you think it helps, but it doesn't in the general case. The only way to address that would be to move allocated objects from the top of the pool down, but that's not something R can do, because it cannot know which code still holds SEXP pointers referring to those objects.

Cheers,
Simon
#
Strange, because in all my experiments calling malloc_trim always helped:
the memory reported by top decreased to the level it was supposed to be at.
Do you have in mind a case where calling malloc_trim won't do anything? Also,
shouldn't the MALLOC_TRIM_THRESHOLD_ env variable have an impact on malloc_trim
calls? At the moment any value seems to be ignored...

On 12 Aug 2017 at 6:09, "Simon Urbanek" <
simon.urbanek at r-project.org> wrote:

  
  
#
On Sat, 12 Aug 2017, Dmitriy Selivanov wrote:

            
This is a question for glibc developers. For that matter, this entire
thread is really about tuning of the malloc in glibc and should
ideally be addressed upstream.

There has been some discussion of this in other contexts, e.g. Python at

https://nuald.blogspot.com/2013/06/memory-reclaiming-in-python.html

and emacs at

http://notes.secretsauce.net/notes/2016/04/08_glibc-malloc-inefficiency.html

As the Python post points out, it is possible to use alternate malloc
implementations, either rebuilding R to use them or using LD_PRELOAD.
On Ubuntu for example, you can have R use jemalloc with

sudo apt-get install libjemalloc1
env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 R

This does not seem to hold onto memory to the same degree, but I don't
know about any other aspect of its performance.

The emacs post suggests that calling malloc_trim may have more of an
effect in some cases: the post describes calling it via gdb on all
running processes and reports a substantial memory footprint drop for
emacs, Xorg and opera. I tried the same experiment on my system with
emacs, Xorg and firefox and didn't see what that post saw -- maybe a
few percent recovery.

At this point I'm not sure we have enough data to show that adding
malloc_trim calls at the end of each GC, say, would warrant the
nuisance of having to add configure checks. It would also be necessary
to make sure that adding this doesn't significantly impact malloc
performance.

I don't know if this issue exists on Windows as well; it might, as the
basic malloc we use there is the same as used in glibc (Doug Lea's
malloc).

Best,

luke

  
    
#
On 12 August 2017 at 15:10, luke-tierney at uiowa.edu wrote:
| As the Python post points out, it is possible to use alternate malloc
| implementations, either rebuilding R to use them or using LD_PRELOAD.
| On Ubuntu for example, you can have R use jemalloc with
| 
| sudo apt-get install libjemalloc1
| env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 R
| 
| This does not seem to hold onto memory to the same degree, but I don't
| know about any other aspect of its performance.

Interesting.

I don't really know anything about malloc versus jemalloc internals but I can
affirm that redis -- an in-memory database written in single-threaded C for
high performance -- in its Debian builds has been using jemalloc for years,
presumably by choice of the maintainer. (We are very happy users of [a gently
patched] redis at work; lots of writes; very good uptime.)

Having the ability to switch to jemalloc, we could design a test bench and
compare what the impact is.

Similarly, if someone cared, I could (presumably) alter the default R build
for Debian and Ubuntu to also switch to jemalloc.

Anybody feel like doing some empirics?

Dirk
#
Very interesting information about switching glibc malloc to jemalloc.

So I see the action plan as follows:

   1. Set up some benchmark (need to think about design).
   2. Run it on an Ubuntu machine with the default glibc malloc.
   3. Run it with malloc_trim triggered via reg.finalizer().
   4. Run it with jemalloc.
   5. Review the results, and if they look better than with glibc malloc,
   possibly consider switching R builds to jemalloc on Debian, Ubuntu.

I can't promise a timeline, but I will definitely try to investigate.

2017-08-13 1:36 GMT+04:00 Dirk Eddelbuettel <edd at debian.org>:

  
    
#
On 13 August 2017 at 15:15, Dmitriy Selivanov wrote:
| Very interesting information about switching glibc malloc to jemalloc.
| 
| So I see the action plan as follows:
| 
|    1. Set up some benchmark (need to think about design).
|    2. Run it on an Ubuntu machine with the default glibc malloc.
|    3. Run it with malloc_trim triggered via reg.finalizer().
|    4. Run it with jemalloc.
|    5. Review the results, and if they look better than with glibc malloc,
|    possibly consider switching R builds to jemalloc on Debian, Ubuntu.
| 
| I can't promise a timeline, but I will definitely try to investigate.

Thumbs up!

If you set up a (public) git repo I will try to help. I have access to boxen
running these OSs ranging from 2gb ram (old netbook) to 100+gb ram (at work).

Dirk
#
FWIW, if we are talking about alternative allocators, tcmalloc is another candidate; we are using it for our projects where we care about allocations and performance (another upshot is that it's very flexible, so you can do a lot of cool things if you care). However, I haven't tried it with R - I'll have a look at whether it can address the issue we're talking about.
#
On Saturday, August 12, 2017 5:36:36 PM EDT Dirk Eddelbuettel wrote:
Depending on how this turns out: Fedora, RHEL and CentOS also have jemalloc and
tcmalloc. Meaning, if it's good on those two, it's good on Linux in general.
Basically, jemalloc is faster for many workloads, but it's harder to spot
problems. glibc is better at spotting memory bugs but not as fast.

-Steve