memory leak in sub("[range]",...)
As a belated follow-up (I was away at the time), note that in general we don't tamper with code we have ported from other projects as it makes future maintenance so much more difficult. At the very least, we need conspicuous comments to ensure that such changes do not get lost (I've just added one).
On Tue, 15 Jul 2008, Martin Maechler wrote:
"BD" == Bill Dunlap <bill at insightful.com>
on Wed, 9 Jul 2008 11:26:50 -0700 (PDT) writes:
BD> There is a 2-block memory leak in the sub() (or any other regex-related
BD> function, probably) when the pattern argument involves a range
BD> expression, e.g., '[0-9]'.

BD> % R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
BD> ==14519== Memcheck, a memory error detector.
BD> ==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
BD> ==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
BD> ==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
BD> ==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
BD> ==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
BD> ==14519== For more details, rerun with: -v
BD> ==14519==
BD> R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
BD> ...
>> for(i in 1:1000)sub("[a-c]","+","0abcd")
>> q()
BD> ==32503==
BD> ==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
BD> ==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
BD> ==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
BD> ==32503== For counts of detected errors, rerun with: -v
BD> ==32503== searching for pointers to 7,915 not-freed blocks.
BD> ==32503== checked 12,616,568 bytes.
BD> ==32503==
BD> ==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
BD> ==32503==
BD> ... ignore 85 byte/4 block leak in readline ...
BD> ==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)

BD> The leaked blocks are allocated in internal_function build_range_exp() at

BD> 5200     /* Use realloc since mbcset->range_starts and mbcset->range_ends
BD> 5201        are NULL if *range_alloc == 0.  */
BD> 5202     new_array_start = re_realloc (mbcset->range_starts, wchar_t,
BD> 5203                                   new_nranges);
BD> 5204     new_array_end = re_realloc (mbcset->range_ends, wchar_t,
BD> 5205                                 new_nranges);
BD> ...
BD> 5210     mbcset->range_starts = new_array_start;
BD> 5211     mbcset->range_ends = new_array_end;

BD> This file, src/main/regex.c, contains a complicated mess of #ifdef's ((note that these were not
BD> but range_starts and range_ends are defined and appear to be used
BD> whether or not _LIBC is defined.  However, they are only freed if _LIBC
BD> is defined.  In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
BD> they don't get freed.

Ok; this all makes sense; I've seen the same in the source.

Interestingly, my newer setup (Linux, gcc 4.2.x ...) does not show the
memory leak; I've not checked if it's because _LIBC is defined or for
another reason.

I'm applying your patch --- thank you, Bill.

Martin

BD> After the following change in free_charset() only the 85 byte/4 block
BD> leak in readline remains.
BD> Index: regex.c
BD> ===================================================================
BD> --- regex.c     (revision 46046)
BD> +++ regex.c     (working copy)
BD> @@ -6240,9 +6240,9 @@
BD>  # ifdef _LIBC
BD>        re_free (cset->coll_syms);
BD>        re_free (cset->equiv_classes);
BD> +# endif
BD>        re_free (cset->range_starts);
BD>        re_free (cset->range_ends);
BD> -# endif
BD>        re_free (cset->char_classes);
BD>        re_free (cset);
BD>      }

BD> [This report may be a duplicate: I tried submitting it via the form in
BD> http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]

Neither do I. The machine running the repository had a downtime
(announced by Peter Dalgaard) a couple of days ago, so this may be related.

BD> ----------------------------------------------------------------------------
BD> Bill Dunlap
BD> Insightful Corporation
BD> bill at insightful dot com
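The pattern behind the report (two arrays grown with realloc on every build, but released only under one preprocessor branch) can be reproduced in isolation. The following is a minimal sketch, not the code in regex.c: demo_charset, demo_realloc, demo_free, demo_add_range, free_charset_before, free_charset_after and the live_blocks counter are all hypothetical names made up for illustration, and the growth policy is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for the charset structure; only the two
   range arrays discussed in the report are modelled. */
typedef struct {
    wchar_t *range_starts;
    wchar_t *range_ends;
    size_t   nranges;
    size_t   range_alloc;
} demo_charset;

/* Number of blocks allocated through demo_realloc and not yet freed. */
int live_blocks = 0;

/* realloc wrapper: a NULL -> non-NULL transition creates a new block. */
void *demo_realloc(void *p, size_t n)
{
    void *q = realloc(p, n);
    if (q != NULL && p == NULL)
        live_blocks++;
    return q;
}

/* free wrapper: only non-NULL pointers count as released blocks. */
void demo_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Append one range, growing both arrays when they are full
   (assumed growth policy, for illustration only). */
int demo_add_range(demo_charset *cs, wchar_t start, wchar_t end)
{
    assert(cs != NULL);
    if (cs->nranges == cs->range_alloc) {
        size_t new_alloc = 2 * cs->range_alloc + 1;
        wchar_t *new_starts =
            demo_realloc(cs->range_starts, new_alloc * sizeof(wchar_t));
        if (new_starts == NULL)
            return -1;
        cs->range_starts = new_starts;
        wchar_t *new_ends =
            demo_realloc(cs->range_ends, new_alloc * sizeof(wchar_t));
        if (new_ends == NULL)
            return -1;
        cs->range_ends = new_ends;
        cs->range_alloc = new_alloc;
    }
    cs->range_starts[cs->nranges] = start;
    cs->range_ends[cs->nranges] = end;
    cs->nranges++;
    return 0;
}

/* Mirrors the pre-patch behaviour when _LIBC is not defined:
   the range arrays are never released. */
void free_charset_before(demo_charset *cs)
{
    (void) cs;
}

/* Mirrors the patched behaviour: the range arrays are always released. */
void free_charset_after(demo_charset *cs)
{
    demo_free(cs->range_starts);
    demo_free(cs->range_ends);
    cs->range_starts = cs->range_ends = NULL;
    cs->nranges = cs->range_alloc = 0;
}
```

With a zero-initialised demo_charset, one call to demo_add_range leaves two live blocks; free_charset_before leaves both outstanding, which matches the two blocks per compiled pattern seen in the valgrind report, while free_charset_after brings the counter back to zero.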
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595