help debugging segfaults

3 messages · Liaw, Andy, Brian Ripley, Luke Tierney

#
(Sorry for the cross-post; I wasn't sure which list is more
appropriate.)

Hi everyone,

I've run into segfaults when using my randomForest package on a large dataset
(e.g., 100 x 15200) with a large number of trees (e.g., ntree=7000 and
mtry=3000).  I'm wondering if anyone can give me some hints on where to look
for the problem.

The randomForest package mainly consists of two things: rf.c contains rf(),
a C wrapper function that calls the Fortran subroutines in rfsub.f that do
most of the work (slightly altered from Breiman's original code).  All
memory allocations are done in rf.c, using S_alloc().  When I run random
forest with the data and settings mentioned above, it finishes growing
the 7000 trees but segfaults when returning from rf() to R.  GDB
gave the following (gdb prompts removed):

do_dotCode (call=0x873aff4, op=0x8a5f620, args=0x8a5d010, env=0x86fd0a4)
    at dotcode.c:1413
1413            break;
1845        PROTECT(ans = allocVector(VECSXP, nargs));
1846        havenames = 0;
1847        if (dup) {
1849            info.cargs = cargs;
1850            info.allArgs = args;
1851            info.nargs = nargs;
1852            info.functionName = buf;
1853            nargs = 0;
1854            for (pargs = args ; pargs != R_NilValue ; pargs = CDR(pargs)) {
1855                if(argConverters[nargs]) {
1864                    PROTECT(s = CPtrToRObj(cargs[nargs], CAR(pargs), which));

Program received signal SIGSEGV, Segmentation fault.
0x080ddc6a in RunGenCollect (size_needed=1515400) at memory.c:1133
1133                    SEXP next = NEXT_NODE(s);

This was obtained on Linux (Mandrake 8.2 w/enterprise kernel 2.4.8) running
on a dual P3-866 Xeon with 2GB RAM, using R-1.5.0 compiled from source.

Any help/hints/comments are greatly appreciated!

Regards,
Andy

Andy I. Liaw, PhD
Biometrics Research          Phone: (732) 594-0820
Merck & Co., Inc.              Fax: (732) 594-1565
P.O. Box 2000, RY70-38            Rahway, NJ 07065
mailto:andy_liaw@merck.com




-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
(Confined to R-devel).

This almost always means that R's memory system (or malloc's) has been
corrupted by array overruns.

Sometimes gctorture(TRUE) helps.  However, in your case the problem more
likely lies with those S_alloc calls, so try (temporarily) replacing them by
calls to Calloc and then use something like Purify or `Electric Fence' to
test for overruns.
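
One way to make that temporary swap painless is to route every allocation
through a single macro.  The following is only a sketch: RF_ALLOC, RF_FREE,
USE_R_ALLOC and alloc_class_counts() are hypothetical names, not anything in
randomForest, and the debug branch uses plain calloc() from the C library
rather than R's Calloc so that the file builds without the R headers.

```c
#include <assert.h>

#ifdef USE_R_ALLOC
/* Production build: memory comes from R and is reclaimed when .C returns. */
#include <R.h>
#define RF_ALLOC(n, type) ((type *) S_alloc((long) (n), (int) sizeof(type)))
#define RF_FREE(p)        ((void) 0)
#else
/* Debug build: plain C allocation, visible to malloc debugging tools. */
#include <stdlib.h>
#define RF_ALLOC(n, type) ((type *) calloc((size_t) (n), sizeof(type)))
#define RF_FREE(p)        free(p)
#endif

/* Hypothetical work buffer: one zero-initialised counter per class. */
int *alloc_class_counts(int nclass)
{
    assert(nclass > 0);
    return RF_ALLOC(nclass, int);
}
```

Built without USE_R_ALLOC, every buffer comes from the system allocator, so
a tool such as Electric Fence can trap the first write past the end instead
of letting the damage surface later inside the garbage collector.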
On Wed, 12 Jun 2002, Liaw, Andy wrote:

            
Only a few people read R-devel and not R-help.
This is just saying it can't allocate the copies for the returned
values of the .C arguments.  I think you might want to consider .Call
given that you are probably using quite large structures.

  
    
#
These symptoms suggest that your code may be writing outside of the
data it allocates, which would trash internal data structures of the R
heap and result in a segfault at a GC.  I would try to find a malloc
debugging library, use malloc in place of S_alloc, and see if the
malloc debugging tools show any malloc heap corruption. The standard
malloc in Mac OS X has very good debugging support if you have access
to that.
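
If no malloc debugging library is at hand, a crude version of the same idea
can be written by hand: pad each buffer with known guard bytes and check them
before returning to R.  This is a minimal sketch, not how any particular
debugging malloc works; guarded_buf and the guarded_* functions are
hypothetical names, and the 16-byte guard assumes malloc returns memory
aligned to at least 16 bytes, so the data area stays suitably aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 16
#define GUARD_FILL  0xA5

typedef struct {
    unsigned char *raw;  /* start of the underlying malloc block */
    size_t n;            /* usable bytes between the two guards */
} guarded_buf;

/* Allocate n zeroed bytes with a guard zone on each side; 0 on success. */
int guarded_init(guarded_buf *b, size_t n)
{
    assert(b != NULL);
    b->raw = malloc(n + 2 * GUARD_BYTES);
    if (b->raw == NULL)
        return -1;
    b->n = n;
    memset(b->raw, GUARD_FILL, GUARD_BYTES);
    memset(b->raw + GUARD_BYTES, 0, n);
    memset(b->raw + GUARD_BYTES + n, GUARD_FILL, GUARD_BYTES);
    return 0;
}

/* Pointer to the usable region handed to the numerical code. */
void *guarded_data(guarded_buf *b)
{
    return b->raw + GUARD_BYTES;
}

/* 0 if both guards are intact, -1 on underrun, 1 on overrun. */
int guarded_check(const guarded_buf *b)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (b->raw[i] != GUARD_FILL)
            return -1;
        if (b->raw[GUARD_BYTES + b->n + i] != GUARD_FILL)
            return 1;
    }
    return 0;
}

void guarded_free(guarded_buf *b)
{
    free(b->raw);
    b->raw = NULL;
    b->n = 0;
}
```

Calling guarded_check() on each buffer right after the Fortran routines
return would show which array was overrun, even though the crash itself only
appears later at a GC.  Note that it catches only writes that land in the
guard zones; a write further out goes undetected.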

luke
On Wed, Jun 12, 2002 at 09:26:07AM -0400, Liaw, Andy wrote: