SEXP i/o, .Call(), and garbage collection. - R-devel

Thu, Feb 1, 2007 10:22 AM

Apologies for any obtuseness in the following.  We have been working
on Version 2.0 of the randomSurvivalForest CRAN package and we're
encountering a perplexing 'memory not mapped' segfault that we believe
is "influenced" by GC.

We essentially have two R functions, rsf.default(...) and
predict.rsf(...), and two corresponding entry points into our C
library, rsfGrow(...) and rsfPredict(...).  These entry points are
called through the .Call(...) interface.  Inputs to the C code are
vectors of integers and reals in the form of SEXP pointers, and the
output of each .Call(...) is a SEXP list containing vectors of
integers and reals.

rsf.default(...)  grows a forest of binary trees given survival data,
and predict.rsf(...) takes the forest output from rsf.default(...) and
uses it to predict with a new data set.

Things go fine until we put the system under stress.  We can grow
repeatedly without issues, and predict repeatedly without issues,
using a loop to stress the system.  We detect no memory leaks, and C
stack usage is stable.

However, when we grow and predict alternately within the same loop we
encounter a segfault at a seemingly random point in the R functions.
The segfault can occur after hundreds of iterations, but with
gctorture(TRUE) it usually occurs much sooner.

In the C code, we protect all incoming SEXP objects, though we don't
believe it is necessary for function arguments.  The output objects
are of course protected, and all are balanced with unprotect
statements.

Within the C code, we manage our own memory using malloc(...) and
free(...).  We detect no memory leaks, and our experience has been
that they are relatively easy to detect under stress given the large
memory imprint our data structures typically have.  Stack usage using
Cstack_info() is stable.
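
One way to make the leak check described above explicit is to route
every allocation through a counting wrapper.  The following is only a
minimal sketch under that assumption: tracked_malloc(), tracked_free(),
and tracked_live() are hypothetical names for illustration and are not
taken from the randomSurvivalForest sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: number of blocks allocated but not yet freed. */
static long live_blocks = 0;

/* malloc() wrapper that counts every successful allocation. */
void *tracked_malloc(size_t bytes) {
    void *p = malloc(bytes);
    if (p != NULL) {
        live_blocks++;
    }
    return p;
}

/* free() wrapper that keeps the count in step; NULL is ignored, as with free(). */
void tracked_free(void *p) {
    if (p != NULL) {
        assert(live_blocks > 0);  /* more frees than allocations is a bug */
        free(p);
        live_blocks--;
    }
}

/* Blocks still outstanding; zero means every allocation was released. */
long tracked_live(void) {
    return live_blocks;
}
```

Checking that tracked_live() returns 0 just before an entry point
returns to R would flag an unbalanced malloc/free pair on the first
iteration, rather than only after memory use has grown under stress.
Note that such a counter only detects leaks; it says nothing about
reads or writes through an invalid pointer.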

For clarity, pseudo code for the trivial stress loop is as follows:

formula = as.formula(Survrsf(time, status) ~ .)
data(veteran, package="randomSurvivalForest")

for (i in 1:1000) {
  growObject = rsf.default(formula, veteran)
  predictObject = predict.rsf(growObject, veteran)
}

On single iterations, we have carefully examined the output of each
function for coherency.  All vectors are initialized and populated
with valid data.  We can grow repeatedly or predict repeatedly.
However, when the two functions are combined in the same loop, we
consistently segfault with 'memory not mapped' in either R function,
usually in some seemingly random and benign location.  For example:

Growing using  logrank , Iteration  253  ...

*** caught segfault ***
address 0x7dbdda88, cause 'memory not mapped'

Traceback:
1: as.vector(x[, i])
2: as.data.frame.matrix(model.matrix(as.formula(paste("~ -1 +",
paste(c(fNames[1:2], predTempNames), collapse = "+"))), data))
3: as.data.frame(model.matrix(as.formula(paste("~ -1 +",
paste(c(fNames[1:2],     predTempNames), collapse = "+"))), data))
4: rsf.default(formula = formula, data = dataSet, ntree, mtry,
nodesize,     splitrule = splitrule[j], importance = importance,
forest = forest,     do.trace = do.trace, proximity = proximity, ntime
= ntime,     seed = seed, add.noise = add.noise, predictorWt =
predictorWt)
5: rsf(formula = formula, data = dataSet, ntree, mtry, nodesize,
splitrule = splitrule[j], importance = importance, forest = forest,
 do.trace = do.trace, proximity = proximity, ntime = ntime,     seed =
seed, add.noise = add.noise, predictorWt = predictorWt)
6: eval.with.vis(expr, envir, enclos)
7: eval.with.vis(ei, envir)
8: source("stress.R")

The segfault above occurs in the grow phase, which does not depend on
any output SEXP objects that might be corrupt.  However, the creation
of SEXP objects (in the predict call) appears to be a necessary
condition for failure.

We are wondering if there is something fundamentally missing in our
understanding of the interaction between R and C via SEXP objects,
memory allocation, persistency, and any potential garbage collection
that may be occurring.  Any comments would be greatly appreciated.

Our environment is as follows, though we have seen the same behaviour
on an SGI Altix system, a Mac OS X (Intel) system, and with R 2.3.0:

platform       powerpc-apple-darwin8.8.0
arch           powerpc
os             darwin8.8.0
system         powerpc, darwin8.8.0
status
major          2
minor          4.1
year           2006
month          12
day            18
svn rev        40228
language       R
version.string R version 2.4.1 (2006-12-18)

ubk

ubk2101 at columbia.edu

Udaya B. Kogalur, Ph.D.
Kogalur Shear Corporation
5425 Nestleway Drive, Suite L1
Clemmons, NC 27012

Hin-Tak Leung

Thu, Feb 1, 2007 11:01 AM

One possible reason for such problems is copying the pointers for,
say, attributes, classes, or names, rather than duplicating them.
With very few exceptions, mostly in classes, no two R objects of
the sort you normally encounter/create/play-with should share *any*
part of their data structure; e.g. such a problem can result if you
assign the row names of the input to the output (even if both have
the same row names).

However, without the actual code, can't tell.

K. B. Udaya wrote:

[...]

Vladimir Dergachev

Thu, Feb 1, 2007 11:15 AM

On Thursday 01 February 2007 2:01 pm, Hin-Tak Leung wrote:

Hmm.. I thought that using setAttrib() would automatically increase the
reference count, right?

In particular, I quite often use "pseudo-factor" string vectors, where the
string objects are passed through a cache and reused when forming a string
vector. The result is of true character() type but with considerable memory
savings. The downside is that the R reference count field is usually saturated.

                                best

                                    Vladimir Dergachev

Jeffrey Horner

Fri, Feb 2, 2007 7:00 AM

K. B. Udaya wrote:

[...]

If you can run your code on Linux (x86, amd64, ppc32, or ppc64), then
consider using valgrind to catch memory access problems. You would
need to recompile R with debugging support (-g), and it would be best to
compile without optimizations (although -O1 seems to be tolerated).

And running R within valgrind is as simple as:

R -d valgrind --vanilla < script.R

or even interactively with:

R -d valgrind

Best,

Jeff

http://biostat.mc.vanderbilt.edu/JeffreyHorner

Udaya B. Kogalur

Mon, Feb 5, 2007 11:27 AM

Thank you for the prompt and helpful replies.  We're embarrassed to
report that the problem was far more mundane:  a lonely uninitialized
integer pointer.
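
For readers chasing the same class of bug, one defensive pattern is to
give every pointer a known value before anything can read it.  The
sketch below is an illustration only: the Node structure and the
node_init(), node_reserve(), and node_release() helpers are
hypothetical and do not come from the package.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure owning one integer buffer. */
typedef struct {
    int    *index;  /* NULL until node_reserve() succeeds */
    size_t  size;   /* number of elements in index */
} Node;

/* Put the node into a known empty state so no indeterminate pointer exists. */
void node_init(Node *node) {
    assert(node != NULL);
    node->index = NULL;
    node->size  = 0;
}

/* Allocate n zero-initialised ints; returns 0 on success, -1 on failure. */
int node_reserve(Node *node, size_t n) {
    assert(node != NULL && node->index == NULL);
    if (n == 0) {
        return -1;
    }
    node->index = calloc(n, sizeof *node->index);
    if (node->index == NULL) {
        return -1;
    }
    node->size = n;
    return 0;
}

/* Free the buffer and return to the empty state; safe to call repeatedly. */
void node_release(Node *node) {
    assert(node != NULL);
    free(node->index);  /* free(NULL) is a no-op */
    node_init(node);
}
```

With this discipline a forgotten allocation shows up as a NULL
pointer, which can be tested for and tends to fail at the point of
misuse, instead of as a stray address that corrupts unrelated memory
and surfaces later somewhere else.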

On the bright side, we now have a far deeper understanding of SEXP i/o.

Rest assured, the C-coding team (that would be me) has been duly spanked.

ubk