
Saving memory usage -- .C(....., DUP = FALSE) danger?

2 messages · Martin Maechler, Thomas Lumley

Just found out [R 0.63, standard -v -n] :

    > rm(list=ls())
    > gc()
	     free  total
    Ncells  96538 200000
    Vcells 214008 250000
    > hist(runif(50000))
    Error: heap memory (1953 Kb) exhausted [needed 390 Kb more]

which is a bit astonishing, given that I still have room for 214000 doubles:

    > u1 <- runif(50000)
    > u2 <- runif(50000)
    > gc()
	     free  total
    Ncells  96534 200000
    Vcells 114006 250000
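The figures above can be cross-checked. The 1953 Kb heap in the error message and the 250000 Vcells reported by gc() together imply 8 bytes per Vcell, i.e. one double each (taking 1 Kb = 1024 bytes). Under that reading:

```latex
\begin{aligned}
250000 \times 8\ \text{bytes} &= 2\,000\,000\ \text{bytes} \approx 1953\ \text{Kb} && \text{(total vector heap)}\\
214008 \times 8\ \text{bytes} &= 1\,712\,064\ \text{bytes} \approx 1672\ \text{Kb} && \text{(free before } \texttt{hist}\text{)}\\
50000 \times 8\ \text{bytes} &= 400\,000\ \text{bytes} \approx 390.6\ \text{Kb} && \text{(one copy of a 50000-vector)}\\
214008 - 114006 &= 100002\ \text{Vcells} \approx 2 \times 50000 && \text{(cost of } \texttt{u1}, \texttt{u2}\text{)}
\end{aligned}
```

So the 390 Kb that the error reports as missing matches the size of one more copy of a 50000-vector, which fits the diagnosis below that an argument copy made by .C is what tipped the heap over.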

debug(hist.default) quickly revealed that the error was produced
when .C("bincount", ....) was called.

Looking at the help, 
	help(.C)
and then at the "DUP = TRUE" default argument to .C(.),
I was reminded that every argument is first copied before being passed to
bincount().

Setting the "DUP = FALSE" argument in hist.default
made it work with the above 50000 doubles.
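To make the DUP = TRUE / DUP = FALSE difference concrete, here is a simplified model in C of what .C does with a single double argument. Everything in it is hypothetical: call_dot_c, double_in_place and the copy logic are stand-ins, not R's internals. The point is only that DUP = TRUE costs one extra allocation per argument, while DUP = FALSE hands the routine the caller's own storage.

```c
#include <stdlib.h>
#include <string.h>

/* Shape of a routine callable via .C: every argument arrives as a pointer. */
typedef void (*DotCRoutine)(double *x, int *n);

/* Hypothetical example routine: doubles each element of x in place. */
void double_in_place(double *x, int *n)
{
    for (int i = 0; i < *n; i++)
        x[i] *= 2.0;
}

/*
 * Hypothetical, simplified model of .C handling one double argument.
 *   dup != 0 (DUP = TRUE):  the routine works on a fresh copy, which becomes
 *                           the result; the caller's vector is untouched.
 *   dup == 0 (DUP = FALSE): the routine works on the caller's own buffer.
 * Returns the buffer that would become the result component (NULL if the
 * copy could not be allocated). The caller frees it when it differs from x.
 */
double *call_dot_c(DotCRoutine f, double *x, int n, int dup)
{
    double *arg = x;
    if (dup) {
        /* This extra allocation is what can exhaust a small heap. */
        arg = malloc((size_t)n * sizeof *arg);
        if (arg == NULL)
            return NULL;
        memcpy(arg, x, (size_t)n * sizeof *arg);
    }
    f(arg, &n);
    return arg;
}
```

With dup set, the caller's vector keeps its old values and the result is a separate buffer; with dup cleared, the result is the caller's vector itself, now modified.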

But then I wondered ``more generally'' :

	What exactly happens / can happen when calling, e.g.,

		r <- .C("foo", x=x, y=as.double(y),  DUP = FALSE)

	Will 'x' be altered after the call to .C(*)  if in C's 
		foo(double *x, double *y)
	x is altered?
	Will 'y' be unaltered anyway, since "as.double(y)" produces a
	different object than 'y' anyway?

I know that I could make experiments and find out,
but hopefully, one of you will know much better and explain to all
R-develers.

Really useful might be a comprehensive list of recommendations 
on when  "DUP = FALSE" is useful / advisable / detestable ...

Thank you!

Martin

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum SOL G1;	Sonneggstr.33
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1086			<><

	
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Thu, 26 Nov 1998, Martin Maechler wrote:
x will be altered, y will not. If you want y altered, you have to give it
storage mode "double" beforehand (storage.mode(y) <- "double") and pass y
itself rather than as.double(y).
Here's a start on the list of recommendations.
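This answer can be modelled in the same spirit. The sketch is hypothetical: foo is the routine from the question, given a body that writes to both arguments, and as_double_copy stands in for as.double(y) under the assumption both messages make, namely that it hands .C a new object rather than y itself.

```c
#include <stdlib.h>
#include <string.h>

/* The routine from the question, with a hypothetical body that writes to both. */
void foo(double *x, double *y)
{
    x[0] = -1.0;
    y[0] = -2.0;
}

/* Hypothetical stand-in for as.double(y): returns a NEW buffer (a copy). */
double *as_double_copy(const double *y, size_t n)
{
    double *out = malloc(n * sizeof *out);
    if (out != NULL)
        memcpy(out, y, n * sizeof *out);
    return out;
}

/*
 * Models  r <- .C("foo", x = x, y = as.double(y), DUP = FALSE):
 * x's own storage goes to foo; y only reaches foo through its coerced copy.
 * The returned buffer plays the role of r$y; the caller frees it.
 */
double *call_foo_nodup(double *x, const double *y, size_t n)
{
    double *ytmp = as_double_copy(y, n);
    if (ytmp != NULL)
        foo(x, ytmp);
    return ytmp;
}
```

Passing y itself after storage.mode(y) <- "double" would correspond to calling foo(x, y) directly, in which case y's own storage is written too.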

DUP=FALSE is dangerous.

There are two important dangers with DUP=FALSE. The first
is that garbage collection may move the object, resulting in the pointers
pointing nowhere useful and causing hard-to-reproduce bugs.
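Why a moving collector makes DUP = FALSE dangerous can be seen in a toy model. The two-semispace heap, toy_gc and routine_with_callback below are all hypothetical and far simpler than R's collector; they only show that a raw pointer taken before a relocation keeps pointing at the old place, so later writes silently miss the live object.

```c
#include <string.h>

/* Hypothetical toy heap with two semispaces (objects of at most 8 doubles). */
static double space_a[8], space_b[8];

/* What R's side holds: a handle that the collector is allowed to update. */
typedef struct {
    double *data;
    int len;
} Handle;

/* Models a collection that relocates the object and updates R's handle. */
void toy_gc(Handle *h)
{
    double *to = (h->data == space_a) ? space_b : space_a;
    memcpy(to, h->data, (size_t)h->len * sizeof *to);
    h->data = to;
}

/*
 * Models a C routine called with DUP = FALSE that calls back into R midway.
 * x is the raw pointer .C handed over; h is passed only so the toy
 * "callback" can trigger the collection.
 */
void routine_with_callback(Handle *h, double *x)
{
    x[0] = 1.0;   /* still reaches the object */
    toy_gc(h);    /* the callback triggers a (moving) collection */
    x[1] = 2.0;   /* lands in the old space, not in the live object */
}
```

After the call, the object R sees has the first write but not the second, which is exactly the kind of hard-to-reproduce bug described above.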

The second is that if you pass a formal parameter of the calling function
to .C/.Fortran with DUP=FALSE, I don't think it is necessarily copied. You
may be able to change not only the local variable but also the variable one
level up, in the caller. This will also be very hard to trace.
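The aliasing danger can be sketched the same way. In this hypothetical model an R variable is a name bound to a buffer; assuming, as suspected above, that the formal parameter is not copied on its way into .C, the callee's formal and the caller's variable share one buffer, so an in-place write by the C routine shows up one level up. The function g in the comment is a made-up R example.

```c
/* Hypothetical model: an R variable is a name bound to some storage. */
typedef struct {
    const char *name;
    double *data;
} Binding;

/* C routine that overwrites its argument in place. */
void zap(double *a, int *n)
{
    for (int i = 0; i < *n; i++)
        a[i] = 0.0;
}

/*
 * Models calling g(b), where (hypothetically)
 *     g <- function(a) .C("zap", a, as.integer(length(a)), DUP = FALSE)
 * Assumption: binding the formal 'a' does not copy the caller's object.
 */
void call_g_nodup(const Binding *caller_b, int n)
{
    Binding formal_a = { "a", caller_b->data };  /* same storage, no copy */
    zap(formal_a.data, &n);
}
```

Nothing in g's own code mentions b, yet b is zeroed afterwards, which is why such bugs are hard to trace.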

1) If your C/Fortran routine calls back any R function, including
S_alloc/R_alloc, then do not use DUP=FALSE. Don't even think about it.
Calling almost any R function could trigger garbage collection.

2) If you don't trigger garbage collection, it is safe and useful to set
DUP=FALSE provided you don't change any of the variables that might be
affected, e.g.
	.C("Cfunction", input=x, output=numeric(10))
In this case the output variable didn't exist before the call, so it can't
cause trouble. If the input variable is not changed in Cfunction, you are
safe.
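A routine that fits rule 2 reads its input and writes only to the freshly created output. In the sketch below the body of Cfunction is hypothetical (it stores the squares of the first 10 input values, so it assumes length(x) >= 10); what matters is that it never writes to input and never calls back into R. input_unchanged is a hypothetical helper for checking the first property in a test harness.

```c
#include <string.h>

/*
 * Hypothetical body for the Cfunction called as
 *     .C("Cfunction", input = x, output = numeric(10), DUP = FALSE)
 * Reads 'input' only, writes only 'output', and makes no calls back into R,
 * so neither danger applies. Assumes length(x) >= 10.
 */
void Cfunction(double *input, double *output)
{
    for (int i = 0; i < 10; i++)
        output[i] = input[i] * input[i];  /* input is never written */
}

/* Hypothetical test helper: 1 if 'after' still equals the saved 'before'. */
int input_unchanged(const double *before, const double *after, size_t n)
{
    return memcmp(before, after, n * sizeof *before) == 0;
}
```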



I've commented before (but never actually done anything about it) that a
useful intermediate step between DUP=TRUE and DUP=FALSE would be analogues
of the Fortran 90 INTENT IN and INTENT OUT declarations for these functions.
In the example above
there is no need to copy the input back after calling Cfunction and no
need to copy the output before calling (just to allocate the space).
Something like
	.C("Cfunction",input=x,output=numeric(10),IN=c(T,F),OUT=c(F,T))
might then say to copy x and allocate uninitialised space for numeric(10),
call the function, and then copy output back again. The first component of
the result would then be NULL, saving space in the local environment as
well. These declarations would be less efficient than DUP=FALSE, but also
less dangerous, since getting them wrong couldn't mess up R's internal
structures.
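The proposed IN/OUT semantics can be sketched as a dispatcher. This is purely hypothetical: the IN and OUT arguments to .C never existed in this form, and Arg, call_with_intent and careless_sum are made-up names. IN arguments are copied into a private buffer, the others only get uninitialised space; after the call, OUT arguments become result components and the rest become NULL. Handing the buffer back stands in for "copy output back again".

```c
#include <stdlib.h>
#include <string.h>

/* One argument: the R object's storage plus the proposed intent flags. */
typedef struct {
    double *r_data;   /* the R object's own storage (never written here) */
    size_t len;
    int in;           /* IN:  copy R data into the private buffer */
    int out;          /* OUT: return the private buffer as a result */
} Arg;

typedef void (*TwoArgRoutine)(double *a, double *b);

/* Hypothetical routine: sums 'input' into output[0], zeroes output[1],
   and carelessly overwrites input[0]. Assumes 2 elements each. */
void careless_sum(double *input, double *output)
{
    output[0] = input[0] + input[1];
    output[1] = 0.0;
    input[0] = 99.0;
}

/* Returns 0 on success, -1 if a buffer could not be allocated. */
int call_with_intent(TwoArgRoutine f, const Arg args[2], double *results[2])
{
    double *buf[2] = { NULL, NULL };
    for (int i = 0; i < 2; i++) {
        buf[i] = malloc(args[i].len * sizeof(double));
        if (buf[i] == NULL) {
            free(buf[0]);
            return -1;
        }
        if (args[i].in)   /* IN: copy the data in; otherwise space only */
            memcpy(buf[i], args[i].r_data, args[i].len * sizeof(double));
    }
    f(buf[0], buf[1]);    /* the routine only ever sees private buffers */
    for (int i = 0; i < 2; i++) {
        if (args[i].out) {
            results[i] = buf[i];   /* OUT: becomes a result component */
        } else {
            results[i] = NULL;     /* not OUT: component is NULL */
            free(buf[i]);
        }
    }
    return 0;
}
```

Even though careless_sum scribbles on its input, the R-side vector is untouched because the routine only saw a copy. That is the safety argument for this scheme over DUP=FALSE, at the price of the copies.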



Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle



