Just found out [R 0.63, standard -v -n] :
> rm(list=ls())
> gc()
         free  total
Ncells  96538 200000
Vcells 214008 250000
> hist(runif(50000))
Error: heap memory (1953 Kb) exhausted [needed 390 Kb more]
which is a bit astonishing, given that I still have room for 214000 doubles:
> u1 <- runif(50000)
> u2 <- runif(50000)
> gc()
         free  total
Ncells  96534 200000
Vcells 114006 250000
debug(hist.default) quickly revealed that the error was produced
when .C("bincount",....) was called.
Looking at the help,
help(.C)
and then at the "DUP = TRUE" default argument to .C(.),
I was reminded that every argument is first copied before being passed to
bincount().
Setting the "DUP = FALSE" argument in hist.default
made it work with the above 50000 doubles.
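To make the memory cost concrete: 50000 doubles at 8 bytes each is 400000 bytes, roughly 390 Kb, which matches the "needed 390 Kb more" in the error above. The following is only a rough C model of the two calling conventions, not R's actual implementation; sum_values, call_with_dup and call_without_dup are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a routine such as bincount(): it only reads x. */
double sum_values(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Model of DUP = TRUE: duplicate the argument and call the routine on the
   duplicate. *extra_bytes reports the additional heap the duplicate needs. */
double call_with_dup(const double *x, size_t n, size_t *extra_bytes)
{
    size_t bytes = n * sizeof(double);
    double *copy = malloc(bytes ? bytes : 1);
    assert(copy != NULL);
    memcpy(copy, x, bytes);
    double result = sum_values(copy, n);
    free(copy);
    *extra_bytes = bytes;
    return result;
}

/* Model of DUP = FALSE: the routine sees the caller's storage directly,
   so no additional heap is needed for the argument. */
double call_without_dup(const double *x, size_t n, size_t *extra_bytes)
{
    *extra_bytes = 0;
    return sum_values(x, n);
}
```

Both variants return the same value; they differ only in whether a second buffer of n doubles has to exist while the routine runs.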
But then I wondered ``more generally'' :
What exactly happens / can happen when calling, e.g.,
r <- .C("foo", x=x, y=as.double(y), DUP = FALSE)
Will 'x' be altered after the call to .C(*) if in C's
foo(double *x, double *y)
x is altered?
Will 'y' be unaltered anyway, since "as.double(y)" produces a
different object than 'y' anyway?
I know that I could make experiments and find out,
but hopefully, one of you will know much better and explain to all
R-develers.
Really useful might be a comprehensive list of recommendations
on when "DUP = FALSE" is useful / advisable / detestable ...
Thank you!
Martin
Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum SOL G1; Sonneggstr.33
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1086 <><
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Saving memory usage -- .C(....., DUP = FALSE) danger?
2 messages · Martin Maechler, Thomas Lumley
On Thu, 26 Nov 1998, Martin Maechler wrote:
> But then I wondered ``more generally'' :
> What exactly happens / can happen when calling, e.g.,
> r <- .C("foo", x=x, y=as.double(y), DUP = FALSE)
> Will 'x' be altered after the call to .C(*) if in C's
> foo(double *x, double *y)
> x is altered?
> Will 'y' be unaltered anyway, since "as.double(y)" produces a
> different object than 'y' anyway?
x will be altered, y will not. If you want y altered, you have to give it storage mode "double" before the call and then pass y itself rather than as.double(y).
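A small C model of that answer, as a sketch only. It assumes y is stored as integer, so that the coercion has to allocate a new vector; the body of foo() and the helpers as_double_copy and call_foo_nodup are invented for illustration and are not part of R.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for foo(double *x, double *y) from the question. The body is an
   assumption: it overwrites the first element of each argument. */
void foo(double *x, double *y)
{
    x[0] = -1.0;
    y[0] = -1.0;
}

/* Model of as.double(y) for an integer y: allocates a new double vector.
   The caller owns the result and must free() it. */
double *as_double_copy(const int *y, size_t n)
{
    double *out = malloc((n ? n : 1) * sizeof(double));
    assert(out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (double)y[i];
    return out;
}

/* Model of .C("foo", x = x, y = as.double(y), DUP = FALSE): x is handed
   over as is, y goes through a coercion that creates a temporary.
   Returns that temporary, i.e. what the y component of the result holds. */
double *call_foo_nodup(double *x, const int *y, size_t n)
{
    double *y_tmp = as_double_copy(y, n);
    foo(x, y_tmp);
    return y_tmp;
}
```

After the call, the caller's x has changed, the caller's y has not, and the modified values of y live only in the returned temporary.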
> Really useful might be a comprehensive list of recommendations on when "DUP = FALSE" is useful / advisable / detestable ...
Here's a start.
DUP=FALSE is dangerous.
There are two important dangers with DUP=FALSE. The first
is that garbage collection may move the object, resulting in the pointers
pointing nowhere useful and causing hard-to-reproduce bugs.
The second is that if you pass a formal parameter of the calling function
to .C/.Fortran with DUP=FALSE I don't think it is necessarily copied. You
may be able to change not only the local variable but the variable one
level up. This will also be very hard to trace.
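The second danger can be pictured with a C sketch in which a wrapper plays the role of the R function and its pointer argument plays the role of the formal parameter. This only models the aliasing; it makes no claim about when R actually copies. All names (scale_in_place, wrapper_with_dup, wrapper_without_dup) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C routine that modifies its argument in place. */
void scale_in_place(double *v, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= factor;
}

/* Model of passing the formal parameter on with DUP = TRUE: the routine
   works on a private duplicate, so the caller's vector keeps its values.
   Returns the first scaled element (0.0 for an empty vector). */
double wrapper_with_dup(const double *v, size_t n)
{
    double *copy = malloc((n ? n : 1) * sizeof(double));
    assert(copy != NULL);
    memcpy(copy, v, n * sizeof(double));
    scale_in_place(copy, n, 10.0);
    double first = n ? copy[0] : 0.0;
    free(copy);
    return first;
}

/* Model of DUP = FALSE when the formal parameter shares storage with the
   caller's variable: the change is visible one level up. */
double wrapper_without_dup(double *v, size_t n)
{
    scale_in_place(v, n, 10.0);
    return n ? v[0] : 0.0;
}
```

Both wrappers return the same value, which is what makes the difference hard to trace: only the caller's data reveals that the second one wrote through.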
1) If your C/Fortran routine calls back any R function including
S_alloc/R_alloc then do not use DUP=FALSE. Don't even think about it.
Calling almost any R function could trigger garbage collection.
2) If you don't trigger garbage collection it is safe and useful to set
DUP=FALSE if you don't change any of the variables that might be affected,
e.g.
.C("Cfunction",input=x,output=numeric(10))
In this case the output variable didn't exist before the call so it can't
cause trouble. If the input variable is not changed in Cfunction you are
safe.
I've commented before (but never actually done anything) that it would be
a useful intermediate step to have analogues of the Fortran 90 INTENT IN
and INTENT OUT declarations for these functions. In the example above
there is no need to copy the input back after calling Cfunction and no
need to copy the output before calling (just to allocate the space).
Something like
.C("Cfunction",input=x,output=numeric(10),IN=c(T,F),OUT=c(F,T))
might then say to copy x and allocate uninitialised space for numeric(10),
call the function, and then copy output back again. The first component of
the result would then be NULL, saving space in the local environment as
well. These would be less efficient but less dangerous than DUP=FALSE as
you couldn't mess up R's internal structures by getting the declarations
wrong.
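For what it is worth, the IN/OUT idea could look roughly like this on the C side. This is a sketch of the proposal only; nothing like it exists in R as far as this message says, and intent_arg, call_with_intent and square_into are names invented here. The routine always gets private working buffers, so a wrong declaration can lose data but cannot overwrite the caller's objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define LEN 3   /* fixed vector length, to keep the sketch short */

typedef struct {
    double *data;  /* caller's storage */
    size_t  n;     /* number of elements */
    int     in;    /* nonzero: copy data into the working buffer first */
    int     out;   /* nonzero: copy the working buffer back afterwards */
} intent_arg;

typedef void (*routine2)(double *first, double *second);

/* Allocate a working buffer; fill it only if the argument is declared IN. */
double *make_work(const intent_arg *arg)
{
    double *w = malloc((arg->n ? arg->n : 1) * sizeof(double));
    assert(w != NULL);
    if (arg->in)
        memcpy(w, arg->data, arg->n * sizeof(double));
    return w;   /* otherwise left uninitialised, as in the proposal */
}

/* Copy back only if the argument is declared OUT, then release the buffer. */
void finish(intent_arg *arg, double *w)
{
    if (arg->out)
        memcpy(arg->data, w, arg->n * sizeof(double));
    free(w);
}

void call_with_intent(routine2 fn, intent_arg *a, intent_arg *b)
{
    double *wa = make_work(a);
    double *wb = make_work(b);
    fn(wa, wb);
    finish(a, wa);
    finish(b, wb);
}

/* Example routine: writes squares of input into output, then misbehaves
   by overwriting its input. */
void square_into(double *input, double *output)
{
    for (size_t i = 0; i < LEN; i++)
        output[i] = input[i] * input[i];
    input[0] = 99.0;
}
```

With x declared IN only and the result vector declared OUT only, the stray write to input[0] never reaches the caller's x, while the squares are copied back into the result.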
Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle