
Memory problems, HDF5 library and R-1.2.2 garbage collection

3 messages · Norberto Eiji Nawa, Marcus G. Daniels, Trent Piepho

#
Hello:

I've recently started using R to process data in HDF5 format. My files
come in 1.5MB chunks, but they can be as big as 50MB.

The problem I am facing with R-1.2.2 is that when I try to load 50 of
the 1.5MB HDF5 files (using the hdf5 library) in a loop, my Linux box
(256MB RAM and 256MB swap) gets close to its memory limit around file
#15. This happens even if I load a file -> erase all the objects ->
load a file -> erase all the objects...
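For reference, the load-and-erase loop I mean looks roughly like this (a sketch, assuming the hdf5 package's hdf5load() reads a file's datasets into the workspace; the directory name and file pattern are placeholders):

```r
library(hdf5)

## Names we want to survive each iteration; everything else created
## by loading a file is removed before the next one is read.
files <- list.files("data", pattern = "\\.hdf$", full.names = TRUE)
keep  <- c("files", "keep", "f")
for (f in files) {
  hdf5load(f)                      # load all objects from the HDF5 file
  ## ... process the loaded objects here ...
  rm(list = setdiff(ls(), keep))   # erase all the objects from this file
  gc()                             # request a collection before the next load
}
```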

When I try to load a single 50MB HDF5 file, the computer chokes before
completing the job as well.

I happen to know the author of the hdf5 library, and I am confident he
knows what he is talking about when he tells me that the HDF5 module
for R only does very simple allocMatrix and allocVector calls, so
garbage collection should work on that.

So my questions are:

1) [newbie level 1000] The '"generational" garbage collector, which
   will increase the memory available to R as needed' mentioned in
   (*) also does the job of freeing unused memory, I suppose. So
   loading an HDF5 file and then erasing all the objects should keep
   the size of R in memory more or less constant? Or at least keep R
   from eating up the whole memory to the point of hanging my
   computer?

2) How could I test the garbage collection feature on my machine,
   assuming it does release unused memory, to determine whether the
   problem is specific to my platform (Linux 2.2.14-1vl6) or due to
   the R code?

3) Anyone else using HDF5 with R out there?

(*) http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html#Why%20does%20R%20run%20out%20of%20memory%3f
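Regarding question 2, one minimal way to exercise the collector using only base R: repeatedly allocate and drop a large object; if collection works, the usage figures printed by gc() should stay roughly constant across iterations rather than grow.

```r
## Allocate ~8MB of doubles, drop them, and force a collection,
## twenty times over. If the collector reclaims the memory, the
## "used" columns printed by gc() should not grow between iterations.
for (i in 1:20) {
  x <- matrix(rnorm(1e6), nrow = 1000)
  rm(x)
  print(gc())
}
```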

Thanks a lot!

Eiji
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
NEN> When I try to load a single 50MB HDF5 file, the computer chokes
NEN> before completing the job as well.

I'll check this out and make sure there isn't gratuitous waste happening. 
Problems with the big file sound plausible, but the smaller chunks should
be doable.  Thanks for the test cases, by the way.

http://www.isd.atr.co.jp/~eiji/swarm/HDF5samples.tar.gz
                                     ArchiverHDF5.hdf.gz
                                     testHDF5load.R

Plugin:

ftp://ftp.swarm.org/pub/swarm/src/testing/hdf5_0.9.tar.gz
#
On 22 Mar 2001, Marcus G. Daniels wrote:
I'm using R 1.2.2 to read in large netCDF files.  I've read in about a
tenth of some 220MB netCDF files, and I've processed 82 of these files
in a row without restarting R and had no problems with memory.  So R
can clearly read in large datasets; it must be a problem with the HDF5
module.  I know that the netCDF 1.2 library is very inefficient at
some things, and I've pretty much completely rewritten it.
