Optimized rasterOptions() for a (virtually) infinite RAM machine
See Noam's post here for good advice; avoiding temp files is very important in your case: https://discuss.ropensci.org/t/how-to-avoid-space-hogging-raster-tempfiles/864

For using data frames, raster's cell index abstraction is super powerful and sadly underused; see tabularaster for some easy approaches. Don't store coordinates explicitly, for example, at least not until you are ready to plot with ggplot2.

Finally, raster is generally great with NetCDF if you let it control the task, but different situations and file setups can really matter, so feel free to provide details if things aren't working well. Using raster can easily match the best you can achieve with the NetCDF API, but lots of specifics can bite. raster is not able to efficiently crop space and time together, for example, but functions mapped to slice extraction can be used to hone performance. Two sketches below show where I'd start.
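For concreteness, here is roughly where I would start on a machine that size. The numbers are illustrative, not magic values, and note that the units of maxmemory and chunksize have differed between raster versions (cells in older releases, bytes in newer), so check ?rasterOptions:

library(raster)

rasterOptions(
  maxmemory = 1e+10,  # raise the in-memory ceiling well above any single brick
  chunksize = 1e+09,  # larger chunks for operations that still stream
  todisk    = FALSE   # never force intermediate results to temp files
)

# sanity check per file ("myfile.nc" is a placeholder)
canProcessInMemory(brick("myfile.nc"))

And a minimal sketch of the cell-index idea, here using raster::cellFromXY() on a dummy grid of the same shape as your files; tabularaster::cellnumbers() does the same point-to-cell mapping for line and polygon queries:

library(raster)

r     <- raster(nrows = 100, ncols = 95)                   # dummy grid, same shape as the files
pts   <- cbind(runif(450, -180, 180), runif(450, -90, 90)) # made-up points
cells <- cellFromXY(r, pts)   # store these integers, not coordinates
xy    <- xyFromCell(r, cells) # recover coordinates only when you plot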
Cheers, Mike

On Sat, 23 Sep 2017, 14:02 Thiago V. dos Santos via R-sig-Geo <r-sig-geo at r-project.org> wrote:
Dear all,
I am using the raster package to process a total of 32 daily climate files
supplied as netcdf files. Each file is a raster brick with 100 rows x 95
cols x 54750 time slices and weighs about 1.5 GB.
Essentially, all the processing I am performing on each netcdf file is:
a) to subset a specific date range
b) to extract values using points
After that, I just convert the extracted data to data.tables and keep
working in that format.
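In code, the per-file workflow looks roughly like this; the file name, variable name and date range are placeholders, and pts stands for my 450 points (SpatialPoints or a two-column lon/lat matrix):

library(raster)
library(data.table)

b <- brick("/dev/shm/pr.nc", varname = "pr")

# a) subset a specific date range via the brick's time (Z) values;
#    getZ() usually holds dates for daily NetCDF, but check your files
z <- as.Date(getZ(b))
b <- raster::subset(b, which(z >= as.Date("1995-01-01") &
                             z <= as.Date("1999-12-31")))

# b) extract values at the points: one row per point, one column per day
vals <- raster::extract(b, pts)

# reshape to a long data.table, one row per point-day
dt <- data.table(point = seq_len(nrow(vals)), vals)
dt <- melt(dt, id.vars = "point", variable.name = "date", value.name = "pr")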
Since I extract data for about 450 points and append everything into one
huge data.table, I need to use a computer with as much RAM as possible.
I ended up using a spot instance on Amazon EC2. Using an instance with 32
cores and 244GB of RAM will cost me around $0.30/hour.
Since I will be charged per hour, I need to optimize my code to get my
results as fast as possible.
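Since the files are independent, my plan for the 32 cores is to process one file per core and append everything once at the end, something like this (process_file() is a condensed version of the per-file steps sketched above, and pts is defined as before):

library(raster)
library(data.table)
library(parallel)

process_file <- function(f) {
  b <- brick(f, varname = "pr")   # "pr" is a placeholder varname
  vals <- raster::extract(b, pts)
  data.table(file = basename(f), point = seq_len(nrow(vals)), vals)
}

files <- list.files("/dev/shm", pattern = "\\.nc$", full.names = TRUE)

# one fork per file; a single rbindlist() is much cheaper than growing
# a table with repeated rbind() calls
big <- rbindlist(mclapply(files, process_file, mc.cores = 32))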
I don't even copy my data to the instance's hard disk; I send the files
directly to the ram disk (/dev/shm). Even using 48GB of ram disk to store
the files, I'll still have 196GB of RAM.
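raster's own temp directory can also be pointed at the ram disk, so that any write that does spill to "disk" stays in memory; a one-liner sketch (the path is arbitrary):

library(raster)

dir.create("/dev/shm/raster_tmp", showWarnings = FALSE)
rasterOptions(tmpdir = "/dev/shm/raster_tmp")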
Under the scenario of having virtually unlimited RAM, what would be the
best rasterOptions() settings to make sure I am processing all my rasters
in memory? Any other tips to benefit from such a large amount of RAM?
Thanks,
--
Thiago V. dos Santos
Postdoctoral Research Fellow
Department of Climate and Space Science and Engineering
University of Michigan
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia