Skip to content

how to read in R a big raster of about 6 Gb

7 messages · Massimo Bressan, Michael Sumner, Dominik Schneider

#
hi all


I need to perform a zonal statistics by overlapping a DEM raster (ESRI
grid binary) with some polygons (ESRI polygons);

the problem is that I first of all need to read the raster, which is
quite big: about 6 Gbyte;

I know about the function readGDAL() by the fantastic package "rgdal"
which is usually working very well with smaller size files but this time
I got completely stuck (the pc freezes invariably)

do you have any advice on how to deal with such a problem (big file)?

thank you

regards
#
You can use the raster package,  which will leave the values on disk if
they are too big for memory.
Use the function raster() to read the DEM, then use readOGR() to read the
polygon shape file. You can then use extract() to get statistics on your
DEM. I'm guessing it'll take a while to extract from disk, so if you are
planning to extract multiple times, you should consider copying your DEM
and setting the values equal to the cell numbers with values(DEM) <-
1:ncell(DEM).  Extract this once, keep track of which cell numbers are in
which polygon and then use the cell numbers to index the DEM when you need
values by polygon.
Hope this helps,
Dominik


On Thu, Oct 15, 2015 at 9:07 AM, Massimo Bressan <mbressan at arpa.veneto.it>
wrote:

  
  
#
hi, thanks for your prompt reply

in fact by using raster() funciontion "reading" DEM is quite fast and
also reading the polygons by readOGR() went quite smooth...

the problem then come with the extract() function that is taking definitely too
long...

I'm not quite sure I'm completely grasping your hint about
 values(DEM) <- 1:ncell(DEM)
for speeding up the process

please also consider that the SpatialPolygonDataFrame that I readOGR() has 23 fields;
I need finally to associate the DEM values with the levels represented by one of these fields:
do you think is a good idea eliminating the not necessary fields?

max
 


Il giorno Thu, 15/10/2015 alle 09.26 -0600, Dominik Schneider ha
scritto:
#
if you mean that you don't need information for all the polygons then yes,
you should subset the spatialpolygonsdF. If you're just concerned about
multiple attribute fields then I don't think it'll make a difference.

I'm not quite sure I'm completely grasping your hint about
DEMcopy=DEM
values(DEMcopy) <- 1:ncell(DEMcopy)
list_cells=extract(DEMcopy,sppolydF)

list_cells will be a list with the cell numbers associated with each
polygon in a separate list element.  The initial extract() will not take
less time, but it should be useful in the future, e.g. you could do
poly1=list_cells[[1]]
elev1=DEM[poly1]


Another option I've had luck with is rasterizing polygons. Using the GDAL
commandline utility gdal_rasterize you can convert the shape file of
polygons to a classified raster.   In R, you can then use zonal() with some
aggregation function to get stats of DEM using your classified raster.





On Thu, Oct 15, 2015 at 10:31 AM, Massimo Bressan <mbressan at arpa.veneto.it>
wrote:

  
  
#
Fwiw, the worst slow downs I see with extract are when the raster is
natively tiled.
(Been meaning to write the fix and/or report to Robert).

Can you check if your file is tiled? Use gdalinfo command line utility.

Extract, at least for points and I assume  also for polygons, scans the
file line by line but this needs to be done tile by tile to avoid
repetitive and wasteful I/O.

Cheers, Mike


On Friday, October 16, 2015, Dominik Schneider <
Dominik.Schneider at colorado.edu> wrote:
some
has

  
    
#
I do not understand exactly why but I had to pass through a preliminary
step by transforming the binary grid into an ascii grid format then read
it in R by means of readAsciiGrid(), package "maptools", and finally
convert it into a RasterLayer with the function raster() of the package
"rgdal"

after all that the extract() function was returning sensible results,
otherwise not at all (and I do not know why)

thank you all for your support

best regards

max

Il giorno Thu, 15/10/2015 alle 13.23 -0600, Dominik Schneider ha
scritto:
#
Michael, is the tiling issue also a problem if its tiled by layer? So for a
netcdf it is my understanding that extracting a layer is faster if the
tiles are nrow x ncol x 1 instead of something like nrow/2 x ncol/2 x 1
Domink
On Oct 16, 2015 01:42, "Michael Sumner" <mdsumner at gmail.com> wrote: