Tiled processing... - R-SIG-Geo

Wed, Oct 29, 2008 6:44 PM #

I've recently got back into using R to perform spatial analyses, and I'm 
trying to figure out how to perform "true" tiled processing, e.g. 
controlled reading of subsets of an input file, performing a function on 
this subset, and writing the output, subset by subset, to an output file 
and, finally, setting up the appropriate "header" info (the metadata).

Using suggestions Tim Keitt and Roger Bivand gave me some years back, I 
wrote this simple code (all it should do is take an elevation geotiff 
and add 1000 to the elevations, writing the output):

***

library(rgdal)
ds1 <- GDAL.open("elev.tif")
driver <- new('GDALDriver', 'GTiff')
ds2 <- new('GDALTransientDataset', driver, nrow(ds1), ncol(ds1), type = 
"Float32")
  for (row in nrow(ds1)) {
    row
    x <- getRasterData(ds1, offset = c(row - 1, 0), region = c(1, 
ncol(ds1)))
    elevnew <- x+1000
    putRasterData(ds2, elevnew, offset = c(row - 1, 0))

  }
saveDataset(ds2, "out.tif")
closeDataset(ds2)
closeDataset(ds1)

***

Couple of issues/questions:
1) The code doesn't actually seem to work -- I get a tiny, unreadable 
TIFF out of the back of this algorithm -- what is wrong?
2) I'm actually unclear about exactly what this is doing -- am I really 
just creating a large in-memory matrix (ds2) and then writing the entire 
output image out to disk, or is this somehow writing line-by-line?  If 
the former is true, how do I modify this to write within the loop, 
rather than all-at-once?

Thanks!

--j

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Cell: 415-794-5043
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307

Ashton Shortridge

Thu, Oct 30, 2008 6:46 AM #

Hi Jonathan,

I use GDAL to read subsets of raster and image files, though usually I employ 
the python bindings. I don't actually have answers, but maybe ideas about how 
to get there...

I don't know; however, you can check to see that you are actually reading the 
data that you think you are. Insert a line like print(x) or print(elevnew) in 
the middle. If that works, then the problem is in how you are writing it.

Yes, that is what your code seems to be doing - you are writing each line to 
an object in memory and then saving it outside of the loop as a tiff. In lots 
of batch processing you can add to a file incrementally (and therefore work 
with really large files in a memory-limited environment), but from my limited 
perspective, it doesn't appear that this is practical with gdal.

Your code therefore is not saving you lots of memory. It is saving you some - 
you don't have the entire input image in memory, just a row at a time, but 
the output is sitting there the whole time.

Another possibility might be to write small tiffs within the loop and then 
merge the tiffs, possibly (probably) outside of R, e.g. in a batch file 
employing command-line gdal.

Hope this helps, 

Ashton

On Wednesday 29 October 2008 09:44:51 pm Jonathan Greenberg wrote:

Ashton Shortridge
Associate Professor			ashton at msu.edu
Dept of Geography			http://www.msu.edu/~ashton
235 Geography Building		ph (517) 432-3561
Michigan State University		fx (517) 432-1671

Alexander Brenning

Thu, Oct 30, 2008 7:37 AM #

Hi,

maybe the RSAGA package has the solution to your problem; there are 
actually two ways of applying functions to grids in RSAGA, (1) by 
row-by-row processing (special form of tiles), or (2) using the SAGA 
binaries.

1) local.function

local.function and focal.function are very flexible tools, but they are 
slow because they are written in R. They work with ASCII grids (see e.g. 
write.ascii.grid in RSAGA, or the appropriate GDAL export functions that 
work with your GeoTIFFs).

In your case:

local.function("ingrid", varnames = "outgrid",
      fun = function(x) x + 1000)

(or use focal.function with radius = 0). This will of course work with 
much more general R functions (even with predict methods if you look at 
grid.predict and multi.focal.function).

The RSAGA package technically depends on Windows (because most of its 
functions use SAGA GIS Windows binaries) but the local.function and 
focal.function are (supposed to be) platform-independent; I can provide 
you with the source code if you work on a non-Windows system and can't 
get it from CRAN.

2) rsaga.grid.calculus

Another approach, which involves SAGA itself and therefore (currently) 
depends on Windows, uses SAGA's grid calculator. SAGA is able to process 
large grids efficiently. E.g.:

# first convert the ASCII grid to a SAGA grid (.sgrd):
rsaga.esri.to.sgrd("ingrid")
rsaga.grid.calculus("ingrid", "outgrid", formula = "a+1000")
   # 'ingrid' is treated as 'a' in the formula
rsaga.sgrd.to.esri("outgrid", prec = 2)

I hope this helps...

Cheers
  Alex

Jonathan Greenberg wrote:

Alexander Brenning
brenning at uwaterloo.ca - T +1-519-888-4567 ext 35783
Department of Geography and Environmental Management
University of Waterloo
200 University Ave. W - Waterloo, ON - Canada N2L 3G1
http://www.fes.uwaterloo.ca/geography/faculty/brenning/

Jonathan Greenberg

Thu, Oct 30, 2008 12:17 PM #

Ok, I'm nearly there using only RGDAL and R-base commands, if I can get 
a bit of feedback I might have a decent tiled (row-by-row) processing 
structure worked up.  Couple of things -- first, I was mistakenly 
thinking all raster formats can even really support line-by-line 
writing, which is not neccessarily true (consider the various compressed 
image formats).  Let's assume that the user just wants a flat-binary 
type file, either ENVI or ESRI format.  We can use writebin and one of 
the writeGDAL(...,drivername='EHdr',...) or similar "header-only" write 
commands:

elev is a DEM, but it can be any raster format GDAL can read.

***

library(rgdal)
infile='elev'
outfile_base='testout'
outfile_ext='.bil'
outfile=paste(outfile_base,outfile_ext,sep='')
outcon <- file(outfile, "wb")

infile_info=GDALinfo(infile)
nl=infile_info[[1]]
ns=infile_info[[2]]

for (row in 1:nl) {
    templine <- readGDAL(infile,region.dim=c(1,ns),offset=c(row-1,0))
    writeBin(templine[[1]], outcon,size=4)
}
close(outcon)
writeGDAL(templine,outfile_base,drivername='EHdr',type="Float32")

***

Right now, this ALMOST works except, as you can see from the final line 
that the output header will incorrectly set the number of lines in the 
output to 1 (because I'm only reading one line at a time).  I'm using 
templine purely as a way to carry over the header info, it won't write 
any actual data out, as far as I know (or will it?)  How do I modify the 
"metadata" of the templine to reflect the correct "header" info (e.g. 
set the number of rows back to the total number of rows in the image, 
and reset the geographic position correctly.

--j

Alexander Brenning wrote:

Hi,

maybe the RSAGA package has the solution to your problem; there are 
actually two ways of applying functions to grids in RSAGA, (1) by 
row-by-row processing (special form of tiles), or (2) using the SAGA 
binaries.

1) local.function

local.function and focal.function are very flexible tools, but they are 
slow because they are written in R. They work with ASCII grids (see e.g. 
write.ascii.grid in RSAGA, or the appropriate GDAL export functions that 
work with your GeoTIFFs).

In your case:

local.function("ingrid", varnames = "outgrid",
     fun = function(x) x + 1000)

(or use focal.function with radius = 0). This will of course work with 
much more general R functions (even with predict methods if you look at 
grid.predict and multi.focal.function).

The RSAGA package technically depends on Windows (because most of its 
functions use SAGA GIS Windows binaries) but the local.function and 
focal.function are (supposed to be) platform-independent; I can provide 
you with the source code if you work on a non-Windows system and can't 
get it from CRAN.

2) rsaga.grid.calculus

Another approach, which involves SAGA itself and therefore (currently) 
depends on Windows, uses SAGA's grid calculator. SAGA is able to process 
large grids efficiently. E.g.:

# first convert the ASCII grid to a SAGA grid (.sgrd):
rsaga.esri.to.sgrd("ingrid")
rsaga.grid.calculus("ingrid", "outgrid", formula = "a+1000")
  # 'ingrid' is treated as 'a' in the formula
rsaga.sgrd.to.esri("outgrid", prec = 2)

I hope this helps...

Cheers
 Alex


Jonathan Greenberg wrote:

I've recently got back into using R to perform spatial analyses, and 
I'm trying to figure out how to perform "true" tiled processing, e.g. 
controlled reading of subsets of an input file, performing a function 
on this subset, and writing the output, subset by subset, to an output 
file and, finally, setting up the appropriate "header" info (the 
metadata).

Using suggestions Tim Keitt and Roger Bivand gave me some years back, 
I wrote this simple code (all it should do is take an elevation 
geotiff and add 1000 to the elevations, writing the output):

***

library(rgdal)
ds1 <- GDAL.open("elev.tif")
driver <- new('GDALDriver', 'GTiff')
ds2 <- new('GDALTransientDataset', driver, nrow(ds1), ncol(ds1), type 
= "Float32")
 for (row in nrow(ds1)) {
   row
   x <- getRasterData(ds1, offset = c(row - 1, 0), region = c(1, 
ncol(ds1)))
   elevnew <- x+1000
   putRasterData(ds2, elevnew, offset = c(row - 1, 0))

 }
saveDataset(ds2, "out.tif")
closeDataset(ds2)
closeDataset(ds1)

***

Couple of issues/questions:
1) The code doesn't actually seem to work -- I get a tiny, unreadable 
TIFF out of the back of this algorithm -- what is wrong?
2) I'm actually unclear about exactly what this is doing -- am I 
really just creating a large in-memory matrix (ds2) and then writing 
the entire output image out to disk, or is this somehow writing 
line-by-line?  If the former is true, how do I modify this to write 
within the loop, rather than all-at-once?

Thanks!

--j

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Cell: 415-794-5043
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307