
issue in using projectRaster in raster library

Responses below:
On Thu, Apr 10, 2014 at 4:49 PM, ping yang <pingyang.whu at gmail.com> wrote:
GDAL 1.9.2 is pretty old (October 2012) -- I'd recommend upgrading to
1.10.1 -- I think they added the parallel support around v. 1.10.0.
OSGeo4W is my install of choice for Windows, but you can see the
various versions at:
http://trac.osgeo.org/gdal/wiki/DownloadingGdalBinaries

Also, you need to pass the -multi parameter; otherwise gdalwarp will
ignore -wo NUM_THREADS=ALL_CPUS.
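As a sketch, a single-file call with parallel warping enabled might look
like this (the file names and target SRS here are placeholders, not from
the original thread):

```shell
# Reproject one raster; -multi enables gdalwarp's multithreaded mode,
# and -wo NUM_THREADS=ALL_CPUS lets the warp use every core.
# input.tif / output.tif / EPSG:4326 are placeholder values.
gdalwarp -multi -wo NUM_THREADS=ALL_CPUS \
  -t_srs EPSG:4326 input.tif output.tif
```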
This looks like a potential firewall problem -- but you also need to
register it with foreach -- this is a good question for r-sig-hpc, by
the way, since that is where the foreach people tend to hang out. Try
something like:

library(parallel)    # makeCluster()
library(doParallel)  # registerDoParallel()

cl <- makeCluster(spec=4, type="PSOCK")
registerDoParallel(cl)

Keep in mind, if you don't see your CPUs being pegged, it is likely
because you are I/O limited.

Incidentally, the "cleaner" way to use packages with foreach is the
.packages= parameter rather than calling require() inside the loop:

foreach(..., .packages=c("raster","rgdal","gdalUtils"))

(You don't need to list foreach itself -- it is autoloaded on the
workers.)
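Putting the pieces together, a minimal sketch might look like the
following -- the loop body here is a placeholder; the real work
(projectRaster, gdalwarp, etc.) would go inside %dopar%:

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2, type = "PSOCK")
registerDoParallel(cl)

# .packages= loads each listed package on every worker before the
# loop body runs -- no require() calls needed inside %dopar%.
res <- foreach(i = 1:4,
               .packages = c("raster", "rgdal", "gdalUtils")) %dopar% {
  i^2  # placeholder work; do the per-file processing here
}

stopCluster(cl)
```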
The idea is that, for a lot of files, you can approach it in two ways:
1) Sequentially loop through each file, but parallelize the single
file processing (e.g. using -multi -wo NUM_THREADS=ALL_CPUS in GDAL)
-- when you see all the processors light up, this is processing ONE
file.
2) Parallel loop through your files -- each CPU ("worker"), then, is
processing a SINGLE file, but you are processing multiple files at the
same time.  This would be more like what you are trying to accomplish
above (using foreach).

Slope calculations are a form of focal-window analysis which can be
CPU limited, so I think #2 is the right way to go.  If you are trying
to cut down the time, you should look for bottlenecks -- are you
reading and writing to the same drive?  Are you doing it over a
network?  Things like that can really slow the process down.
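For the slope case specifically, approach #2 might look like the sketch
below. This assumes raster::terrain for the slope computation and uses
placeholder file names -- adapt the paths and output naming to your own
data:

```r
library(doParallel)

cl <- makeCluster(4, type = "PSOCK")
registerDoParallel(cl)

dems <- c("dem1.tif", "dem2.tif")  # placeholder DEM files

# Each worker processes one file at a time: read the DEM, compute
# slope with a focal window, and write the result next to the input.
foreach(f = dems, .packages = "raster") %dopar% {
  r <- raster(f)
  s <- terrain(r, opt = "slope", unit = "degrees")
  writeRaster(s, filename = sub("\\.tif$", "_slope.tif", f))
}

stopCluster(cl)
```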

--j