
randomForests for mapping vegetation

3 messages · Miewald, Tom, Edzer Pebesma, Roger Bivand

#
Hello all,

I am new to this list and wondering whether anyone has experience (or ideas) on how to implement vegetation mapping using the randomForest package in R.  The model produced by randomForest would be used to map vegetation from Landsat imagery (30 x 30 meter pixels) over relatively large areas (> 10 million hectares, so a lot of pixels).  There are ~15 explanatory data sets (imagery, DEMs, precipitation, etc.).  My main question is how to use the output from randomForest to predict vegetation over such an area.  I have seen some literature using GRASS, but I would rather not go down that road because I already have enough software packages.  Are there any possibilities for using ArcGIS connectivity to enable the prediction of vegetation?  Any input would be appreciated.  Thanks!
Tom

#
Tom, a possibility is to stay in R and use rgdal.
rgdal can open raster maps (and I'm sure Landsat images) directly,
and read in parts of them, i.e. it doesn't read the full
map at once. You'd have to loop over the full map, read
a part, predict with randomForest's predict method,
write the predicted values out, and go to the next part.
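
The read, predict, write loop described above can be sketched roughly as
follows. This is a language-neutral illustration in plain Python, not rgdal
code: row_blocks, predict_in_blocks, and the reader, model and writer
functions are all hypothetical stand-ins, and the "image" is a tiny list
held in memory. The same loop structure should carry over to R.

```python
def row_blocks(n_rows, block_rows):
    """Yield (row_offset, rows_in_block) pairs covering rows 0..n_rows-1."""
    if block_rows <= 0:
        raise ValueError("block_rows must be positive")
    for offset in range(0, n_rows, block_rows):
        # The last block may be shorter than block_rows.
        yield offset, min(block_rows, n_rows - offset)


def predict_in_blocks(read_block, predict, write_block, n_rows, block_rows):
    """Read, predict and write one block of rows at a time.

    read_block(offset, count) -> list of pixel rows   (hypothetical reader)
    predict(rows)             -> list of predicted rows (hypothetical model)
    write_block(offset, rows) -> stores the predictions (hypothetical writer)
    """
    for offset, count in row_blocks(n_rows, block_rows):
        rows = read_block(offset, count)
        write_block(offset, predict(rows))


# Tiny in-memory demonstration: a 5-row, 3-column "image" and a stand-in model.
image = [[r * 10 + c for c in range(3)] for r in range(5)]
output = [None] * len(image)


def read_block(offset, count):
    return image[offset:offset + count]


def predict(rows):
    # Stand-in for the fitted model: label a pixel 1 if its value is >= 20.
    return [[1 if v >= 20 else 0 for v in row] for row in rows]


def write_block(offset, rows):
    output[offset:offset + len(rows)] = rows


predict_in_blocks(read_block, predict, write_block,
                  n_rows=len(image), block_rows=2)
```

Only one block of rows is held in memory at a time, so the block size can be
chosen to fit the machine rather than the map.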

Support for sp classes is under development, in a
package called spGDAL which is available in source
code on cvs from sourceforge:

export CVSROOT=:pserver:anonymous@cvs.sf.net:/cvsroot/r-spatial
cvs co spGDAL

spGDAL has support for writing a gdal map, but I'm not
sure whether it supports writing segments of
a gdal map. It should; please keep us updated on your mileage.
--
Edzer
#
On Tue, 10 Jan 2006, Miewald, Tom wrote:

            
Have a look at:

http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Drafts/FurlanelloEtAl.pdf

which is fairly close to your description, although not such a large 
number of pixels, and does use R/GRASS integration.

My guess would be that you should tile the region for prediction into 
subregions, and patch them back together in ArcGIS. You could do it by 
writing out Arc ASCII grids using the write.asciigrid() function in 
the maptools package. If this is going to be more heavyweight production, 
and the whole process has to be repeated many times, then using the Rcom 
interface from VBA in ArcGIS might also be possible. We are 
also looking at writing GeoTIFFs from rgdal, so you should be able to find 
a suitable route from the subregional predictions within R back to rasters 
in ArcGIS. 
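
To show what one exported tile looks like, here is a minimal sketch that
writes a small block of predictions in the Arc ASCII grid layout (a six-line
header followed by the rows, northernmost first). It is plain Python rather
than write.asciigrid(); the function name write_ascii_grid, the coordinates
and the class codes are made up for illustration.

```python
import io


def write_ascii_grid(stream, rows, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Write equal-length rows (north row first) as an Arc ASCII grid."""
    ncols = len(rows[0])
    if any(len(row) != ncols for row in rows):
        raise ValueError("all rows must have the same length")
    stream.write(f"ncols {ncols}\n")
    stream.write(f"nrows {len(rows)}\n")
    stream.write(f"xllcorner {xllcorner}\n")
    stream.write(f"yllcorner {yllcorner}\n")
    stream.write(f"cellsize {cellsize}\n")
    stream.write(f"NODATA_value {nodata}\n")
    for row in rows:
        # Missing predictions (None) are written as the NODATA value.
        stream.write(" ".join(str(nodata if v is None else v) for v in row) + "\n")


# One 2 x 3 tile of predicted class codes; the coordinates are invented.
buffer = io.StringIO()
write_ascii_grid(buffer, [[1, 2, None], [3, 3, 1]],
                 xllcorner=500000, yllcorner=4100000, cellsize=30)
grid_text = buffer.getvalue()
```

Each tile carries its own lower-left corner in the header, which is what
lets the tiles be mosaicked again on the GIS side.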

Examples using VBA are shown here:

http://perso.univ-lr.fr/csaintje/Recherche/RArcgis/index.html

and a nice interface using Python from ArcGIS then Rcom:

http://www.nicholas.duke.edu/geospatial/software/

Most of the hard work will be in getting things to work once; from there 
it'll get easier. Consider save()ing the fitted RF model, so that to make 
predictions you only need to load() it and then predict() for newdata (grids 
of RHS variables) for the current subregion. This also parallelises nicely, 
so you could also pass off subregions and the fitted model to worker 
processes to do the predictions, but getting it working will take 
substantial time. By the way, OSX and Linux memory management will be better 
than Windows, so on Windows, go for smaller subregions.
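
The fit once, store, then load and predict per subregion pattern might look
roughly like this. It is a sketch in plain Python using pickle in place of
save()/load(); the "model" is just a threshold standing in for a random
forest, and save_model, predict_subregion and the tile names are
hypothetical.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Store the fitted model on disk once, after training."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def predict_subregion(model_path, tile):
    """Load the stored model, then predict one subregion (a list of pixel rows)."""
    with open(model_path, "rb") as handle:
        model = pickle.load(handle)
    threshold = model["threshold"]
    return [[1 if v >= threshold else 0 for v in row] for row in tile]


# Stand-in for a fitted model: a single threshold instead of a random forest.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model({"threshold": 5}, model_path)

# Two made-up subregions; each call needs only the model file and its own tile.
tiles = {"west": [[1, 6], [7, 2]], "east": [[9, 9], [0, 5]]}
predictions = {name: predict_subregion(model_path, tile)
               for name, tile in tiles.items()}
```

Because each call depends only on the stored model and one tile, the calls
are independent of each other, which is what makes it possible to hand them
to separate processes or machines.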

Roger