randomForests for mapping vegetation

Tom,
  We are using randomForests to build predictive models (30-meter pixels on a 22578 x 17160 grid) from about 36 environmental variables. I must admit we have not yet had the energy to do it all within R, or to call R from ArcGIS.
   We do our data prep and attributing in ArcGIS, then dump everything out to ASCII. We bring the attributed presence and absence points into R and build our random forest. With the fitted RF model in hand, we then open connections to the ASCII grids for all 36 environmental layers (each about 2GB), read them in piece by piece, run an RF prediction on each piece, and write the predictions out to an ASCII file.
  We then import the prediction layer into ArcGIS. Not pretty by any means, but it does work.
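
For readers who want to see the piece-by-piece loop spelled out, here is a minimal sketch in Python (standard library only), not the R code actually used. It assumes every layer is an Arc ASCII grid with the usual six-line header using xllcorner/yllcorner, one grid row per text line, and identical headers across layers. The names rows_that_fit, predict_in_chunks and predict_batch are hypothetical; predict_batch stands in for the fitted model's predict step. At 8 bytes per value the 22578 x 17160 grid is roughly 3 GB per layer in memory, so all 36 layers together are well over 100 GB, which is why the grids are read in blocks.

```python
from itertools import islice

HEADER_KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize", "NODATA_value")


def rows_that_fit(ncols, n_layers, budget_bytes, bytes_per_value=8):
    """How many grid rows (across all layers) fit in a memory budget."""
    return max(1, budget_bytes // (ncols * n_layers * bytes_per_value))


def read_header(handle):
    """Read the six-line Arc ASCII header into a dict with lowercase keys."""
    pairs = [handle.readline().split() for _ in HEADER_KEYS]
    return {key.lower(): value for key, value in pairs}


def predict_in_chunks(layer_handles, out_handle, predict_batch, rows_per_chunk=256):
    """Read all layers in lockstep, predict block by block, write an ASCII grid."""
    headers = [read_header(h) for h in layer_handles]
    first = headers[0]
    if any(h != first for h in headers[1:]):
        raise ValueError("layers do not share the same grid definition")
    for key in HEADER_KEYS:
        out_handle.write(f"{key} {first[key.lower()]}\n")
    nodata_text = first["nodata_value"]
    nodata = float(nodata_text)
    ncols = int(first["ncols"])

    while True:
        # One block of text rows per layer; only this block is held in memory.
        blocks = [list(islice(h, rows_per_chunk)) for h in layer_handles]
        if not blocks[0]:
            break
        # layers[k][r][c]: value of layer k at row r, column c of the block.
        layers = [[[float(v) for v in line.split()] for line in block]
                  for block in blocks]
        # One tuple of predictor values per cell, in row-major order.
        cells = [cell for rows in zip(*layers) for cell in zip(*rows)]
        # Predict only where every layer has data; pass NODATA through.
        valid = [c for c in cells if nodata not in c]
        preds = iter(predict_batch(valid)) if valid else iter(())
        out = [nodata_text if nodata in c else str(next(preds)) for c in cells]
        for start in range(0, len(out), ncols):
            out_handle.write(" ".join(out[start:start + ncols]) + "\n")
```

The block size is the only tuning knob: rows_that_fit(ncols, 36, budget) gives a starting value for rows_per_chunk, assuming 8-byte values and ignoring per-object overhead.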

Sincerely,
Tim





------------------------------

Message: 2
Date: Wed, 11 Jan 2006 11:24:51 +0100
From: "Edzer J. Pebesma" <e.pebesma at geog.uu.nl>
Subject: Re: [R-sig-Geo] randomForests for mapping vegetation
To: "Miewald, Tom" <TMiewald at sanborn.com>
Cc: r-sig-geo at stat.math.ethz.ch 
Message-ID: <43C4DCF3.1070705 at geog.uu.nl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Tom, a possibility is to stay in R and use rgdal.
rgdal can open raster maps (and I'm sure landsat images) directly,
and read in parts of them, i.e. it doesn't read the full
map at once. You'd have to loop over the full map, read
a part, predict with randomForest's predict method,
write the predicted values out, and go to the next part.

Support for sp classes is under development, in a
package called spGDAL, which is available as source
code from cvs on sourceforge:

export CVSROOT=:pserver:anonymous@cvs.sf.net:/cvsroot/r-spatial
cvs co spGDAL

spGDAL has support for writing a gdal map, but I'm not
sure whether it supports writing segments of
a gdal map. It should; please keep us updated on your mileage.
--
Edzer
------------------------------

Message: 3
Date: Wed, 11 Jan 2006 11:28:19 +0100 (CET)
From: Roger Bivand <Roger.Bivand at nhh.no>
Subject: Re: [R-sig-Geo] randomForests for mapping vegetation
To: "Miewald, Tom" <TMiewald at sanborn.com>
Cc: r-sig-geo at stat.math.ethz.ch 
Message-ID: <Pine.LNX.4.44.0601111110230.30033-100000 at reclus.nhh.no>
Content-Type: TEXT/PLAIN; charset=US-ASCII
On Tue, 10 Jan 2006, Miewald, Tom wrote:

            
Have a look at:

http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Drafts/FurlanelloEtAl.pdf 

which is fairly close to your description, although with fewer pixels, 
and it uses R/GRASS integration.

My guess would be that you should tile the region for prediction into 
subregions, and patch them back together in ArcGIS. You could do this by 
writing out Arc ASCII grids using the write.asciigrid() function in 
the maptools package. If this is going to be heavyweight production, with 
the whole process repeated many times, then using the Rcom interface from 
VBA in ArcGIS might also be possible. We are also looking at writing 
geotiffs from rgdal, so you should be able to find a suitable route from 
the subregional predictions within R back to rasters in ArcGIS.
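
The bookkeeping for tiling can be sketched as follows, in Python rather than R and only as an illustration. The part that is easy to get wrong is the georeferencing of each tile: an Arc ASCII grid stores rows from the top, but yllcorner refers to the bottom edge, so a tile's yllcorner depends on how many rows lie below it. The function names tile_windows and tile_header are hypothetical, and the header is assumed to use xllcorner/yllcorner with square cells.

```python
def tile_windows(nrows, ncols, tile_rows, tile_cols):
    """Yield (row_off, col_off, n_rows, n_cols) windows covering the grid."""
    for row_off in range(0, nrows, tile_rows):
        for col_off in range(0, ncols, tile_cols):
            yield (row_off, col_off,
                   min(tile_rows, nrows - row_off),   # edge tiles may be short
                   min(tile_cols, ncols - col_off))


def tile_header(nrows, xll, yll, cellsize, window):
    """Header fields for one tile so that it lines up with the full grid."""
    row_off, col_off, n_rows, n_cols = window
    rows_below = nrows - row_off - n_rows  # rows are counted from the top
    return {
        "ncols": n_cols,
        "nrows": n_rows,
        "xllcorner": xll + col_off * cellsize,
        "yllcorner": yll + rows_below * cellsize,
        "cellsize": cellsize,
    }
```

Tiles written with headers computed this way carry their own position, so patching them together afterwards needs no extra offsets.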

Examples using VBA are shown here:

http://perso.univ-lr.fr/csaintje/Recherche/RArcgis/index.html 

and a nice interface that uses Python from ArcGIS and then Rcom:

http://www.nicholas.duke.edu/geospatial/software/ 

Most of the hard work will be in getting things to work once; from there 
it'll get easier. Consider save()ing the fitted RF model, so that to make 
predictions you only need to load() it and then predict() on newdata (grids 
of RHS variables) for the current subregion. This also parallelises nicely, 
so you could pass subregions and the fitted model off to slaves to do 
the predictions, but getting that working will take substantial time. By the 
way, OSX and Linux memory management will be better than Windows, so on 
Windows, go for smaller subregions.
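
The save-once, load-and-predict pattern might look like the sketch below, written in Python with pickle standing in for save()/load(). Everything here is an assumption made for illustration: the "model" is a plain dict of weights and a threshold rather than a real random forest, and predict_cells is a hypothetical stand-in for predict(). Each task loads the model from disk itself, so only the file path and the subregion's cells need to be handed to a worker.

```python
import pickle
from functools import partial


def save_model(model, path):
    """Persist the fitted model once, after training."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Reload the fitted model in whichever process does the predicting."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_cells(model, cells):
    """Hypothetical stand-in for predict(): weighted sum against a threshold."""
    return [int(sum(w * x for w, x in zip(model["weights"], cell))
                > model["threshold"])
            for cell in cells]


def predict_subregion(model_path, cells):
    """One unit of work: load the saved model, predict for one subregion."""
    return predict_cells(load_model(model_path), cells)


def predict_all(model_path, subregions, executor):
    """Farm subregions out to an executor; results come back in input order."""
    task = partial(predict_subregion, model_path)
    return list(executor.map(task, subregions))
```

Any concurrent.futures executor can be passed in; for CPU-bound prediction a ProcessPoolExecutor created under an if __name__ == "__main__": guard is the natural choice, and the number of workers, like the subregion size, should be kept small where memory is tight.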

Roger