
Parallel predict now in spatial.tools

4 messages · Tim Howard, Jonathan Greenberg

#
Jonathan,
Thanks again for your reply. 
I'll reply inline ... please see below
Yes, I think I'm good on layer names, as well as maintaining all the
levels in categorical data (a classic gotcha for RF).
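For anyone following along, here is a minimal sketch of the "keep all the levels" gotcha, using made-up toy data (the names train, newdata, num and cat are hypothetical). If a block of new cells happens not to contain every category, calling factor() on it alone drops the missing levels and the predictors no longer match the model. Re-declaring the factor with the training levels avoids that:

```r
library(randomForest)
set.seed(1)

# Hypothetical toy training data: one numeric and one categorical predictor.
train <- data.frame(
  num = runif(60),
  cat = factor(rep(c("a", "b", "c"), each = 20))
)
train$y <- factor(ifelse(train$num > 0.5, "hi", "lo"))
rf <- randomForest(y ~ num + cat, data = train, ntree = 50)

# New cells in which level "c" never occurs. factor() on these values alone
# would keep only "a" and "b", which no longer matches the trained model.
newdata <- data.frame(num = c(0.2, 0.8), cat = c("a", "b"),
                      stringsAsFactors = FALSE)

# Re-declare the factor with the complete set of training levels instead.
newdata$cat <- factor(newdata$cat, levels = levels(train$cat))

pred <- predict(rf, newdata = newdata, type = "prob")
print(pred)
```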
I was a little reluctant to pass on the error message until I knew a
little more, mainly because the normal raster predict function *does*
work on these data, which you might not expect given the error
message. Here is the error coming out of predict_rasterEngine:

socket cluster with 6 nodes on host 'localhost'
predict_rasterEngine(object=rf.full, newdata=envStack, newtype='prob')
Error in predict.randomForest(object = object, newdata = newdata_df,
mget(model_parameters)) : 
  missing values in newdata
Again, I can run these exact data using raster:::predict and I get good
output.  To me that means I do not have 'missing values in newdata'. I
welcome thoughts otherwise, however. The predict call is structured like
this:

prediction_rf_prob <- predict(object=envStack, model=rf.full,
type='prob', progress = 'text')

So, I've tried recreating the problem by tweaking the Tahoe data. Below,
I've added a categorical layer to the brick. Unfortunately, that fails
with a different error (null external pointer). (Am I right that
raster:::predict and predict_rasterEngine use the term "object"
differently?)

I'll paste a replacement set of code for the Tahoe data below the
quoted thread.

I welcome comments on any of this!

Cheers, 
Tim

# Here is a script that adds a categorical layer to the tahoe brick,
# creates an RF model, and then tries both raster:::predict and
# predict_rasterEngine.

# rgdal is needed for readOGR() below
packages_required <- c("spatial.tools","doParallel","randomForest","rgdal")
lapply(packages_required, require, character.only=TRUE)

# Load up a 3-band image:
tahoe_highrez <- setMinMax(
brick(system.file("external/tah

# Create a categorical layer from band 1:
mat <- matrix(c(0,50,1,50,150,2,150,255,3),ncol=3,byrow=TRUE)
bnd1cat <- reclassify(tahoe_highrez[[1]], rcl=mat)
bnd1cat <- ratify(bnd1cat)
rat <- levels(bnd1cat)[[1]]
rat$types <- c('type1', 'type2', 'type3', 'type4')
rat$code <- c(1,2,3,4)
levels(bnd1cat) <- rat

#library(rasterVis) #if you want a categorical plot
#levelplot(bnd1cat)

tahoe_highrez <- addLayer(tahoe_highrez, bnd1cat)
names(tahoe_highrez) <-
c("tahoeOne","tahoeTwo","tahoeThree","tahoeCatFour")

# Load up some training points:
tahoe_highrez_training_points <- readOGR(
    dsn=system.file("external", package="spatial.tools"),
    layer="tahoe_highrez_training_points")

# Extract data to train the randomForest model:
tahoe_highrez_training_extract <-
extract(tahoe_highrez,tahoe_highrez_training_points,df=TRUE)

# Fuse it back with the SPECIES info:
tahoe_highrez_training_extract$SPECIES <-
tahoe_highrez_training_points$SPECIES

# Note the names of the bands:
names(tahoe_highrez_training_extract) # the extracted data
names(tahoe_highrez) # the brick

# convert to factor, ensure all the levels are there
tahoe_highrez_training_extract$tahoeCatFour <-
factor(tahoe_highrez_training_extract$tahoeCatFour)
levels(tahoe_highrez_training_extract$tahoeCatFour) <- c(1,2,3,4)
str(tahoe_highrez_training_extract)

# Generate a randomForest model:
tahoe_rf <- randomForest(y=tahoe_highrez_training_extract$SPECIES, 
                            x=tahoe_highrez_training_extract[,2:5],
                            data=tahoe_highrez_training_extract)


# Try it with the standard raster predict call -- this works:
predict1_rf_prob <- predict(object=tahoe_highrez, model=tahoe_rf,
type="prob")

# try it with rasterEngine -- this fails
sfQuickInit()
predict2_rf_prob <-
predict_rasterEngine(object=tahoe_rf,newdata=tahoe_highrez,type="prob")
sfQuickStop()

plot(predict1_rf_prob)




#
Tim:

I notice you have:

predict_rasterEngine(object=rf.full, newdata=envStack, newtype='prob')

Was that a typo in your email? If not, this might be your issue: the
correct parameter is "type", not "newtype" (I notice you are using
"type" in your raster:::predict call). See:

?predict.randomForest
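A small illustration of why such a typo can slip through silently, using iris as a stand-in dataset: predict.randomForest() has a `...` argument, so a misspelled name like newtype= is absorbed there without an error and type keeps its default of "response". This only shows predict.randomForest() itself; how predict_rasterEngine forwards its arguments is not shown here.

```r
library(randomForest)
set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 50)
newcells <- iris[c(1, 51, 101), 1:4]

# newtype= is not a formal argument, so it is swallowed by `...`
# and the default type = "response" is used: predicted class labels.
p_typo <- predict(rf, newdata = newcells, newtype = "prob")

# Spelled correctly, type = "prob" returns a class-probability matrix.
p_prob <- predict(rf, newdata = newcells, type = "prob")

class(p_typo)
dim(p_prob)
```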

If you still get that error when you use type="prob", I think the
issue is that the raster most likely has NA values (that is what usually
triggers the "missing values in newdata" error from
predict.randomForest). That is a valid issue, so I'll need to fix
the code a bit to deal with it. Can you first confirm whether it's the
newtype issue? If it is, indeed, the NA issue, I can probably push out
a fix later this weekend.

Another important issue is that I don't, at present, support the RAT,
but I can also work on a fix for that.

You are correct re: the object is the model, and the newdata parameter
is the raster/brick/stack.
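To make the difference in argument roles concrete, here is a sketch on a tiny synthetic two-band brick (the band names and classes are made up). In raster's predict(), object is the raster and model is the fitted model, with index choosing which probability columns become output layers; in predict_rasterEngine(), object is the model and newdata is the raster. The spatial.tools call is wrapped in a check in case the package is not installed.

```r
library(raster)
library(randomForest)
set.seed(3)

# A tiny synthetic two-band brick standing in for envStack.
b <- brick(nrows = 10, ncols = 10, nl = 2)
values(b) <- matrix(runif(200), ncol = 2)
names(b) <- c("band1", "band2")

# Fit a model on the cell values (hypothetical classes "a" and "b").
df <- as.data.frame(b)
cls <- factor(ifelse(df$band1 > df$band2, "a", "b"))
rf <- randomForest(x = df, y = cls, ntree = 50)

# raster::predict(): object = raster, model = fitted model.
p_raster <- raster::predict(object = b, model = rf, type = "prob",
                            index = 1:2)

# predict_rasterEngine(): object = fitted model, newdata = raster.
if (requireNamespace("spatial.tools", quietly = TRUE)) {
  spatial.tools::sfQuickInit()
  p_engine <- spatial.tools::predict_rasterEngine(object = rf, newdata = b,
                                                  type = "prob")
  spatial.tools::sfQuickStop()
}
```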

Cheers!

--j
On Fri, Mar 28, 2014 at 8:58 AM, Tim Howard <tghoward at gw.dec.state.ny.us> wrote:

  
    
#
Yes, I caught that typo after sending the email. The same error happens with "type" rather than "newtype". Sorry.

Yes, absolutely, there are NA values in the newdata. My area of interest isn't rectangular, so the edges of all layers have NA values. And yes, it is definitely more efficient to skip all rows with any NA cells before sending the data to RF; handling that is pretty critical for speed.

I don't use the RAT either ... my first exposure to it was this morning, trying to create a clearly categorical layer for the Tahoe data set. That's all. But I do have categorical data that are handled appropriately (without a RAT) in my data flow. I was just trying to add categorical data to the data flow in case that was what was tripping up your function. It sounds like that issue might be moot. That's what I get for guessing!

Thanks for your quick reply. 
Cheers, 
Tim
#
Tim:

Would you mind crop()'ing out a piece of your stack and also save()'ing
the randomForest model so I can do some tests? You can email them to
me directly or place them in a shared location.  Thanks!
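A sketch of how such a test case might be packaged, with synthetic data standing in for Tim's envStack and rf.full (only the names come from the thread). One caveat, assuming the real stack is file-backed: save() on a Raster* object that points at files on disk stores the file reference rather than the cell values, so the sketch pulls the cropped values into memory first when needed.

```r
library(raster)
library(randomForest)
set.seed(11)

# Synthetic stand-ins for envStack and rf.full.
envStack <- brick(nrows = 20, ncols = 20, nl = 2)
values(envStack) <- matrix(runif(800), ncol = 2)
names(envStack) <- c("env1", "env2")
df <- as.data.frame(envStack)
rf.full <- randomForest(x = df, y = factor(df$env1 > 0.5), ntree = 25)

# Crop out a 5 x 5 block of cells (rows 1-5, columns 1-5).
envSubset <- crop(envStack, extent(envStack, 1, 5, 1, 5))

# If the crop is still backed by files on disk, load its values into
# memory so that save() captures the data, not just a file path.
if (fromDisk(envSubset)) envSubset <- readAll(envSubset)

out_file <- file.path(tempdir(), "rf_test_case.RData")
save(envSubset, rf.full, file = out_file)

# Check that both objects come back in a fresh environment.
chk <- new.env()
load(out_file, envir = chk)
```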

--j
On Fri, Mar 28, 2014 at 9:54 AM, Tim Howard <tghoward at gw.dec.state.ny.us> wrote: