Chaid Decision Tree

2 messages · MIKE DE LA HOZ, Achim Zeileis

#
Hi,


I am fitting a CHAID tree to the Titanic dataset (see attachment).



library("CHAID")

setwd("C:/Users/miguel")

# Read the Titanic training data and drop the identifier columns
titanic <- read.csv("train.csv")
titanic.s <- subset(titanic, select = -c(PassengerId, Name))

ctrl <- chaid_control(minsplit = 20, minbucket = 5, minprob = 0)
chaidTitanic <- chaid(Survived ~ ., data = titanic.s, control = ctrl)



It looks like I get the following error

Error: is.factor(x) is not TRUE



Can you please help me here? I am not able to follow this type of error. If you can rewrite the code for me, it will be much appreciated.


Thanks
#
On Mon, 22 Aug 2016, MIKE DE LA HOZ wrote:

            
To apply the chaid() function, all variables (both the response and the 
predictors) need to be categorical, i.e., of class "factor" in R.

It is not clear which variables are the culprits here because your example 
is not reproducible. I guess that there are at least some numeric 
regressor variables. Maybe the "Survived" response is also in numeric 
dummy coding rather than the appropriate "factor" variable.
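
Since the original data are not available, here is a minimal sketch of how 
one might find and convert the offending columns. The data frame "toy" is a 
made-up stand-in: its column names follow the usual Titanic training file, 
but the values, the factor labels, and the age break points are assumptions 
chosen only for illustration.

```r
# Hypothetical stand-in for the poster's data (values are made up).
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0),
  Pclass   = c(3, 1, 3, 1, 2, 3),
  Sex      = factor(c("male", "female", "female", "male", "female", "male")),
  Age      = c(22, 38, 26, 54, 27, 35)
)

# List the columns that are not factors; these are the likely culprits.
not_factor <- names(toy)[!vapply(toy, is.factor, logical(1))]
print(not_factor)

# Numeric codes with only a few distinct values can be converted directly.
toy$Survived <- factor(toy$Survived, levels = c(0, 1), labels = c("no", "yes"))
toy$Pclass   <- factor(toy$Pclass)

# A continuous variable has to be binned first (break points are arbitrary).
toy$Age <- cut(toy$Age, breaks = c(0, 18, 40, Inf),
               labels = c("child", "adult", "older"))

# Every column is now a factor.
stopifnot(all(vapply(toy, is.factor, logical(1))))
```

Binning a continuous variable such as Age discards information, and the 
result depends on where the breaks are placed.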

In any case, I would recommend using a tree model that can deal with both 
kinds of regressor variables. If you want something that selects split 
variables and split points based on statistical tests, ctree() from 
package "partykit" would be the obvious candidate.
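
A minimal sketch of what this could look like, assuming "partykit" is 
installed. The data are simulated for illustration only (the variable names 
mimic the Titanic file, and the dependence of survival on Sex is an 
assumption built into the simulation), so the fitted tree says nothing about 
the real data.

```r
library("partykit")

# Simulated data with one numeric and one categorical regressor.
set.seed(1)
n <- 200
toy <- data.frame(
  Age = round(runif(n, 1, 80)),
  Sex = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Survival is simulated to depend on Sex, so the tree has something to find.
p <- ifelse(toy$Sex == "female", 0.8, 0.2)
toy$Survived <- factor(rbinom(n, 1, p), levels = c(0, 1),
                       labels = c("no", "yes"))

# ctree() accepts numeric and factor regressors alike; the response is a
# factor, so a classification tree is fitted.
ct <- ctree(Survived ~ Age + Sex, data = toy)
print(ct)

# Predicted classes for the training data.
pred <- predict(ct, newdata = toy)
```

Calling plot(ct) draws the fitted tree; Age can stay numeric because no 
prior binning is needed.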