
Decision tree model using rpart (classification)

5 messages · aajit75, Tal Galili, Andrew Ziem

#
Hi Experts,

I am new to R and am using decision tree models to derive segmentation rules. I have tried three approaches:
A) Using behavioural data (attributes describing customer behaviour, for
example balances, number of accounts, etc.)
1. Clustering: cluster the behavioural data into a suitable number of clusters.
2. Decision tree: fit an rpart classification tree to generate segmentation
rules, using the cluster number (cluster id) as the target variable and the
variables from the behavioural data as input variables.

B) Using profile data (customers' demographic data)
1. Clustering: cluster the profile data into a suitable number of clusters.
2. Decision tree: fit an rpart classification tree to generate segmentation
rules, using the cluster number (cluster id) as the target variable and the
variables from the profile data as input variables.

C) Using profile data (customers' demographic data) and deciles created from
behaviour
1. Deciles: split customers into 10 groups (deciles) based on some of the
behavioural data.
2. Decision tree: fit an rpart classification tree to generate segmentation
rules, using the decile as the target variable and the variables from the
profile data as input variables.
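The three approaches above can be sketched roughly as follows. This is a minimal sketch on simulated data, not the poster's code: the data frames and columns (`beh`, `profile`, `balance`, `n_accounts`, `age`, `income`) are hypothetical, k-means is assumed as the clustering method (the post does not say which one was used), and a single behavioural variable is assumed to define the deciles. Case B follows the same pattern as case A, with the profile variables in place of the behavioural ones.

```r
# Sketch of case A (cluster, then tree on the cluster id) and case C
# (tree on behavioural deciles). All data are simulated and all
# column names are hypothetical.
library(rpart)

set.seed(42)
n <- 1000
beh <- data.frame(
  balance    = rlnorm(n, meanlog = 8, sdlog = 1),
  n_accounts = rpois(n, lambda = 3)
)

# Case A, step 1: k-means on the scaled behavioural variables
km <- kmeans(scale(beh), centers = 4, nstart = 10)
beh$segment <- factor(km$cluster)

# Case A, step 2: classification tree that describes the clusters as rules
fit_a <- rpart(segment ~ ., data = beh, method = "class",
               control = rpart.control(minsplit = 80, cp = 0.001))
print(fit_a)  # each root-to-leaf path is a segmentation rule

# Case C: deciles of one behavioural variable as the target,
# profile variables as the inputs
profile <- data.frame(
  age    = sample(18:80, n, replace = TRUE),
  income = rlnorm(n, meanlog = 10, sdlog = 0.5)
)
breaks <- quantile(beh$balance, probs = seq(0, 1, by = 0.1))
profile$decile <- factor(cut(beh$balance, breaks = breaks,
                             include.lowest = TRUE, labels = FALSE))

fit_c <- rpart(decile ~ ., data = profile, method = "class",
               control = rpart.control(minsplit = 80, cp = 0.001))
```

Because the simulated profile variables are independent of `balance`, the case C tree here finds little real structure; the point is only the shape of the workflow.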

In the first two cases (A and B), rpart finishes in a minute or two, but in
the third case (C) it keeps running indefinitely (I monitored it and it was
still running after 14 hours):

fit <- rpart(decile ~ ., method = "class", data = dtm_ip)

Is there anything wrong with my approach?
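A possible cause to check, offered as a hypothesis rather than a diagnosis from this thread: when the target has more than two classes (here 10 deciles), rpart searches every way of dividing the levels of a categorical predictor into two groups, which is 2^(k-1) - 1 candidate splits for a factor with k levels. A predictor with many levels (a postcode, or an ID column stored as a factor and picked up by `decile ~ .`) can therefore make the fit effectively never finish. This would explain case C but not B only if `dtm_ip` contains such a column that `profile_cluster_out` does not. The helper `factor_levels()` and the toy data below are hypothetical.

```r
# Hypothetical helper: number of distinct (non-missing) values of every
# factor or character predictor, largest first.
factor_levels <- function(df, target) {
  preds  <- setdiff(names(df), target)
  is_cat <- vapply(df[preds],
                   function(col) is.factor(col) || is.character(col),
                   logical(1))
  n_lev  <- vapply(df[preds][is_cat],
                   function(col) sum(!is.na(unique(col))),
                   integer(1))
  sort(n_lev, decreasing = TRUE)
}

# Simulated stand-in for dtm_ip with one many-level factor (hypothetical)
set.seed(1)
toy <- data.frame(
  decile   = factor(sample(1:10, 500, replace = TRUE)),
  gender   = factor(sample(c("F", "M"), 500, replace = TRUE)),
  postcode = factor(sample(sprintf("P%03d", 1:300), 500, replace = TRUE)),
  age      = sample(18:80, 500, replace = TRUE)
)

lev <- factor_levels(toy, target = "decile")
print(lev)
# Number of two-way splits of each factor's levels that an exhaustive
# search would have to evaluate
print(2^(lev - 1) - 1)
```

On the real data this would be called as `factor_levels(dtm_ip, target = "decile")`; any predictor with more than a few dozen levels would be a candidate to regroup or drop before refitting.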

Thanks in advance for your help.
-Ajit


--
View this message in context: http://r.789695.n4.nabble.com/Decision-tree-model-using-rpart-classification-tp3989162p3989162.html
Sent from the R help mailing list archive at Nabble.com.
#
Hi,

Thanks for the response. The code for each case is as follows:

library(rpart)

# Complexity parameter (cp) and minimum node size to attempt a split (minsplit)
c_c_factor <- 0.001
min_obs_split <- 80

A)

fit <- rpart(segment ~., method="class", 
	   control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), 
	   data=Beh_cluster_out)

B)
fit <- rpart(segment ~., method="class", 
	   control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), 
	   data=profile_cluster_out)

C)
fit <- rpart(decile ~., method="class", 
	   control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), 
	   data=dtm_ip)

In A and B, the target variable 'segment' comes from clustering the same set
of variables that are used as inputs, while in C the target variable 'decile'
is derived from behavioural variables and the input variables come from the
profile data. The number of rows in the input table is the same in all three
cases.
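Whatever the cause turns out to be, a cheaper fit can be tried by switching off work rpart does by default on top of growing the tree. `xval`, `maxcompete`, `maxsurrogate` and `maxdepth` are documented `rpart.control()` arguments (defaults 10, 4, 5 and 30), and the `rpart.control` help page notes that a large share of computation time goes into the search for surrogate splits. This is a sketch on simulated stand-in data (`sim` and its columns are hypothetical, and `maxdepth = 10` is an arbitrary illustrative cap); these settings reduce the cost per node but would not remove an exponential search over a many-level factor.

```r
library(rpart)

# Same settings as in the post
c_c_factor <- 0.001
min_obs_split <- 80

# Cheaper control: no cross-validation, no competitor or surrogate
# splits, and a cap on tree depth
fast_ctrl <- rpart.control(minsplit = min_obs_split, cp = c_c_factor,
                           xval = 0, maxcompete = 0, maxsurrogate = 0,
                           maxdepth = 10)

# Timing check on a simulated stand-in for dtm_ip (hypothetical data)
set.seed(7)
sim <- data.frame(
  decile = factor(sample(1:10, 5000, replace = TRUE)),
  age    = sample(18:80, 5000, replace = TRUE),
  region = factor(sample(letters[1:8], 5000, replace = TRUE))
)
elapsed <- system.time(
  fit <- rpart(decile ~ ., data = sim, method = "class", control = fast_ctrl)
)[["elapsed"]]
print(elapsed)  # seconds of wall-clock time for the fit
```

With `xval = 0` the fitted tree cannot be pruned using cross-validated error, so this is better suited to diagnosing run time than to choosing the final model.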

Regards,
-Ajit


#
aajit75 <aajit75 <at> yahoo.co.in> writes:
What is the value of modeling the deciles as the target? They are a
lower-resolution version of information you already have, and even without
this model (which doesn't finish fitting) you should already be able to assign
a decile to every customer.
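Andrew's point can be illustrated with a minimal sketch: if the deciles are defined from a behavioural score, each customer's decile follows directly from that score, with no model needed. The function `assign_decile()` and its argument `score` are hypothetical, and deciles are assumed to be defined by the sample quantiles of the score.

```r
# Hypothetical helper: decile (1 = lowest, 10 = highest) of a behavioural
# score, taken directly from its sample quantiles; no model is fitted.
assign_decile <- function(score) {
  breaks <- quantile(score, probs = seq(0, 1, by = 0.1), na.rm = TRUE)
  # unique() avoids an error from cut() when heavy ties repeat a break
  # point; fewer than 10 groups are then returned
  cut(score, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# Ten customers with distinct scores land in ten different deciles
scores <- c(5, 1, 9, 3, 7, 2, 8, 10, 4, 6)
assign_decile(scores)
# returns 5 1 9 3 7 2 8 10 4 6
```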



Andrew