Hi Experts,

I am new to R and am using a decision tree model to get segmentation rules. I have tried three approaches:

A) Using behavioural data (attributes describing customer behaviour, for example balances, number of accounts, etc.)
   1. Clustering: cluster the behavioural data into a suitable number of clusters.
   2. Decision tree: use an rpart classification tree to generate segmentation rules, with the cluster number (cluster id) as the target variable and the behavioural variables as input variables.

B) Using profile data (customers' demographic data)
   1. Clustering: cluster the profile data into a suitable number of clusters.
   2. Decision tree: use an rpart classification tree to generate segmentation rules, with the cluster number (cluster id) as the target variable and the profile variables as input variables.

C) Using profile data (customers' demographic data) and deciles created from behaviour
   1. Deciles: split customers into 10 groups (deciles) based on some behavioural data.
   2. Decision tree: use an rpart classification tree to generate segmentation rules, with the decile as the target variable and the profile variables as input variables.

In the first two cases (A and B) the rpart decision tree finishes in a minute or two, but in the third case (C) it seems to run indefinitely (I monitored it and it was still running after 14 hours).

    fit <- rpart(decile ~., method="class", data=dtm_ip)

Is there anything wrong with my approach?

Thanks in advance for the help.

-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Decision-tree-model-using-rpart-classification-tp3989162p3989162.html
Sent from the R help mailing list archive at Nabble.com.
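For readers following along, approach A can be sketched on synthetic data as below. This is only an illustration under stated assumptions: the data frame `beh`, its columns, the use of `kmeans`, and the choice of three clusters are all hypothetical, since the post does not say which clustering method or data were used.

```r
library(rpart)

set.seed(1)
# Hypothetical behavioural data: 300 customers, two attributes (not from the post)
beh <- data.frame(
  balance    = c(rnorm(100, 1000, 100), rnorm(100, 5000, 300), rnorm(100, 20000, 1000)),
  n_accounts = c(rpois(100, 1), rpois(100, 3), rpois(100, 6))
)

# Step 1: cluster the scaled behavioural attributes (k = 3 is an arbitrary choice)
km <- kmeans(scale(beh), centers = 3, nstart = 10)
beh$segment <- factor(km$cluster)

# Step 2: classification tree with the cluster id as target
# and the behavioural variables as inputs
fit <- rpart(segment ~ ., data = beh, method = "class")
print(fit)
```

Printing the fitted tree lists the split rules, which is the "segmentation rules" output the post is after.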
Decision tree model using rpart (classification)
5 messages · aajit75, Tal Galili, Andrew Ziem
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111104/4cafca91/attachment.pl>
Hi,

Thanks for the response. The code for each case is:

    c_c_factor <- 0.001
    min_obs_split <- 80

    A) fit <- rpart(segment ~., method="class", control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), data=Beh_cluster_out)

    B) fit <- rpart(segment ~., method="class", control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), data=profile_cluster_out)

    C) fit <- rpart(decile ~., method="class", control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), data=dtm_ip)

In A and B the target variable 'segment' comes from clustering the same set of variables that are used as inputs, while in C the target variable 'decile' is derived from behavioural variables and the input variables come from the profile data. The number of rows in the input table is the same in all three cases.

Regards,
-Ajit
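One thing that may be worth checking in case C (an assumption; nothing in the thread confirms it as the cause): for a classification target with more than two classes, rpart searches over subsets of the levels of each unordered factor predictor, and that search grows exponentially with the number of levels. Demographic data often contain such columns. The sketch below counts levels per categorical predictor; the data frame here is a hypothetical stand-in for `dtm_ip`, and its column names are invented for illustration.

```r
set.seed(42)
# Hypothetical stand-in for dtm_ip (columns are made up, not from the post)
dtm_ip <- data.frame(
  decile = factor(sample(1:10, 500, replace = TRUE)),
  age    = sample(18:80, 500, replace = TRUE),
  gender = factor(sample(c("F", "M"), 500, replace = TRUE)),
  city   = factor(sample(paste0("city_", 1:40), 500, replace = TRUE))
)

# Count distinct values of each factor or character predictor;
# numeric predictors are marked NA and dropped
predictors <- setdiff(names(dtm_ip), "decile")
n_levels <- sapply(dtm_ip[predictors], function(x) {
  if (is.factor(x) || is.character(x)) length(unique(x)) else NA_integer_
})
n_levels <- sort(n_levels[!is.na(n_levels)], decreasing = TRUE)
print(n_levels)
```

If a predictor shows dozens of levels, fitting once without it, or after grouping rare levels, is a quick way to see whether it is what keeps the fit from finishing.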
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111104/be8bb88e/attachment.pl>
aajit75 <aajit75 <at> yahoo.co.in> writes:
> fit <- rpart(decile ~., method="class", control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), data=dtm_ip)
>
> In A and B target variable 'segment' is from the clustering data using same set of input variables, while in C target variable 'decile' is derived from behavioural variables and input variables are from profile data. Number of rows in the input table in all three cases are same.
What is the value of modeling the deciles as the target? They are a lower-resolution version of information you already have, and even without this model, which doesn't finish fitting, you should already be able to assign a decile to every customer.

Andrew
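To illustrate the point about assigning deciles directly, here is a minimal sketch. It assumes a single numeric behavioural measure, hypothetically named `balance` and simulated here; the thread does not say which variables the deciles were built from.

```r
set.seed(7)
# Hypothetical behavioural measure for 1000 customers (simulated, not from the thread)
balance <- rexp(1000, rate = 1 / 5000)

# Decile breakpoints: the 0%, 10%, ..., 100% quantiles
breaks <- quantile(balance, probs = seq(0, 1, by = 0.1))

# Assign each customer to a decile; include.lowest keeps the minimum in decile 1
decile <- cut(balance, breaks = breaks, labels = 1:10, include.lowest = TRUE)
table(decile)
```

This assumes the breakpoints are distinct; with heavily tied data `cut` would reject duplicated breaks and the tied values would need separate handling.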