Interpretation of csplit from rpart.object
Your message *was* received, and you can check the archives to see it at

https://stat.ethz.ch/pipermail/r-help/2005-September/077889.html

You need to read the code to answer the question for yourself. There is lots of code interpreting csplit in the rpart package. These lines might be a clue, for example

rpart.s:         if (ncat>0) ans$csplit <- catmat +2
pred.rpart.s:    as.integer(fit$csplit -2),
summary.rpart.s: paste(c("L", "-", "R")[x$csplit[x$splits[i,4], 1:temp[i]]],

The documentation is from the authors and may well be out of date: but you need to read much more carefully what it says (e.g. `this level').
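
The three code lines quoted above suggest that the stored csplit values are the documented -1/0/+1 codes shifted by 2, so that 1, 2 and 3 can be used directly as indices into c("L", "-", "R"). The following is only a minimal sketch of that reading in base R, not code from the rpart package: the matrix is typed in by hand from the first two rows of the output in the question, and the names `direction` and `documented` are made up for illustration.

```r
# First two rows of the csplit matrix printed in the original question.
csplit <- matrix(c(1, 3, 3, 1, 3, 3, 3,
                   2, 3, 3, 1, 2, 2, 2),
                 nrow = 2, byrow = TRUE)

# Same lookup as the summary.rpart.s line: 1 -> "L", 2 -> "-", 3 -> "R".
direction <- matrix(c("L", "-", "R")[csplit], nrow = nrow(csplit))

# Undo the "+2" from the rpart.s line to get the -1 / 0 / +1 coding
# that ?rpart.object describes.
documented <- csplit - 2

print(direction)
print(documented)
```

Read this way, and if that split's factor really has seven levels, row 1 sends levels 1 and 4 to the left and the other five to the right. The `x$splits[i,4]` index in the summary line also hints that a csplit row is looked up from a row of the splits matrix rather than from a node number; that is an inference from the quoted line, so check it against the package source.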
On Wed, 21 Sep 2005 jmoreira at fe.up.pt wrote:
I am sending this help request again because a virus was detected in my previous
message, so I don't know whether the R-list received it. The virus problem is solved. Sorry for that.
----- Forwarded message from jmoreira at fe.up.pt -----
Date: Tue, 20 Sep 2005 14:35:12 +0100
From: jmoreira at fe.up.pt
Reply-To: jmoreira at fe.up.pt
Subject: Interpretation of csplit from rpart.object
To: r-help at stat.math.ethz.ch
Dear members of R-list,
I need to reproduce the rules of a decision tree. For that I need to use the
csplit information from the rpart.object. But I cannot understand the
information because from my example I get:
rpart.tree$csplit
      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]    1    3    3    1    3    3    3
 [2,]    2    3    3    1    2    2    2
 [3,]    1    3    3    1    3    3    3
 [4,]    2    3    3    1    2    2    2
 [5,]    2    3    3    1    2    2    2
 [6,]    2    1    3    2    3    1    1
 [7,]    2    3    3    2    3    3    1
 [8,]    2    3    3    1    2    2    2
 [9,]    2    1    3    2    3    1    1
[10,]    2    1    3    3    2    2    2
[11,]    2    1    1    2    1    1    3
[12,]    2    3    3    1    2    2    2
[13,]    2    1    1    2    3    1    1
[14,]    2    3    3    1    2    2    2
[15,]    2    1    3    2    1    1    1
[16,]    2    3    1    1    2    2    2
[17,]    2    3    3    1    2    2    2
[18,]    2    1    3    2    1    3    1
[19,]    2    3    3    1    2    2    2
[20,]    2    1    3    2    1    3    3
[21,]    2    3    1    2    2    2    2
[22,]    2    1    3    2    1    1    1
I don't understand why I have 22 rows (my tree has 21 nodes including the root
node) and 7 columns (I have four explanatory variables: two numerics and two
factors; plus the numeric target variable)
?rpart.object says:
csplit: this will be present only if one of the split variables is a
factor. There is one row for each such split, and column 'i =
-1' if this level of the factor goes to the left, '+1' if it
goes to the right, and 0 if that level is not present at this
node of the tree. For an ordered categorical variable all
levels are marked as 'R/L', including levels that are not
present.
The values I got are quite different.
Can someone give me information on how to deal with that?
Thanks in advance.
Joao Moreira
----- End forwarded message -----
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595