regression tree xerror

Sherri Miller wrote:

            
The "rel error" column is estimated on the training data (the sample used 
to grow the tree), so it decreases as the tree grows, because the tree 
becomes more and more closely fitted to that data. This apparent 
improvement should not be taken as "real" when predicting for a new 
sample of data, because larger trees tend to overfit the training sample 
and generalise poorly to fresh data.

That's the motivation for the xerror (and xstd) estimates. These are 
more realistic estimates of the performance of the tree on new samples 
of data, with xstd giving the standard error of xerror. They are obtained 
by the rpart() function through an internal cross-validation process. The 
function prune() can be used to select a subtree of the tree obtained 
with rpart() if you think (by looking at the xerror estimates) you would 
be better off with this subtree.
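
As a rough illustration of that workflow, here is a minimal sketch. It 
assumes the car.test.frame data set that ships with rpart and an 
arbitrary small cp of 0.001 so that a fairly large tree is grown; the 
object names (fit, cp_table, best, pruned) are just placeholders. Picking 
the row with the smallest xerror is only one possible rule, not 
necessarily the one you should use.

```r
library(rpart)

# The cross-validation folds are drawn at random, so fix the seed
# to make the xerror/xstd columns reproducible.
set.seed(1)

# Grow a deliberately large regression tree (small cp), with
# 10-fold internal cross-validation (xval = 10 is the default).
fit <- rpart(Mileage ~ Weight + Disp. + HP,
             data    = car.test.frame,
             method  = "anova",
             control = rpart.control(cp = 0.001, xval = 10))

# Columns: CP, nsplit, rel error (training data), xerror, xstd.
printcp(fit)
cp_table <- fit$cptable

# Choose the subtree with the smallest cross-validated error
# and prune back to the corresponding cp value.
best   <- which.min(cp_table[, "xerror"])
pruned <- prune(fit, cp = cp_table[best, "CP"])

print(pruned)
```

In the printed table, rel error keeps falling as nsplit grows, while 
xerror typically flattens out or rises again. A common alternative to 
taking the minimum is the "1-SE" rule: choose the smallest tree whose 
xerror is within one xstd of the minimum.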

Hope this helps.

Luis Torgo