Difference between "tree" and "rpart"

Wed, May 4, 2005 6:54 AM

In the help for rpart it says, "This differs from the tree function
mainly in its handling of surrogate variables." And it says that an
rpart object is a superset of a tree object. Both cite Breiman et al.
1984. Both call external code which looks like Martian poetry to me.
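
For readers unfamiliar with surrogate variables, here is a minimal sketch of what they buy you in rpart. It assumes the kyphosis data set that ships with rpart; the object names (fit, newcase) are illustrative only and not from the help page. A case whose value for a split variable is missing can still be sent down the tree using surrogate splits.

```r
library(rpart)

# Fit a classification tree on the kyphosis data bundled with rpart.
# maxsurrogate and usesurrogate are shown at their documented defaults.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class",
             control = rpart.control(maxsurrogate = 5, usesurrogate = 2))

# Take one case and blank out Start (illustrative; any predictor would do).
newcase <- kyphosis[1, ]
newcase$Start <- NA_integer_

# predict() still returns a class: where Start is needed for a split,
# rpart falls back on a surrogate split or the majority direction.
pred <- predict(fit, newdata = newcase, type = "class")
print(pred)
```

summary(fit) lists the surrogate splits chosen at each node, if you want to see which variables stand in for which.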

I've seen posts in the archives where BDR, and other knowledgeable
folks, have said that rpart() is to be preferred over tree().

Is there a simple reason why? They use the same fundamental algorithm.
Are there differences in processing time? Bells and whistles?

TIA, DRC

Brian Ripley

Wed, May 4, 2005 9:04 AM

rpart does much more at the C level, including pruning and
cross-validation, so it can be much faster.
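
As a rough illustration of the point above (a sketch, not a benchmark), the following fits a tree with rpart, reads the cross-validated error that rpart computes during the fit, and prunes on it. The kyphosis data set comes with rpart; the choice of 10 folds, the cp value, and the names cp_table, best_cp and pruned are assumptions made for the example.

```r
library(rpart)

set.seed(1)  # the cross-validation folds are assigned at random

# Cross-validation runs inside the call to rpart(); xval sets the number of folds.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class",
             control = rpart.control(xval = 10, cp = 0.001))

# One row per candidate subtree; xerror is the cross-validated error.
cp_table <- fit$cptable

# Pick the complexity parameter with the smallest cross-validated error
# and prune the fitted tree back to it.
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

printcp(pruned)
```

With tree, the comparable steps are separate calls made after the fit (cv.tree() and prune.tree()).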

It is also user-extensible.

tree was actually written to track down bugs in the then-current S
implementation, and so is much closer to the functionality in S. It is
not where I would have started from. It is really only available for R
to support MASS and PRNN (my books).

On Wed, 4 May 2005, Dr Carbon wrote:

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595