
rpart vs. randomForest

One of these days I promise to write a package vignette...

As Martin said, RF uses many trees (500 by default).  The "forest" component
of the randomForest object contains all the trees, but not in an easily
readable form (because I don't see much use in "looking" at the trees except
for debugging purposes).  If you really want to see what a tree looks like,
grow just one tree and look at the "forest" component.  Here is some
explanation:

For each tree: 
o  "nrnodes" is the maximum number of nodes a tree can have.  

o  "ndbigtree" is a vector of length ntree containing the total number of
nodes in the trees.

o  "nodestatus" is an nrnodes by ntree matrix of indicators: -1 if the node
is terminal.

o  "treemap" is a 3-D array containing a two-column matrix for each tree.
The first column indicates which node is the "left descendant" and the
second column the "right descendant".  Both are 0 if the node is terminal.

o  "bestvar" is an nrnodes by ntree matrix that indicates, for each node,
which variable is used to split that node; 0 for terminal nodes.

o  "xbestsplit" has the same layout as "bestvar", except it gives the value
at which to split.
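
A minimal sketch of the single-tree suggestion, using the built-in iris
data (an arbitrary choice here): grow one tree, peek at the raw "forest"
components listed above, and then use getTree(), which assembles the same
per-node information into one readable data frame.

```r
library(randomForest)

set.seed(1)                      # arbitrary seed, for reproducibility
rf1 <- randomForest(Species ~ ., data = iris, ntree = 1)

## The raw components described above live in rf1$forest:
dim(rf1$forest$treemap)          # nrnodes x 2 x ntree array of descendants
rf1$forest$bestvar[, 1]          # splitting variable per node (0 = terminal)
rf1$forest$xbestsplit[, 1]       # split point per node

## getTree() combines them into one data frame, one row per node:
getTree(rf1, k = 1, labelVar = TRUE)
```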


One thing people should keep in mind about the "predicted" component of the
randomForest object (and the confusion matrix for the training data), as
well as "predict(rf.object)" without giving newdata for prediction:
that prediction is based on out-of-bag samples, so it is *NOT* the same as
the usual prediction on the training data.  It is closer to out-of-sample
prediction, as in, e.g., cross-validation.
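
To make the distinction concrete, here is a sketch (again on iris, with an
arbitrary seed) comparing the out-of-bag predictions with the usual
resubstitution predictions:

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris)

p.oob   <- predict(rf)                  # out-of-bag predictions
p.resub <- predict(rf, newdata = iris)  # usual prediction on training data

mean(p.oob   != iris$Species)  # honest (OOB) error estimate
mean(p.resub != iris$Species)  # resubstitution error; typically optimistic
```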

AFAIK there is only empirical and anecdotal evidence on the sensitivity of
performance to the value of mtry.  I can say that in my own experience,
fiddling with mtry gives at best marginal improvement.  One easy way to
answer the question for your situation is to try it yourself and see.
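
"Try it yourself" can be as simple as looping over mtry and comparing OOB
error; a sketch (iris has only 4 predictors, so the loop is short):

```r
library(randomForest)

set.seed(1)
for (m in 1:4) {
  rf <- randomForest(Species ~ ., data = iris, mtry = m)
  cat("mtry =", m, " OOB error =",
      mean(predict(rf) != iris$Species), "\n")
}

## tuneRF() in the package automates this kind of search, e.g.:
## tuneRF(iris[, -5], iris$Species, stepFactor = 2)
```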

With MDS on the proximity matrix, you probably need to be a bit careful in
its interpretation.  The proximity matrix of the training data is computed
on the *entire* training data, rather than just the out-of-bag portion.
Thus the MDS plot will quite often make the classes look more "separable"
than they really are.  (We are thinking about a fix.  Breiman pointed out
that the difficulty is that if the proximity matrix is calculated only on
the out-of-bag data, then 1 - proximity is no longer positive definite.)
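
For reference, a sketch of the MDS plot in question (keeping in mind the
caveat above that the in-bag proximities can make the classes look cleaner
than they are):

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, proximity = TRUE)

## MDSplot() wraps cmdscale() applied to 1 - proximity:
MDSplot(rf, iris$Species)

## Equivalently, by hand:
mds <- cmdscale(1 - rf$proximity, k = 2)
plot(mds, col = as.integer(iris$Species))
```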

HTH,
Andy