
Rpart and bagging - how is it done?

I believe that the procedure you describe at the end (resampling the 
cases) is the original interpretation of bagging, and that using weighting 
is equivalent when a procedure uses case weights.

If you are getting different results when replicating cases and when using 
weights then rpart is not using its weights strictly as case weights and 
it would be preferable to replicate cases.  But I am getting identical 
predictions by the two routes:

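library(rpart)  # provides rpart() and the kyphosis data (81 cases)
ind <- sample(1:81, replace=TRUE)
# Route 1: replicate the resampled cases
rpart(Kyphosis ~ Age + Number + Start, data=kyphosis[ind,], xval=0)
# Route 2: the same resample expressed as case weights
rpart(Kyphosis ~ Age + Number + Start, data=kyphosis,
       weights=tabulate(ind, nbins=81), xval=0)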
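One way to check the comparison above on your own resample is to fit both trees and compare their predictions on the original data. This is only a sketch: the seed is arbitrary, the object names (fit_rep, fit_wt, agree) are mine, and whether the agreement is complete may depend on the particular resample.

```r
library(rpart)   # provides rpart() and the kyphosis data

set.seed(1)      # arbitrary seed, only so the sketch is repeatable
n   <- nrow(kyphosis)
ind <- sample(n, replace = TRUE)     # bootstrap resample of case indices
w   <- tabulate(ind, nbins = n)      # how often each case was drawn (0, 1, 2, ...)

# Route 1: replicate the resampled cases
fit_rep <- rpart(Kyphosis ~ Age + Number + Start,
                 data = kyphosis[ind, ], xval = 0)
# Route 2: original data with the draw counts as weights
fit_wt  <- rpart(Kyphosis ~ Age + Number + Start,
                 data = kyphosis, weights = w, xval = 0)

# Predict on the original 81 cases with both trees
p_rep <- as.character(predict(fit_rep, newdata = kyphosis, type = "class"))
p_wt  <- as.character(predict(fit_wt,  newdata = kyphosis, type = "class"))

agree <- mean(p_rep == p_wt)         # fraction of cases predicted identically
print(agree)
```

A value of 1 means the two routes gave identical predictions for that resample; anything less points at the weights not being treated strictly as case weights.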

My memory is that rpart uses unweighted numbers for its control params 
(unlike tree) and hence is not strictly using case weights.  I believe you 
can avoid that by setting the control params to their minimum and relying 
on pruning.
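A sketch of what "minimum control params plus pruning" could look like follows. I am assuming the relevant settings are minsplit, minbucket and cp in rpart.control(), and the pruning threshold of 0.01 is just an illustrative choice, not a recommendation. The sketch uses replicated cases; the same control object can be passed alongside weights.

```r
library(rpart)

set.seed(1)                          # arbitrary seed for repeatability
n   <- nrow(kyphosis)
ind <- sample(n, replace = TRUE)     # bootstrap resample

# Smallest stopping thresholds: split any node with 2+ cases,
# allow single-case leaves, and do no cost-complexity pruning while growing
ctl <- rpart.control(minsplit = 2, minbucket = 1, cp = 0, xval = 0)

big <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis[ind, ], control = ctl)

# Prune back afterwards; cp = 0.01 here is an assumed, illustrative value
pruned <- prune(big, cp = 0.01)

# Number of nodes (rows of the frame) before and after pruning
c(grown = nrow(big$frame), pruned = nrow(pruned$frame))
```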

BTW, it is inaccurate to call these trees 'non-pruned' -- the default
setting of cp is still (potentially) doing quite a lot of pruning.
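To see how much the default cp is cutting, one can count leaves with and without it. This is a sketch on the full kyphosis data; count_leaves is a helper name I made up, and the other control settings are left at their defaults.

```r
library(rpart)

# Hypothetical helper: terminal nodes are labelled "<leaf>" in the tree frame
count_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Default settings (cp = 0.01 unless changed)
fit_default <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosis, xval = 0)
# Same call with the complexity penalty switched off
fit_cp0     <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosis, cp = 0, xval = 0)

leaves <- c(default = count_leaves(fit_default), cp0 = count_leaves(fit_cp0))
print(leaves)
```

Lowering minsplit and minbucket as well would be expected to widen the gap further.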

Torsten Hothorn can explain why he chose to do what he did.  There's a 
small (but only small) computational advantage in using case weights, but 
the tricky issue for me is how precisely tree growth is stopped, and I 
don't think that rpart at its default settings is mimicking what Breiman 
was doing (he would have been growing much larger trees).

On Thu, 6 Mar 2008, apjaworski at mmm.com wrote: