Hi,
First, thanks to those who helped me see my gross misunderstanding of
randomForest. I worked through a bagging tutorial and now understand the
"many tree" approach. However, it is not what I want to do! My bagged
errors are acceptable, but I need to use the actual tree and need a single
tree application.
I am using rpart for a classification tree but am interested in a less
biased estimate of the tree's error. I lack sufficient data to train
and test the tree, and I'm hoping to bootstrap, or rather jackknife, an
error estimate.
I do not think an rpart object can be passed to the jackknife function
in the bootstrap package, but can I do something as simple as:
for (i in 1:number of samples) {
    remove i from the data
    run the tree
    compare sample[i] to the tree using predict
    create an error matrix
}
This would give me a confusion matrix of data not included in the tree's
construction.
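The loop outlined above can be sketched in R roughly as follows. This is only a minimal illustration, not a definitive implementation: the `kyphosis` data set that ships with rpart stands in for the real data, and the formula and the names `dat`, `pred`, `conf`, and `err` are assumptions made for the example.

```r
library(rpart)

# Stand-in data (assumption): kyphosis ships with rpart and has a
# two-level factor response, Kyphosis.
dat <- kyphosis
n <- nrow(dat)

# One held-out prediction per row, stored as character labels.
pred <- character(n)

for (i in seq_len(n)) {
  # Grow the tree on every row except row i.
  fit <- rpart(Kyphosis ~ Age + Number + Start,
               data = dat[-i, ], method = "class")
  # Predict the class of the single held-out row.
  pred[i] <- as.character(
    predict(fit, newdata = dat[i, , drop = FALSE], type = "class")
  )
}

# Confusion matrix built only from rows the tree did not see when grown.
conf <- table(observed  = dat$Kyphosis,
              predicted = factor(pred, levels = levels(dat$Kyphosis)))
err <- 1 - sum(diag(conf)) / n

print(conf)
print(err)
```

Note that each pass grows a different tree, so the result estimates the error of the tree-growing procedure rather than of one particular tree; the tree actually used would be refit on all rows. rpart also runs its own cross-validation (the `xval` argument of `rpart.control`, reported as `xerror` in the fitted object's `cptable`), which may be enough if only an error rate is needed.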
Am I being obtuse again?
Thanks, CM
Jackknife and rpart
2 messages · chumpmonkey@hushmail.com, Frank E Harrell Jr
On Wed, 16 Apr 2003 10:28:08 -0700
chumpmonkey at hushmail.com wrote:
[snip]
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
You might look at the validate.tree function in the Design library
(http://hesweb1.med.virginia.edu/biostat/s/Design.html), but better
validated predictive accuracy would be obtained by approximating the
predictions from the randomForest by a single (moderately large) tree.
You can use rpart to develop such a tree, stopping when, for example,
the R-square is 0.9 or 0.95.
---
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
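One possible reading of the suggestion in the reply above, sketched in R. It rests on several assumptions that are not in the original message: the `kyphosis` data stands in for the real data, the forest's predicted probability of the "present" class is used as the quantity to approximate, the tree is grown fully and then pruned back to the smallest subtree that reaches the target R-square, and the names `p_rf`, `approx_dat`, `target`, and `pruned` are illustrative only.

```r
library(randomForest)
library(rpart)

set.seed(1)
dat <- kyphosis  # stand-in data (assumption)

# Step 1: fit the forest and take its predicted probability of "present"
# for every row of the data.
rf <- randomForest(Kyphosis ~ Age + Number + Start, data = dat, ntree = 500)
p_rf <- predict(rf, newdata = dat, type = "prob")[, "present"]

# Step 2: grow a large regression tree on the forest's predictions,
# not on the observed classes.
approx_dat <- data.frame(dat[, c("Age", "Number", "Start")], p_rf = p_rf)
tree <- rpart(p_rf ~ Age + Number + Start, data = approx_dat,
              method = "anova",
              control = rpart.control(cp = 0, minsplit = 2, xval = 0))

# Step 3: for an anova tree, "rel error" in the cp table is 1 - R-square,
# so choose the smallest subtree whose R-square reaches the target.
target <- 0.9
cp_tab <- tree$cptable
r2 <- 1 - cp_tab[, "rel error"]
row <- which(r2 >= target)[1]
if (is.na(row)) row <- nrow(cp_tab)  # fall back to the full tree
pruned <- prune(tree, cp = cp_tab[row, "CP"])

# R-square of the single tree against the forest's predictions.
fitted_tree <- predict(pruned, newdata = approx_dat)
approx_r2 <- 1 - sum((p_rf - fitted_tree)^2) / sum((p_rf - mean(p_rf))^2)
print(approx_r2)
```

The R-square here measures how closely the single tree reproduces the forest's predictions, not how well it predicts the observed classes, so the tree's own accuracy would still need to be checked separately.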