help with categorical responses in boosted classification trees (gbm package)
On Fri, Oct 10, 2008 at 10:41 AM, Jill Johnstone
<jill.johnstone at usask.ca> wrote:
Hello, I am working on developing code for a boosted classification tree that predicts membership in 4 non-ordered classes, using the gbm or gbmplus packages in R. I've been successful (I think) in using this package for regression trees, where the response is numeric. However, I'm running into problems setting up a boosted tree for a categorical response that is not simply a 0,1 response. In my case, the response is a non-ordered factor that represents different vegetation community types.
Are you sure that gbm is designed to handle multi-class responses (i.e. >2 levels)?
There are 4 factor levels and n=90 for the dataset.
90 observations may well not provide enough information to predict four response levels (depending on the strengths of the relationships, the number of observed successes, etc.)
I think the problem may be that I am not specifying a proper error distribution. GBM help specifies the following options for this: "..."gaussian" (squared error), "laplace" (absolute loss), "bernoulli" (logistic regression for 0-1 outcomes), "adaboost" (the AdaBoost exponential loss for 0-1 outcomes), "poisson" (count outcomes), and "coxph" (censored observations)." I believe that the Gaussian error distribution is most appropriate for these data, and this is what I've been using. Below is the code that I am running:
Given the categorical response, squared-error loss would not be a good choice. If gbm is designed to handle multi-category responses, the 'bernoulli' or 'adaboost' options are more appropriate.
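Since 'bernoulli' and 'adaboost' expect a 0/1 outcome, one workaround (a sketch only, reusing the variable names from your post; the `is.class1` indicator column is hypothetical) is to fit one-vs-rest binary models, one per vegetation class:

```r
library(gbm)

# Recode the 4-level factor into a 0/1 indicator for one class
# (repeat for each of the 4 levels to get one-vs-rest models).
natseed$is.class1 <- as.numeric(natseed$veg == levels(natseed$veg)[1])

m1 <- gbm(is.class1 ~ lat + elev + moist.class + BA.stnd + pre.decid,
          data = natseed, distribution = "bernoulli",
          n.trees = 900, interaction.depth = 3,
          n.minobsinnode = 5, shrinkage = 0.003,
          bag.fraction = 0.5, cv.folds = 5)
```

With only n=90 split across 4 classes, each indicator will be quite unbalanced, so interpret the fitted probabilities cautiously.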
tree1 <- gbm(veg ~ lat + elev + moist.class + BA.stnd + pre.decid,
             data = natseed, n.trees = 900, interaction.depth = 3,
             n.minobsinnode = 5, distribution = "gaussian",
             shrinkage = 0.003, bag.fraction = 0.5, cv.folds = 5)
summary(tree1)
And the error I am currently getting specifies a problem with the
cross-validation, but I am not sure how to interpret this:
"Error in if (x[[1]]$type != "cv") stop("Not a CV tree !!\n") : argument is
of length zero"
I'd really appreciate suggestions about where I might be going wrong, if
anyone has any. I've been able to run this successfully as a regular
classification tree using the "tree" library, but had hoped to apply the
boosting approach. I've been referring to two excellent ecological papers
that describe this technique, but neither deals with this type of
classification tree:
Have a look at the randomForest package. Also, the following Google search may help: boosting "multi-class" OR "k-class" OR multicategorical OR multinomial

hth,
Kingsford Jones
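For instance, randomForest accepts a multi-level factor response directly, so something along these lines (a sketch only, assuming the formula and data names from your post) may get you started:

```r
library(randomForest)

# veg is a 4-level factor, so randomForest fits a classification forest
rf1 <- randomForest(veg ~ lat + elev + moist.class + BA.stnd + pre.decid,
                    data = natseed, ntree = 900, importance = TRUE)

print(rf1)        # includes the OOB confusion matrix across the 4 classes
varImpPlot(rf1)   # variable importance for the classification
```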
1. De'ath, G. 2007. Boosted trees for ecological modeling and prediction. Ecology 88: 243-251.
2. Elith, J., Leathwick, J.R., and Hastie, T. 2008. A working guide to boosted regression trees. J. Animal Ecol. 77: 802-813.

Thanks in advance for any suggestions.

Jill Johnstone
Assistant Professor
Department of Biology
University of Saskatchewan
112 Science Place
Saskatoon SK S7N 5E2
ph: (306) 966-4421  fax: 966-4461
website: www.usask.ca/biology/johnstone/
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology