no splits possible - in mvpart
Hi Mike,

You need to carefully read the help for mvpart, rpart, and rpart.control - this is a complex procedure and there are a lot of possible options and ways to screw up.

cp is the complexity parameter - a proposed split must be as good or better than cp to even be considered. If you aren't getting any splits, then none of the splits possible in your data are good enough at that level.

More formally, from the help for rpart.control:

  cp: complexity parameter. Any split that does not decrease the overall lack of fit by a factor of cp is not attempted. For instance, with anova splitting, this means that the overall Rsquare must increase by cp at each step. The main role of this parameter is to save computing time by pruning off splits that are obviously not worthwhile. Essentially, the user informs the program that any split which does not improve the fit by cp will likely be pruned off by cross-validation, and that hence the program need not pursue it.

I believe that classification trees (not clustering) as implemented in R are covered in some detail in MASS; you should probably also find and read that.

Sarah
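To make "decreasing cp" concrete, here is a minimal sketch using rpart alone, with the built-in mtcars data standing in for the real data (an assumption purely for illustration). It is assumed, not confirmed here, that mvpart accepts a cp setting in the same way; check the mvpart help before relying on that.

```r
library(rpart)

# Built-in data stand in for the poster's data (illustration only).
# With the default control, cp = 0.01.
fit_default <- rpart(mpg ~ ., data = mtcars, method = "anova")

# Lower cp so that weaker splits are at least attempted.
# Other controls (e.g. minsplit) stay at their defaults and can still
# prevent further splitting on a small data set.
fit_loose <- rpart(mpg ~ ., data = mtcars, method = "anova",
                   control = rpart.control(cp = 0.001))

printcp(fit_loose)  # CP table: one row per candidate tree size
```

A lower cp only allows more splits to be tried; the xerror column of the CP table is still what indicates whether the extra splits are worth keeping.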
On Fri, Jan 28, 2011 at 8:31 PM, Mike Marsh <swamp at blarg.net> wrote:
I am clustering vegetation richness (0 or 1) data that is segregated by growth form, i.e. Shrub, Annual Grass, Perennial Grass, etc., using mvpart for comparison with clustering by hclust. The environmental file has four variables: Slope, Elevation, heatload, and Ecological Site (a measure of soil and land form type).

For four of the six data files, a split is successful when raw data are analyzed, but a message, "No splits possible -- try decreasing cp", appears when data standardized by "scaler" are submitted.

My question: What does the message mean? How would I decrease cp? I have re-read De'Ath, 2002 (Ecology 83:1105) regarding cross-validation, and I assume that xerror in the table produced by printcp is that quantity. In the present instance, there are only two leaves to the tree, and further reduction of cp would seem impossible.

A further puzzle is that when the smallest dataset (not included in this analysis), with only 6 columns, is analyzed, a result is obtained for standardized data. The Shrub data, presented here as an example, have 27 columns; the Annual.Forb data, 35 columns.

Here is my script, with output:
set.seed(1)
Shrub.mrt <- mvpart(Shrub ~ ., Qenv)
printcp(Shrub.mrt)
mvpart(form = Shrub ~ ., data = Qenv)

Variables actually used in tree construction:
[1] Alt.E

Root node error: 69.727/22 = 3.1694

n= 22

       CP nsplit rel error xerror    xstd
1 0.23477      0   1.00000 1.1064 0.09480
2 0.12882      1   0.76523 1.0372 0.10470
Shrub.std <- scaler(Shrub, col="mean1", row="mean1")
Shrub.std.mrt <- mvpart(Shrub.std ~ ., Qenv)
No splits possible -- try decreasing cp
printcp(Shrub.std.mrt)
rpart(formula = form, data = data)

Variables actually used in tree construction:
character(0)

Root node error: 0/0 = NaN

n=0 (22 observations deleted due to missingness)

   CP nsplit rel error
1 NaN      0       NaN
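The output above reports n=0 with all 22 observations deleted due to missingness, so one thing worth checking is whether the standardized response contains NA or NaN values. The sketch below uses a made-up 0/1 matrix (Y and its species names are hypothetical) and assumes, without having confirmed it against the scaler help, that a "mean1" standardization divides each column by its mean. Under that assumption, a species that never occurs gives 0/0 = NaN in every row.

```r
# Hypothetical 0/1 presence matrix: 4 plots x 3 species.
# Species "sp3" never occurs, so its column mean is 0.
Y <- matrix(c(1, 0, 1, 0,
              0, 1, 1, 0,
              0, 0, 0, 0),
            nrow = 4,
            dimnames = list(NULL, c("sp1", "sp2", "sp3")))

# Divide each column by its mean (assumed stand-in for the standardization);
# the all-zero column becomes NaN in every row.
Y.std <- sweep(Y, 2, colMeans(Y), "/")

colSums(is.na(Y.std))        # count of missing values per column
sum(complete.cases(Y.std))   # number of rows with no missing values
```

Running the same two checks on Shrub.std would show whether any rows remain usable before the tree is fitted.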
set.seed(1)
Annual.Forb.mrt <- mvpart(Annual.Forb ~ ., Qenv)
printcp(Annual.Forb.mrt)
mvpart(form = Annual.Forb ~ ., data = Qenv)

Variables actually used in tree construction:
[1] Slope

Root node error: 105.27/22 = 4.7851

n= 22

        CP nsplit rel error xerror     xstd
1 0.135579      0   1.00000 1.1085 0.081214
2 0.096179      1   0.86442 1.0827 0.079488
Annual.Forb.std <- scaler(Annual.Forb, col="mean1", row="mean1")
Annual.Forb.std.mrt <- mvpart(Annual.Forb.std ~ ., Qenv)
printcp(Annual.Forb.std.mrt)
mvpart(form = Annual.Forb.std ~ ., data = Qenv)

Variables actually used in tree construction:
[1] Elev

Root node error: 4282.1/22 = 194.64

n= 22

       CP nsplit rel error xerror    xstd
1 0.15587      0   1.00000 1.1015 0.12860
2 0.10174      1   0.84413 1.0949 0.12898
printcp(Annual.Grass.std.mrt)
mvpart(form = Annual.Grass.std ~ ., data = Qenv)

Variables actually used in tree construction:
[1] heatld

Root node error: 219.76/22 = 9.989

n= 22

       CP nsplit rel error xerror    xstd
1 0.12602      0   1.00000 1.1179 0.43984
2 0.11866      1   0.87398 1.4865 0.51020
While output for the standardized data for annual forb is the same as with raw data, this is often not the case in my larger dataset. Data files are appended, and will be provided separately on request.

Thanks very much for looking at this.

Mike Marsh
Washington Native Plant Society
Sarah Goslee http://www.functionaldiversity.org