no splits possible - in mvpart

Sat, Jan 29, 2011 8:20 AM

Hi Mike,

You need to carefully read the help for mvpart, rpart, and rpart.control -
this is a complex procedure and there are a lot of possible options
and ways to screw up.

cp is the complexity parameter - a proposed split must improve the
overall fit by at least cp to even be considered. If you aren't getting
any splits, then none of the splits possible in your data improve the
fit enough at that value of cp.

More formally, from the help for rpart.control:
cp: complexity parameter. Any split that does not decrease the overall
lack of fit by a factor of cp is not attempted. For instance, with anova
splitting, this means that the overall Rsquare must increase by cp at
each step. The main role of this parameter is to save computing time by
pruning off splits that are obviously not worthwhile. Essentially, the
user informs the program that any split which does not improve the fit
by cp will likely be pruned off by cross-validation, and that hence the
program need not pursue it.
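
As a rough illustration of how cp is lowered, here is a minimal sketch
using rpart directly on the built-in mtcars data (not your data, and not
mvpart). The cp values 0.9 and 0.001 are arbitrary choices for the
example. I would expect mvpart to accept the same setting through its
extra arguments, but that is an assumption; check the mvpart help.

```r
# Sketch only: rpart on the built-in mtcars data, with arbitrary cp values.
library(rpart)

# Default control settings (cp = 0.01).
fit_default <- rpart(mpg ~ ., data = mtcars, method = "anova")

# A very strict cp: no candidate split improves the fit by this much,
# so the tree is left as a single root node.
fit_strict <- rpart(mpg ~ ., data = mtcars, method = "anova",
                    control = rpart.control(cp = 0.9))

# A smaller cp allows weaker splits to be attempted.
fit_loose <- rpart(mpg ~ ., data = mtcars, method = "anova",
                   control = rpart.control(cp = 0.001))

# Compare the number of nodes in each tree, then inspect the cp table.
c(strict  = nrow(fit_strict$frame),
  default = nrow(fit_default$frame),
  loose   = nrow(fit_loose$frame))
printcp(fit_loose)
```

Other settings in rpart.control, such as minsplit, also limit how far a
tree can grow, so lowering cp alone will not always produce more splits.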

I believe that classification trees (not clustering) as implemented in R are
covered in some detail in MASS; you should probably also find and read that.

Sarah

On Fri, Jan 28, 2011 at 8:31 PM, Mike Marsh <swamp at blarg.net> wrote:

I am clustering vegetation richness (0 or 1) data that is segregated by
growth form, i.e. Shrub, Annual Grass, Perennial Grass, etc., using mvpart
for comparison with clustering by hclust.
The environmental file has four variables, Slope, Elevation, heatload, and
Ecological Site (a measure of soil and land form type).
When four of the six data files are analyzed, a split is successful when raw
data are analyzed, but a message,

"No splits possible -- try decreasing cp"

appears when data standardized by "scaler" are submitted.
My question: What does the message mean? How would I decrease cp?

I have re-read De'Ath, 2002 (Ecology 83:1105) regarding cross-validation,
and I assume that xerror in the table produced by printcp is that quantity.
In the present instance, there are only two leaves to the tree, and further
reduction of cp would seem impossible.

A further puzzle is that when the smallest dataset (not included in this
analysis), with only 6 columns, is analyzed, a result is obtained for
standardized data. The Shrub data, presented here as an example, have 27
columns; the Annual.Forb data, 35 columns.

Here is my script, with output:

set.seed(1)
Shrub.mrt<-mvpart(Shrub~.,Qenv)
printcp(Shrub.mrt)

mvpart(form = Shrub ~ ., data = Qenv)

Variables actually used in tree construction:
[1] Alt.E

Root node error: 69.727/22 = 3.1694

n= 22

       CP nsplit rel error xerror    xstd
1 0.23477      0   1.00000 1.1064 0.09480
2 0.12882      1   0.76523 1.0372 0.10470

Shrub.std<- scaler(Shrub, col="mean1", row="mean1")
Shrub.std.mrt<-mvpart(Shrub.std~.,Qenv)

No splits possible -- try decreasing cp

printcp(Shrub.std.mrt)

rpart(formula = form, data = data)

Variables actually used in tree construction:
character(0)

Root node error: 0/0 = NaN

n=0 (22 observations deleted due to missingness)

   CP nsplit rel error
1 NaN      0       NaN

set.seed(1)
Annual.Forb.mrt<-mvpart(Annual.Forb~.,Qenv)
printcp(Annual.Forb.mrt)

mvpart(form = Annual.Forb ~ ., data = Qenv)

Variables actually used in tree construction:
[1] Slope

Root node error: 105.27/22 = 4.7851

n= 22

        CP nsplit rel error xerror     xstd
1 0.135579      0   1.00000 1.1085 0.081214
2 0.096179      1   0.86442 1.0827 0.079488

Annual.Forb.std<- scaler(Annual.Forb, col="mean1", row="mean1")
Annual.Forb.std.mrt<-mvpart(Annual.Forb.std~.,Qenv)
printcp(Annual.Forb.std.mrt)

mvpart(form = Annual.Forb.std ~ ., data = Qenv)

Variables actually used in tree construction:
[1] Elev

Root node error: 4282.1/22 = 194.64

n= 22

       CP nsplit rel error xerror    xstd
1 0.15587      0   1.00000 1.1015 0.12860
2 0.10174      1   0.84413 1.0949 0.12898

printcp(Annual.Grass.std.mrt)

mvpart(form = Annual.Grass.std ~ ., data = Qenv)

Variables actually used in tree construction:
[1] heatld

Root node error: 219.76/22 = 9.989

n= 22

       CP nsplit rel error xerror    xstd
1 0.12602      0   1.00000 1.1179 0.43984
2 0.11866      1   0.87398 1.4865 0.51020

While output for the standardized data for annual forb is the same as with
raw data, this is often not the case in my larger dataset.

Data files are appended, and will be provided separately on request.

Thanks very much for looking at this.

Mike Marsh
Washington Native Plant Society

Sarah Goslee
http://www.functionaldiversity.org
