tests for significance on conditional inference trees from party package

2 messages · Adrian Johnson, Achim Zeileis

#
Dear group,
Please allow me to ask a naive question, and pardon me if it qualifies
as a stupid question.

I am using the party package to classify covariates and predict the
distribution of survival times for the classified variables.
Typically I have a matrix whose columns include the outcome data
(overall survival in months, censoring status) and the other covariates
I want to split on in the tree (such as treatment dose, etc.). Rows are
patients (~1000 patients).

Similarly, I have many such matrices (4K) with completely different
sets of covariates but identical outcome data and patients (in rows).
I cannot combine all the data into one giant matrix, because these
covariates are totally independent.

Currently I am running this model in a loop and storing the tree and
parsing the tree structure.

My question is: is there a testing method to choose or rank these
4K trees, so that I can go through them from top to bottom? I know
each tree is important in its own way. If selection based on
significance is required, is there another method, instead of
conditional inference trees, that partitions the data but also carries
some measure of significance to choose by?

Thanks!
#
Adrian,

thanks for your interest.
On Tue, 13 Dec 2016, Adrian Johnson wrote:

            
If the response variable is the same and the patients are the same, then I 
don't see why, conceptually, you couldn't combine "totally 
independent" variables in the same tree. Or maybe I misunderstand what 
"totally independent" means.

Practically, however, choosing a tree from 4,000 regressor variables will 
be challenging, especially if you want to adjust in some way for the 
multiple testing. So maybe some additional structure would help here.
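
To illustrate what adjusting for multiple testing across separately fitted trees could look like, here is a minimal sketch in base R. It assumes that one p-value per covariate set (for example, the root-node p-value of each tree) has already been collected; the vector `root_p` and the set names are made-up stand-ins, not data from this thread. Note that ctree() by default already Bonferroni-adjusts across the covariates within a single tree, so the adjustment below only concerns the comparison across trees.

```r
# Hypothetical stand-in: one root-node p-value per covariate set (4,000 sets).
# Most are unremarkable; the last five are made small on purpose.
set.seed(42)
n_trees <- 4000
root_p <- c(runif(n_trees - 5, min = 0.01, max = 1),
            1e-8, 1e-7, 1e-6, 1e-5, 0.001)
names(root_p) <- paste0("set", seq_len(n_trees))

# Adjust across all trees: Holm controls the family-wise error rate,
# Benjamini-Hochberg ("BH") controls the false discovery rate.
p_holm <- p.adjust(root_p, method = "holm")
p_bh   <- p.adjust(root_p, method = "BH")

# Rank covariate sets from most to least significant.
ranking <- order(root_p)
ranked <- data.frame(
  set  = names(root_p)[ranking],
  p    = root_p[ranking],
  holm = p_holm[ranking],
  BH   = p_bh[ranking]
)
print(head(ranked))

# Covariate sets that remain significant after the Holm adjustment.
keep <- names(root_p)[p_holm < 0.05]
print(keep)
```

Whether the root-node p-value is the right quantity to rank by is a separate question; the sketch only shows the mechanics of the adjustment.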

Parsing the tree structure is quite cumbersome in the old "party" 
implementation. This was one of the main motivations to establish the 
reimplementation in "partykit". This has a much better and more accessible 
tree infrastructure. See the vignettes in the "partykit" package for more 
details; in particular, vignette("partykit", package = "partykit") gives a 
good overview of the building blocks.
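
As a rough sketch of what the partykit infrastructure makes accessible, the following fits a conditional inference tree to simulated survival data and queries the root node directly instead of parsing printed output. The data frame `d`, its columns, and the simulated effect are assumptions made up for illustration, and the censoring indicator is drawn independently of the times for simplicity. The sketch also assumes that the fitted ctree stores the test results, including `p.value`, in the node info.

```r
library(survival)   # Surv()
library(partykit)   # ctree(), nodeids(), nodeapply(), info_node()

# Simulated stand-in data (hypothetical): 'dose' affects survival, 'noise' does not.
set.seed(1)
n <- 300
d <- data.frame(dose = runif(n, 0, 100), noise = rnorm(n))
d$time   <- rexp(n, rate = ifelse(d$dose > 50, 0.2, 0.05))
d$status <- rbinom(n, 1, 0.8)   # 1 = event, 0 = censored

ct <- ctree(Surv(time, status) ~ dose + noise, data = d)

# IDs of all nodes and of the terminal nodes only.
print(nodeids(ct))
print(nodeids(ct, terminal = TRUE))

# p-value stored for the test that led to the split in the root node (node 1).
root_p <- nodeapply(ct, ids = 1,
                    FUN = function(node) info_node(node)$p.value)[[1]]
print(root_p)

# Name of the variable the root node splits on (NA if the root was not split).
root_split <- split_node(node_party(ct))
root_var <- if (is.null(root_split)) NA_character_ else
  names(ct$data)[varid_split(root_split)]
print(root_var)
```

Run in a loop over the covariate sets, quantities such as `root_p` and `root_var` could be collected into a data frame and compared across trees.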

Additionally, over at StackOverflow you can find various additional 
bits and pieces that may be helpful. Look for the "party" tag.

Finally, there is also a partykit support forum on R-Forge.

It is not clear to me what you want to rank, or how. However, 
looking at the sources of information listed above might take you a few 
steps further.

The MOB (model-based recursive partitioning) algorithm is also based on 
significance tests and implemented in the "partykit" package. It uses 
parametric asymptotic inference rather than nonparametric conditional 
inference. Otherwise the two approaches are very similar in many respects.

Hope that helps,
Z