Recursive partitioning with multicollinear variables

2 messages · Jean-Noel, Frank E Harrell Jr

#
Dear all,
I would like to perform a regression tree analysis on a dataset with
multicollinear variables (as climate variables often are). The questions
that I am asking are:
 1- Is there any particular statistical problem in using multicollinear
variables in a regression tree?
 2- Multicollinear variables should appear as alternate splits. Would it be
more accurate to present these alternate splits in the results of the
analysis or apply a variable selection or reduction procedure before the
regression tree?
Thank you in advance,

Jean-Noel Candau

INRA - Unité de Recherches Forestières Méditerranéennes
Avenue A. Vivaldi
84000 AVIGNON
Tel: (33) 4 90 13 59 22
Fax: (33) 4 90 13 59 59
#
On Mon, 9 Feb 2004 11:24:39 +0100
"Jean-Noel" <jean-noel.candau at avignon.inra.fr> wrote:

            
A more accurate and stable result would be obtained by performing a data
reduction procedure that ignores the response variable.  Combining
collinear variables into an index is often better than arbitrarily
choosing between them.  Then use the indexes in a regression model rather
than a tree, unless you have tens of thousands of observations for
recursive partitioning or are using bagging of trees (or a related
procedure) to cancel out the instability of the tree-growing process.
Unfortunately, bagging will often result in an average of trees that is
more complex in appearance than a regression model.
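
As a rough illustration of the data reduction step described above, here is
a minimal Python sketch (standard library only).  It is not the method from
this thread: it assumes the simplest possible index, the average of the
z-scores of the collinear predictors, as a stand-in for something like a
first principal component.  The data, the variable names (temp_mean,
temp_max, degree_days) and the helper functions are all hypothetical.  The
index is built without ever looking at the response, and only then used in
an ordinary regression.

```python
import random
import statistics


def zscores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def collinearity_index(columns):
    """Average the z-scores of several collinear predictors into one index.

    Assumes the predictors are positively correlated with each other.
    The response variable is never used here.
    """
    standardized = [zscores(col) for col in columns]
    return [statistics.fmean(row) for row in zip(*standardized)]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: three climate variables driven by one latent gradient.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(200)]
temp_mean = [15 + 5 * z + rng.gauss(0, 0.5) for z in latent]
temp_max = [22 + 6 * z + rng.gauss(0, 0.5) for z in latent]
degree_days = [1800 + 400 * z + rng.gauss(0, 40) for z in latent]
response = [2.0 + 1.5 * z + rng.gauss(0, 1) for z in latent]

# Step 1: reduce the collinear predictors to one index (response not used).
index = collinearity_index([temp_mean, temp_max, degree_days])

# Step 2: use the index as the single predictor in a regression model.
intercept, slope = fit_line(index, response)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The point of the design is the order of operations: because the index is
fixed before the response enters the analysis, the choice among collinear
variables cannot be tuned to noise in the response.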

Frank
---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University