Message-ID: <1222810777.6136.16.camel@R1-Thux>
Date: 2008-09-30T21:39:37Z
From: Bernardo Rangel Tura
Subject: Logistic regression problem
In-Reply-To: <c822758d-3cf2-44b4-8e68-c76ce30ec4a4@26g2000hsk.googlegroups.com>
On Sat, 2008-09-27 at 10:51 -0700, milicic.marko wrote:
> I have a huge data set with thousands of variables and one binary
> response variable. I know that most of the variables are correlated and
> are not good predictors... but...
>
> It is very hard to start modeling with such a huge dataset. What would
> be your suggestion? How to make a first cut... how to eliminate most
> of the variables but not ignore potential interactions... for
> example, maybe variable A is not a good predictor and variable B is not
> a good predictor either, but maybe A and B together are a good
> predictor...
>
> Any suggestion is welcomed
milicic.marko
I think you could start with rpart("binary variable" ~ .)
This shows you a set of variables to start a model with, and a starting
set of cutoffs for the continuous variables.
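
A minimal sketch of that idea, on simulated data since the real data set
is not shown. Everything here other than rpart() itself is an assumption
made for illustration: the data frame dat, the column names y and x1..x50,
and the final glm() step are hypothetical stand-ins, not part of the
original suggestion.

```r
library(rpart)  # recommended package, shipped with standard R installations

set.seed(1)
n <- 1000
p <- 50  # stand-in for the "thousands" of candidate predictors

# Hypothetical predictors x1..x50, all continuous
X <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
names(X) <- paste0("x", seq_len(p))

# Hypothetical binary outcome driven by x1 and x2 acting together
prob <- ifelse(X$x1 > 0.5 & X$x2 > 0, 0.9, 0.1)
dat  <- cbind(y = factor(ifelse(rbinom(n, 1, prob) == 1, "yes", "no")), X)

# Classification tree on all predictors
fit <- rpart(y ~ ., data = dat, method = "class")

# Printing the tree lists each split variable with its cutoff
print(fit)

# Variables actually used for splitting (drop the leaf marker)
used <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")

# Importance scores as computed by rpart, largest first
head(fit$variable.importance)

# Use the selected variables as a starting set for a logistic regression
start_model <- glm(reformulate(used, response = "y"),
                   data = dat, family = binomial)
summary(start_model)
```

The printed tree gives the candidate cutoffs, and `used` gives a first
short list of variables. The tree is only a screening step here: the
selected variables and cutoffs still need checking in the regression
model itself.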
--
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil