Message-ID: <c822758d-3cf2-44b4-8e68-c76ce30ec4a4@26g2000hsk.googlegroups.com>
Date: 2008-09-27T17:51:26Z
From: milicic.marko
Subject: Logistic regression problem

I have a huge data set with thousands of variables and one binary
outcome variable. I know that most of the variables are correlated and
are not good predictors, but...

It is very hard to start modeling with such a huge data set. What would
you suggest? How should I make a first cut: how can I eliminate most of
the variables without ignoring potential interactions? For example,
maybe variable A is not a good predictor and variable B is not a good
predictor either, but A and B together are a good predictor.
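
To make the A-and-B concern concrete, here is a minimal sketch on
simulated data, not on the real data set. Everything in it is an
assumption made for illustration: the function name
marginal_and_joint_signal, the standard normal A and B, and the rule
that the outcome is 1 when A and B share a sign.

```python
import numpy as np


def marginal_and_joint_signal(n=20000, seed=0):
    """Correlate a simulated binary outcome with A, B, and the product A*B."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    # Hypothetical outcome: 1 when A and B share a sign, 0 otherwise.
    y = (a * b > 0).astype(float)

    def corr(x):
        # Pearson correlation between one candidate predictor and the outcome.
        return float(np.corrcoef(x, y)[0, 1])

    return {"A": corr(a), "B": corr(b), "A*B": corr(a * b)}


if __name__ == "__main__":
    for name, r in marginal_and_joint_signal().items():
        print(f"{name}: correlation with outcome = {r:+.3f}")
```

In this toy setup A and B each show almost no correlation with the
outcome, while their product shows a clear one, so a screen that ranks
variables one at a time would throw both away.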

Any suggestions are welcome.