Message-ID: <1344626209106-4639964.post@n4.nabble.com>
Date: 2012-08-10T19:16:49Z
From: Kirk Fleming
Subject: Analyzing Poor Performance Using naiveBayes()
In-Reply-To: <CAHfK2MtczrynjMGXd6VwRU41awdaeOZ=t+wrYXKG9s1hF1iV0A@mail.gmail.com>

Per your suggestion I ran chi.squared() against my training data and, to my
delight, found just 50 parameters that were non-zero influencers. I built
the model over several iterations and found n = 12 to be the optimum for
the training data.
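For anyone following along, here is roughly the feature-scoring step I mean. This is only a sketch: it mimics FSelector-style chi.squared() scoring using base R's chisq.test() on synthetic data, so all variable names and data here are illustrative, not my actual set.

```r
# Sketch: score each feature against the class with a chi-squared
# statistic, then keep the top n -- a base-R stand-in for
# FSelector::chi.squared(). Data below are synthetic.
set.seed(42)
n_obs <- 200
feats <- as.data.frame(replicate(20, sample(c("a", "b"), n_obs, replace = TRUE)),
                       stringsAsFactors = TRUE)
names(feats) <- paste0("V", seq_len(20))
class <- factor(ifelse(feats$V1 == "a", "pos", "neg"))  # V1 is informative

# Chi-squared statistic of each feature vs. the class
scores <- sapply(feats, function(x)
  unname(suppressWarnings(chisq.test(table(x, class))$statistic)))

# Keep the n highest-scoring features (n = 12 was my optimum)
n <- 12
selected <- names(sort(scores, decreasing = TRUE))[seq_len(n)]
```

The selected names can then be used to subset the data frame passed to naiveBayes().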

However, the results are still not so good for the test data. Here are the
results for both, with AUC values for n = 3 to 50: training data in the 0.97
range, test data in the 0.55 area.

http://r.789695.n4.nabble.com/file/n4639964/Feature_Selection_02.jpg 
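In case anyone wants to reproduce the AUC comparison, this is the kind of rank-based (Mann-Whitney) AUC I'd compute; the function and the scores below are illustrative only, and ROCR or pROC would give the same number.

```r
# Rank-based AUC (equivalent to the Mann-Whitney U statistic):
# the probability that a randomly chosen positive scores higher
# than a randomly chosen negative.
auc <- function(scores, labels) {
  r  <- rank(scores)      # midranks handle ties
  n1 <- sum(labels == 1)  # positives
  n0 <- sum(labels == 0)  # negatives
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Perfectly separated scores give AUC = 1
auc(c(0.1, 0.2, 0.8, 0.9), c(0, 0, 1, 1))  # 1
```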

If the training and test data sets were distinguishable in some way, I'd
assume something weird about the test data, but I can't tell the two apart
using any of the descriptive, 'meta' statistics I've tried so far. Having
double-checked for dumb errors and still obtained the same results, I toasted
everything and started from scratch, and still got the same performance on
the test data.
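One thing I may still try: per-feature distribution tests between the two sets, since summary statistics can match while the distributions differ. A base-R sketch with Kolmogorov-Smirnov tests; the data frames and feature names here are synthetic placeholders, not my real sets.

```r
# Compare each numeric feature's distribution across train vs. test
# with a two-sample Kolmogorov-Smirnov test; small p-values flag
# features whose distributions differ. Data below are synthetic.
set.seed(1)
train <- data.frame(x1 = rnorm(100), x2 = runif(100))
test  <- data.frame(x1 = rnorm(100), x2 = runif(100, min = 0.2))  # shifted x2

pvals <- sapply(names(train), function(f)
  suppressWarnings(ks.test(train[[f]], test[[f]])$p.value))

# Features most likely to differ between the two sets come first
sort(pvals)
```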

Maybe I'll take a break and reflect for 30 minutes.



--
View this message in context: http://r.789695.n4.nabble.com/Analyzing-Poor-Performance-Using-naiveBayes-tp4639825p4639964.html
Sent from the R help mailing list archive at Nabble.com.