
Naive Bayes Classifier

3 messages · Huntsinger, Reid, Johann Petrak, Brian Ripley

#
The "naive Bayes" classifier I've seen discussed in various machine-learning
papers and books is as described by David Meyer in his posting, except that
class (mixture component) membership is known in the training data. So it's
"supervised"--classes aren't "latent". The estimation is usually just via
"plug-in":

1. Compute marginal frequencies within class.

2. Multiply these together, as if the variables (say x) were independent
within class, to get an "estimate" of the class-conditional probabilities
p(x | c).

3. Via Bayes' rule, get the (x-)conditional probabilities over class
(posterior class probabilities) p(c | x). (Actually you don't need to divide
here, since the denominator is a common factor in the quantities compared to
form the classifier...)

4. To classify x find the class c maximizing p(c | x) (or minimizing the sum
of L(c,i)*p(i|x) over i if L(,) is a given loss function).

Often step 1 is replaced by Bayesian estimates of the marginal probabilities
to prevent 0 estimates and reduce variance. 
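[Editor's note: the four plug-in steps above can be sketched in base R for
categorical predictors. The function names and toy data below are purely
illustrative, not from any package; smoothing follows the remark about
Bayesian/Laplace estimates of the marginals.]

```r
## Plug-in naive Bayes for categorical predictors (illustrative sketch).
## x: data frame of categorical columns; y: factor of class labels.
naive_bayes_train <- function(x, y, laplace = 1) {
  prior <- table(y) / length(y)          # class frequencies
  ## Step 1: within-class marginal frequencies, with Laplace smoothing
  ## to prevent zero estimates (as noted above)
  tables <- lapply(x, function(col) {
    t <- table(y, col) + laplace
    t / rowSums(t)                       # rows: classes; cols: levels
  })
  list(prior = prior, tables = tables, classes = levels(y))
}

## Steps 2-4: multiply the marginals (sum of logs) as if independent,
## add the log prior, and pick the maximizing class; dividing by p(x)
## is skipped since it is common to all classes.
## newx: a list of values in the same column order as the training x.
naive_bayes_predict <- function(model, newx) {
  scores <- sapply(model$classes, function(cl) {
    logp <- log(model$prior[[cl]])
    for (j in seq_along(newx)) {
      logp <- logp + log(model$tables[[j]][cl, as.character(newx[[j]])])
    }
    logp
  })
  model$classes[which.max(scores)]
}

## Toy usage
x <- data.frame(outlook = c("sunny", "sunny", "rain", "rain"),
                windy   = c("yes", "no", "no", "yes"))
y <- factor(c("no", "yes", "yes", "no"))
m <- naive_bayes_train(x, y)
naive_bayes_predict(m, list(outlook = "rain", windy = "no"))  # -> "yes"
```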

In case you don't find an R implementation I hope the above is helpful.

A final remark: while the expression for the posterior probabilities is the
same as for logistic regression (as Brian Ripley pointed out), the
estimation is different--even in large samples--when the model is incorrect
(as it is anticipated to be by the "naive" qualifier). Tom Mitchell's talk
at the SIAM Data Mining conference had an example of this, citing large
gains in performance by switching from the naive Bayes approach to
maximizing the logistic regression likelihood.
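[Editor's note: the discriminative alternative mentioned here can be
illustrated in a few lines of R with glm(); the data are a made-up toy
example, not from Mitchell's talk.]

```r
## Fit the logistic posterior form directly by maximizing the
## logistic-regression likelihood (toy data for illustration only)
df <- data.frame(x1 = c(0, 0, 1, 1, 0, 1),
                 x2 = c(0, 1, 0, 1, 1, 1),
                 y  = factor(c("a", "a", "a", "b", "b", "b")))
fit <- glm(y ~ x1 + x2, data = df, family = binomial)
p <- predict(fit, type = "response")  # estimated P(y = "b" | x1, x2)
```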

Reid Huntsinger

-----Original Message-----
From: David Meyer [mailto:david.meyer at ci.tuwien.ac.at]
Sent: Thursday, May 17, 2001 5:32 AM
To: Murray Jorgensen
Cc: Ursula Sondhauss; r-help at stat.math.ethz.ch
Subject: Re: [R] Naive Bayes Classifier
Murray Jorgensen wrote:
[...]

You could also try lca() in package e1071.

-d
#
Are there hashes in R? Or a package that implements hashes?

I am also looking for multivariate gaussian random numbers.

That brings me to: is there a Cholesky decomposition of a matrix?

Johann
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Fri, 18 May 2001, Johann Petrak wrote:
[...]
mvrnorm in package MASS
Well, there is a Choleski decomposition (as he apparently spelt it):
?chol.
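[Editor's note: both answers above can be checked in a few lines; MASS ships
with R as a recommended package, and the covariance matrix here is just an
example.]

```r
library(MASS)   # provides mvrnorm()

Sigma <- matrix(c(2, 1,
                  1, 2), nrow = 2)   # an example positive-definite covariance

## multivariate Gaussian random numbers
x <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = Sigma)

## Choleski decomposition: chol() returns the upper-triangular R
## with t(R) %*% R equal to Sigma
R <- chol(Sigma)
all.equal(t(R) %*% R, Sigma)   # TRUE
```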