Naive Bayes Classifier
The "naive Bayes" classifier I've seen discussed in various machine-learning papers and books is as described by David Meyer in his posting, except that class (mixture component) membership is known in the training data. So it is "supervised": the classes are not "latent". The estimation is usually just via "plug-in":

1. Compute the marginal frequencies within each class.
2. Multiply these together, as if the variables (say x) were independent within class, to get an "estimate" of the class-conditional probabilities p(x | c).
3. Via Bayes' rule, get the (x-)conditional probabilities over classes (the posterior class probabilities) p(c | x). (Actually you don't need to divide here, since the denominator is a common factor in the quantities to be compared to get the classifier...)
4. To classify x, find the class c maximizing p(c | x) (or minimizing the sum of L(c,i)*p(i | x) over i, if L(.,.) is a given loss function).

Often step 1 is replaced by Bayesian estimates of the marginal probabilities, to prevent zero estimates and to reduce variance.

In case you don't find an R implementation, I hope the above is helpful.

A final remark: while the expression for the posterior probabilities is the same as for logistic regression (as Brian Ripley pointed out), the estimation is different, even in large samples, when the model is incorrect (as it is anticipated to be by the "naive" qualifier). Tom Mitchell's talk at the SIAM Data Mining conference had an example of this, citing large gains in performance by switching from the naive Bayes approach to maximizing the logistic regression likelihood.

Reid Huntsinger

-----Original Message-----
From: David Meyer [mailto:david.meyer at ci.tuwien.ac.at]
Sent: Thursday, May 17, 2001 5:32 AM
To: Murray Jorgensen
Cc: Ursula Sondhauss; r-help at stat.math.ethz.ch
Subject: Re: [R] Naive Bayes Classifier
Murray Jorgensen wrote:
As I understand Naive Bayes, it is essentially a finite mixture model for multivariate categorical distributions where the variables are independent in each component of the mixture. That is, I believe it to be a synonym for Latent Class analysis. I believe the Fraley/Raftery package mclust may include this sort of model, and possibly other packages. Certainly these models may be expressed in the language of graphical models. Whether or not this would be useful for estimation purposes I am uncertain.
You could also try lca() in package e1071. -d
Murray Jorgensen

At 04:28 PM 16-05-01 +0100, Prof Brian Ripley wrote:
On Wed, 16 May 2001, Ursula Sondhauss wrote:
I am looking for an implementation of the Naive Bayes classifier for a multi-class classification problem. I cannot even find the Naive Bayes classifier for two classes, though I cannot believe it is not available. Can anyone help me?
Hard to believe, but likely true. However, as I understand this, it applies to a (K+1)-way contingency table, with K explanatory factors and one response. And the `naive Bayes' model is a particular model for that table. If you want a classifier, you only need the conditional distribution of the response given the explanatory factors, and that is a main-effects-only multiple logistic model. Now the *estimation* procedures may be slightly different (`naive Bayes' is not fully defined), but if that does not matter, use multinom() in package nnet to fit this.

A book on Graphical Modelling (e.g. Whittaker or Edwards) may help elucidate the connections.

Let me stress *as I understand this* here.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
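Brian Ripley's observation, that the naive Bayes posterior has exactly the main-effects-only logistic form, can be checked numerically. The sketch below (Python rather than R, purely illustrative; the priors and marginals are made-up numbers) builds the posterior log-odds for two binary features under a naive Bayes model and verifies that the interaction contrast vanishes, i.e. the log-odds are additive in the features:

```python
import math

# Class priors and per-class marginals for two binary features,
# chosen arbitrarily for illustration (any valid probabilities work).
prior = {0: 0.6, 1: 0.4}
p = {  # p[c][j] = P(x_j = 1 | class c)
    0: [0.2, 0.7],
    1: [0.5, 0.3],
}

def log_odds(x):
    """log P(c=1 | x) - log P(c=0 | x) under the naive Bayes model."""
    lo = math.log(prior[1]) - math.log(prior[0])
    for j, xj in enumerate(x):
        for c, sign in ((1, +1), (0, -1)):
            pj = p[c][j] if xj else 1.0 - p[c][j]
            lo += sign * math.log(pj)
    return lo

# Main-effects-only check: the second-order (interaction) contrast is zero,
# so log_odds(x) = beta_0 + beta_1*x_1 + beta_2*x_2 exactly.
interaction = (log_odds((1, 1)) - log_odds((1, 0))
               - log_odds((0, 1)) + log_odds((0, 0)))
```

The vanishing contrast is why a main-effects multinom() fit has the same functional form; as noted elsewhere in the thread, the *fitted coefficients* can still differ, since the two methods estimate the parameters differently when the independence assumption is wrong.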
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject !)
To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Murray Jorgensen, Department of Statistics, U of Waikato, Hamilton, NZ
-----[+64-7-838-4773]---------------------------[maj at waikato.ac.nz]-----
"Doubt everything or believe everything: these are two equally convenient
strategies. With either we dispense with the need to think." - Henri Poincare
http://www.stats.waikato.ac.nz/Staff/maj.html
Mag. David Meyer
Vienna University of Technology
Department for Statistics, Probability Theory and Actuarial Mathematics
Wiedner Hauptstrasse 8-10, A-1040 Vienna/AUSTRIA
Tel.: (+431) 58801/10772
mail: david.meyer at ci.tuwien.ac.at
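The plug-in recipe in Reid Huntsinger's reply at the top of the thread is straightforward to implement directly. Below is a minimal sketch (in Python rather than R, purely illustrative; the function names and toy data are mine) of steps 1-4 for categorical features, with Laplace smoothing standing in for the Bayesian marginal estimates he mentions:

```python
from collections import Counter
import math

def train_nb(X, y, alpha=1.0):
    """Step 1: marginal frequencies within each class, Laplace-smoothed.

    X: list of feature tuples; y: list of class labels.
    """
    classes = Counter(y)                      # class counts -> priors
    n_feat = len(X[0])
    # counts[c][j][v] = number of rows of class c with feature j equal to v
    counts = {c: [Counter() for _ in range(n_feat)] for c in classes}
    values = [set() for _ in range(n_feat)]   # observed values per feature
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            counts[c][j][v] += 1
            values[j].add(v)
    return classes, counts, values, alpha

def log_joint(x, c, classes, counts, values, alpha):
    """Steps 2-3: log p(c) + sum_j log p(x_j | c), multiplying the smoothed
    marginals as if the features were independent within class."""
    n = sum(classes.values())
    lp = math.log(classes[c] / n)
    for j, v in enumerate(x):
        lp += math.log((counts[c][j][v] + alpha)
                       / (classes[c] + alpha * len(values[j])))
    return lp

def classify(x, model):
    """Step 4: argmax over classes; no need to divide by p(x),
    since it is common to every class being compared."""
    classes, counts, values, alpha = model
    return max(classes,
               key=lambda c: log_joint(x, c, classes, counts, values, alpha))
```

For example, training on a tiny weather-style data set and classifying a held-out point:

```python
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes"]
model = train_nb(X, y)
classify(("sunny", "hot"), model)
```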