
predict.naiveBayes() bug in e1071 package

3 messages · Ali Tofigh, David Winsemius, David Meyer

#
Hi,

I'm currently using the R package e1071 to train naive Bayes
classifiers and came across a bug: when the unnormalized posterior
probabilities of all classes are very small, the result from the
predict.naiveBayes function becomes NaN. This is an issue with the
treatment of the log-transformed probabilities inside the
predict.naiveBayes function. Here is an example to demonstrate the
problem (you might need to increase 'nvar' depending on your machine):

-------------------- 8< --------------------
library(e1071)

N <- 100
nvar <- 60
varnames <- paste("v", 1:nvar, sep="")

dat <- sapply(1:nvar, function(dummy) {c(rnorm(N/2, 0, 1), rnorm(N/2, 10, 1))})
colnames(dat) <- varnames

out <- rep(c("a","b"), each=N/2)

nb <- naiveBayes(x=dat, y=out)

new.dat <- t(rnorm(nvar, 5, 0.1))
colnames(new.dat) <- varnames

predict(nb, new.dat, type="raw")
-------------------- 8< --------------------

The result of the last line is usually NaN. As for the solution:

To protect against very small numbers, the e1071:::predict.naiveBayes
function takes the probabilities into log-space and adds them instead
of multiplying. However, when calculating the posterior probabilities
of each class (when type = "raw"), the log-probabilities are
exponentiated before normalization, which defeats the purpose of the
log-space transformation. I suggest the following change to the code:

Towards the end of the predict.naiveBayes function, you currently do:

L <- exp(L)
L / sum(L)   # this is what is returned

you can instead use

sapply(L, function(lp) {1 / sum(exp(L - lp))})

the above comes from the following equality:

x / (x + y + z) = 1 / (1 + exp(log(y) - log(x)) + exp(log(z) - log(x)))
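
As a rough illustration of the equality above, the sketch below compares the two normalizations on made-up log-scores. The vector L is an assumption for the example, not the internal state of predict.naiveBayes: it mimics 60 independent variables evaluated at a hypothetical point 5.01 with class means 0 and 10, unit standard deviations, and the class priors ignored. The last two lines show an equivalent variant that subtracts the maximum before exponentiating, which is not taken from the package.

```r
nvar <- 60

# Assumed per-class log-scores (sum of nvar identical log-densities, priors ignored).
L <- c(a = nvar * dnorm(5.01, mean = 0,  sd = 1, log = TRUE),
       b = nvar * dnorm(5.01, mean = 10, sd = 1, log = TRUE))

# Exponentiate, then normalize: exp(L) underflows to 0 for both classes,
# so the division is 0 / 0.
naive <- exp(L) / sum(exp(L))

# Normalization done on differences of log-scores, as suggested above.
stable <- sapply(L, function(lp) 1 / sum(exp(L - lp)))

# Equivalent variant: shift by the largest log-score before exponentiating.
shifted <- exp(L - max(L))
stable2 <- shifted / sum(shifted)

print(naive)
print(stable)
print(stable2)
```

Only differences between log-scores enter the stable versions, so the size of the individual scores no longer matters.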

Best wishes,
/Ali Tofigh
#
On Feb 7, 2012, at 12:43 PM, Ali Tofigh wrote:

            
This should be sent to the maintainer of the package. The name of the
maintainer can always be found in the package's DESCRIPTION file.
Several of the authors are regular readers of R-help, but I do not know
whether David Meyer is. I'm sure a well-documented bug report, as this
appears to be, will be welcomed.
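
For reference, the Maintainer field of the DESCRIPTION file can also be read from within R. A minimal sketch, assuming the package is installed locally (the lookup is skipped otherwise):

```r
# Package whose maintainer we want to look up; any installed package works.
pkg <- "e1071"

if (requireNamespace(pkg, quietly = TRUE)) {
  # Both calls read the Maintainer field of the installed DESCRIPTION file.
  print(utils::maintainer(pkg))
  print(utils::packageDescription(pkg)$Maintainer)
}
```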
#
Confirmed & fixed upstream.

Thanks,
David
On 2012-02-07 18:43, Ali Tofigh wrote: