## Naive Bayes Classification

This is the second article in our machine learning for beginners series. In the previous article, we discussed the general idea of machine learning; in this one we go a step further and discuss Naïve Bayes classification.

The Naïve Bayes classification algorithm is a supervised machine learning technique used to solve classification problems. Bayes' Theorem (posterior probability) forms the core of the algorithm. Let us first discuss how the Naïve Bayes classifier works and then look at an example.

As mentioned earlier, the Naïve Bayes algorithm is based on Bayes' Theorem, so let us first understand Bayes' Theorem.

Bayes' Theorem tells us how often A happens given that B happens, when we know how often B happens given that A happens, and how likely A and B are on their own:

P(A|B) = P(B|A) * P(A) / P(B)

Here in Bayes' Theorem,

P(A|B) = Probability of A given that B has occurred

P(B|A) = Probability of B given that A has occurred

P(A) = Probability of A

P(B) = Probability of B
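To make the formula concrete, here is a minimal sketch of Bayes' Theorem as a Python function (the numbers in the example call are made up purely for illustration):

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return the posterior P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Illustrative values: if P(B|A) = 0.8, P(A) = 0.5, and P(B) = 0.6,
# then P(A|B) = 0.8 * 0.5 / 0.6 ≈ 0.667
print(round(bayes(0.8, 0.5, 0.6), 3))
```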

Imagine you have two friends, say Prakash and Diwakar. Since you are their childhood friend, you know their word usage patterns. Let us say that Prakash uses the three words [light, simple, and good] more often, whereas Diwakar uses the words [long, slow, and good] more often.

Let us say that you receive an SMS from some unknown number and it says “I am lost in the light but it feels good to be away from the world for some time”.

Can you guess who has sent this SMS to you?

If you guessed Prakash, you are correct, and you may reason that the message contains the words light and good, which Prakash uses more often.

Now let us add some more information to our example. Suppose Prakash and Diwakar use the following words with the probabilities shown below. Can you guess who might be the sender of the SMS "Wasup Dude?"

| Word  | Prakash | Diwakar |
|-------|---------|---------|
| Wasup | 0.25    | 0.50    |
| Dude  | 0.15    | 0.20    |

Now what do you think?

You are right if you guessed Diwakar. Let me explain the mathematics behind it, in case you didn't get it. We will apply Bayes' Theorem, as explained above, to understand how this example works.

P(Prakash) = 0.50 (Assuming, for simplicity)

P(Diwakar) = 0.50 (Assuming, for simplicity)

P({Wasup, Dude}|Prakash) = P(Wasup|Prakash) * P(Dude|Prakash)

= 0.25 * 0.15 = 0.0375

P({Wasup, Dude}|Diwakar) = P(Wasup|Diwakar) * P(Dude|Diwakar)

= 0.50 * 0.20 = 0.1

P({Wasup, Dude}) = 0.0375 * P(Prakash) + 0.1 * P(Diwakar)

= 0.0375 * 0.50 + 0.1 * 0.50 = 0.06875

P(Prakash|{Wasup, Dude}) = (0.0375 * 0.50) / 0.06875 ≈ 0.27

P(Diwakar|{Wasup, Dude}) = (0.1 * 0.50) / 0.06875 ≈ 0.73

This is the mathematical explanation for the above example: the probability that Diwakar sent the SMS is 0.73, while the probability that Prakash sent it is 0.27.
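The calculation above can be reproduced in a short Python script, using the word probabilities and priors from the example:

```python
# Per-sender word probabilities from the example above.
word_probs = {
    "Prakash": {"Wasup": 0.25, "Dude": 0.15},
    "Diwakar": {"Wasup": 0.50, "Dude": 0.20},
}
priors = {"Prakash": 0.50, "Diwakar": 0.50}

message = ["Wasup", "Dude"]

# Likelihood of the message under each sender, multiplying word
# probabilities (the "naive" independence assumption).
likelihoods = {}
for sender, probs in word_probs.items():
    p = priors[sender]
    for word in message:
        p *= probs[word]
    likelihoods[sender] = p

# Evidence P({Wasup, Dude}) and posteriors via Bayes' Theorem.
evidence = sum(likelihoods.values())
posteriors = {s: p / evidence for s, p in likelihoods.items()}

for sender, p in posteriors.items():
    print(f"P({sender} | message) = {p:.2f}")
# Prints 0.27 for Prakash and 0.73 for Diwakar, matching the hand calculation.
```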

The Naïve Bayes classifier calculates the probability of each class (in our example, Prakash and Diwakar) for the given input features and then selects the outcome with the highest probability. The algorithm is called naïve because the classifier assumes that the features (in this case, the words) are independent of each other. It is nevertheless a powerful algorithm and is used for:

• Real-time prediction
• Text classification/spam filtering
• Recommendation systems
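As a taste of the spam-filtering use case, here is a minimal pure-Python sketch of a Naïve Bayes text classifier trained on a tiny made-up dataset (real systems would use a library such as scikit-learn, and add-one smoothing is used here to avoid zero probabilities for unseen words):

```python
import math
from collections import Counter

# Toy training data: (text, label) pairs, invented for illustration.
train = [
    ("win cash prize now", "spam"),
    ("cheap prize offer now", "spam"),
    ("meeting at noon today", "ham"),
    ("lunch today with team", "ham"),
]

# Count words per class and how many documents each class has.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Pick the class with the highest log-posterior score."""
    scores = {}
    for label in class_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            # Laplace (add-one) smoothing for words unseen in this class.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("claim your prize now"))  # expected: spam
print(classify("team lunch at noon"))    # expected: ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which is the standard trick in practical Naïve Bayes implementations.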