Naive Bayes Classifier
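- It is a supervised learning algorithm used for classification, based on Bayes' Theorem.
- NBC (Naive Bayes Classifier) is not a single algorithm but a family of algorithms that share the same principle: Bayes' Theorem, combined with the "naive" assumption that the features are independent of each other given the class.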
Industrial Use of Naive Bayes Classifier
- News Categorization
- Spam filtering
- Object and face recognition
- Medical diagnosis
- Weather prediction
Types of Naive Bayes Classifier
There are three common types of Naive Bayes classifier:
- Gaussian
- Multinomial
- Bernoulli
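The three variants differ mainly in how they model the likelihood of a feature value given a class. The following is a minimal sketch, assuming the standard textbook forms; the function names are illustrative and are not part of any library.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Continuous feature: normal density with the class mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bernoulli_likelihood(x, p):
    """Binary feature (0 or 1): p is P(feature = 1 | class)."""
    return p if x == 1 else 1.0 - p

def multinomial_likelihood(counts, probs):
    """Count features (e.g. word counts), up to a factor shared by all classes."""
    return math.prod(p ** c for p, c in zip(probs, counts))

g = gaussian_likelihood(0.0, 0.0, 1.0)
b = bernoulli_likelihood(1, 0.8)
m = multinomial_likelihood([2, 1], [0.5, 0.5])
print(round(g, 4), b, m)  # 0.3989 0.8 0.125
```

The wine features are continuous chemical measurements, which is why the Gaussian variant is the natural choice later in this post.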
Bayes' Theorem
NBC is built entirely on Bayes' Theorem. Let's see what the theorem says.
P(H|E) = P(E|H) P(H) / P(E)
- H: hypothesis, E: event (evidence)
- Bayes' Theorem is a statement about conditional probability.
- Given that the event E has occurred, we calculate the probability of the hypothesis H.
- In other words, P(H|E) is the chance that H holds when the event E has happened.
P(H) - Prior probability: the probability of H before E is observed.
P(E|H) - Likelihood: the probability of observing E when H is true.
P(E) - Evidence: the overall probability of E.
P(H|E) - Posterior probability: the probability of H after E has been observed.
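To make the formula concrete, here is a small worked example. The numbers are made up for illustration: assume 20% of emails are spam (the hypothesis H), a certain word (the evidence E) appears in 60% of spam emails and in 5% of non-spam emails. The helper name `posterior` is hypothetical.

```python
def posterior(prior_h, e_given_h, e_given_not_h):
    """Return P(H|E) using Bayes' Theorem."""
    # P(E) by the law of total probability over H and not-H
    evidence = e_given_h * prior_h + e_given_not_h * (1.0 - prior_h)
    return e_given_h * prior_h / evidence

# Illustrative numbers only: P(H) = 0.2, P(E|H) = 0.6, P(E|not H) = 0.05
p = posterior(prior_h=0.2, e_given_h=0.6, e_given_not_h=0.05)
print(round(p, 4))  # 0.75
```

Seeing the word raises the probability of spam from the prior of 0.2 to a posterior of 0.75.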
Note: The task is to implement a Naive Bayes classifier on a .csv file. Here we will apply the Naive Bayes classifier to the wine dataset.
Wine Dataset Description
- The wine dataset contains the results of a chemical analysis of wines grown in a specific area of Italy.
- It contains a total of 178 samples, with 13 chemical measurements (features) recorded for each sample.
- It contains three classes (our target) and has no missing values.
Implementation of Algorithm
    # Import the required libraries
    import numpy as np
    import pandas as pd
    # Import the datasets module
    from sklearn import datasets
    # Load the dataset
    wine = datasets.load_wine()
    # print(wine)  # uncomment this line if you want to see the data
Note: Here we have just loaded the data. You can download the data and load it from a file, or load it directly from sklearn as done here. The loaded object behaves like a dictionary; you can print it to see its contents.
    # Print the names of the 13 features
    print("Features: ", wine.feature_names)
    # Print the label names (the wine classes)
    print("Labels: ", wine.target_names)
Output
Note: Here we have printed the names of our features and our target classes. We will train our model on this data.
    X = pd.DataFrame(wine['data'])
    print(X.head())
    print(wine.data.shape)
    # Print the wine labels (0: class_0, 1: class_1, 2: class_2)
    y = wine.target
    print(y)
Output
Note:
- Here we have printed the first five of the 178 samples, each with its 13 feature values.
- On the basis of these features, each wine belongs to one of three classes: 0, 1, or 2.
    # Import the train_test_split function
    from sklearn.model_selection import train_test_split
    # Split the dataset into a training set and a test set
    X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.30, random_state=109)
Note: We split our data into 70% training data and 30% testing data. The model learns from the training data, and the testing data shows how well the model has learned.
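The idea behind such a split can be sketched in plain Python: shuffle the sample indices with a fixed seed, then cut off the test share. This is only an illustration of the concept, not scikit-learn's implementation. The helper `split_indices` is hypothetical, rounding the test share up is an assumption of this sketch, and the shuffled order will not match the one `train_test_split` produces for the same seed.

```python
import math
import random

def split_indices(n_samples, test_size=0.30, seed=109):
    """Shuffle the sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # seeded, so the split is repeatable
    n_test = math.ceil(n_samples * test_size)  # assumption: round the test share up
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(178)
print(len(train_idx), len(test_idx))  # 124 54
```

Fixing the seed plays the same role as `random_state`: every run yields the same split, so results can be reproduced.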
    # Import the Gaussian Naive Bayes model
    from sklearn.naive_bayes import GaussianNB
    # Create a Gaussian classifier
    gnb = GaussianNB()
    # Train the model using the training set
    gnb.fit(X_train, y_train)
    # Predict the response for the test dataset
    y_pred = gnb.predict(X_test)
    print(y_pred)
Output
Note: We have used the Gaussian model here, trained it on the training data, and then made predictions on the test data.
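To see what a Gaussian Naive Bayes model does conceptually, here is a from-scratch sketch in plain Python. It is not scikit-learn's implementation: the function names, the `var_floor` parameter, and the toy data are all assumptions made for illustration. Fitting estimates a prior plus a mean and variance per feature for each class; prediction picks the class with the highest log prior plus summed log Gaussian likelihoods.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the prior and the per-feature mean and variance of each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (hypothetical) keeps the variance away from zero
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest unnormalised log posterior."""
    def log_score(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # log of the normal density with mean m and variance var
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_score)

# Toy data, made up for illustration: two features, two classes
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y_toy = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X_toy, y_toy)
pred_a = predict_gaussian_nb(model, [1.1, 2.0])
pred_b = predict_gaussian_nb(model, [3.1, 4.0])
print(pred_a, pred_b)  # 0 1
```

Working with log probabilities instead of multiplying raw densities avoids numerical underflow when there are many features.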
    # Import the scikit-learn metrics module for accuracy calculation
    from sklearn import metrics
    # Model accuracy
    print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
    # Confusion matrix
    from sklearn.metrics import confusion_matrix
    cm = np.array(confusion_matrix(y_test, y_pred))
    print(cm)
Output
Note:
- To check how good our model is, we computed its accuracy on the test data.
- Here we calculated both the confusion matrix and the accuracy.
- We can see from the confusion matrix that our model predicted a total of 5 test samples wrong; the rest were predicted correctly.
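Both metrics are simple to compute by hand. The sketch below uses made-up labels, not the wine results, and the helper names are hypothetical. Rows of the matrix are the true classes and columns are the predicted classes, so correct predictions lie on the diagonal and every off-diagonal entry counts a mistake.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def confusion(y_true, y_pred, n_classes):
    """Matrix where entry [t][p] counts samples of true class t predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Made-up labels for illustration
y_true_toy = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred_toy = [0, 1, 1, 1, 2, 2, 0, 1]
acc = accuracy(y_true_toy, y_pred_toy)
cm_toy = confusion(y_true_toy, y_pred_toy, n_classes=3)
print(acc)     # 0.75
print(cm_toy)  # [[1, 1, 0], [0, 3, 0], [1, 0, 2]]
```

The number of wrong predictions is the sum of the off-diagonal entries, here 2 out of 8 samples.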