in Python Programming by Goeduhub's Expert (2.2k points)

Practice KNN: We have a dataset containing information about users of a social network, including whether each user is interested in buying an SUV.

Dataset: user_data.csv
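KNN (k-nearest neighbours) classifies a new user by finding the k most similar users in the training data and taking a majority vote of their labels. The sketch below is a minimal from-scratch illustration of that idea using Euclidean distance, which is also the default distance in scikit-learn's KNeighborsClassifier. The function name knn_predict and the toy ages and salaries are hypothetical and are not taken from user_data.csv. Vote ties are broken by whichever label Counter encounters first (i.e. the closer neighbour), which may not match scikit-learn's tie rule.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: label each query point by majority vote
    among its k nearest training points (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for q in np.asarray(X_query, dtype=float):
        # Distance from this query point to every training point
        dists = np.linalg.norm(X_train - q, axis=1)
        # Indices of the k closest training points, nearest first
        nearest = np.argsort(dists)[:k]
        # Most common label among those neighbours
        label = Counter(y_train[nearest].tolist()).most_common(1)[0][0]
        preds.append(label)
    return np.array(preds)


# Hypothetical toy data: [age, salary in thousands] -> purchased (0/1)
X_toy = [[25, 30], [27, 35], [30, 40], [50, 120], [55, 110], [60, 130]]
y_toy = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_toy, y_toy, [[28, 33], [58, 125]], k=3))  # [0 1]
```

Because raw salaries are much larger numbers than ages, unscaled distances would be dominated by salary, which is why the answer below standardizes the features before fitting.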


2 Answers

by (130 points)
import pandas as pd

import numpy as np

import sklearn

import seaborn as sns

import matplotlib.pyplot as plt
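df=pd.read_csv('/content/User_Data.csv')  # Colab upload path; adjust to wherever user_data.csv is saved (file names are case-sensitive on Linux)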

df.head()
df.shape
df.duplicated().sum()
df.isnull().sum()
df.dtypes

df.columns

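df.corr(numeric_only=True)  # Gender is still text here; pandas >= 2.0 raises an error on non-numeric columns without numeric_only=True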

Label Encoding
from sklearn.preprocessing import LabelEncoder

le=LabelEncoder()

df.Gender=le.fit_transform(df.Gender)

df.Gender.head()
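LabelEncoder assigns integer codes to the unique labels in sorted order, so a Gender column holding "Female" and "Male" is encoded as Female → 0 and Male → 1. A small check on hypothetical sample values (note that the answer later drops Gender from the model features, so this encoding only feeds the correlation matrix and plots):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Male", "Female", "Male", "Female"])  # hypothetical values
print(le.classes_)                   # ['Female' 'Male']  (sorted order)
print(codes)                         # [1 0 1 0]
# Map the integer codes back to the original strings
print(le.inverse_transform([0, 1]))  # ['Female' 'Male']
```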
x=df.drop(['Purchased','User ID',],axis='columns')

print(x)

y=df['Purchased']

print(y)
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

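# Gender is left out of the model features, so KNN uses only the remaining columns (Age and EstimatedSalary)
x.drop('Gender',axis=1,inplace=True)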

x=scaler.fit_transform(x)

print(x)
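StandardScaler rescales each column to zero mean and unit variance, z = (x - mean) / std, using the population standard deviation (ddof=0). The sketch below checks that a manual NumPy computation matches the scaler on a small hypothetical Age/EstimatedSalary matrix. Note that the answer fits the scaler on all rows before train_test_split; a common alternative is to fit it on the training rows only and reuse it to transform the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: columns = Age, EstimatedSalary
X = np.array([[20, 20000], [30, 50000], [40, 80000], [50, 60000]], dtype=float)

# Manual z-score: subtract the column mean, divide by the population std
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

X_sklearn = StandardScaler().fit_transform(X)
print(np.allclose(X_manual, X_sklearn))  # True
```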

Visualization
sns.heatmap(df.corr(),annot=True)

plt.title('Correlation')

plt.show()
sns.heatmap(df.isnull(),yticklabels=False)
sns.set_style('whitegrid')

sns.countplot(x='Purchased',hue='Gender',data=df)
sns.pairplot(df,hue='Gender',vars=['Age','Purchased','EstimatedSalary'],palette='gist_rainbow')
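The countplot draws one bar per (Purchased, Gender) combination, with bar height equal to the number of rows in that group. A minimal sketch of the same counts with pd.crosstab, on hypothetical encoded rows rather than the real data:

```python
import pandas as pd

# Hypothetical rows standing in for df[['Gender', 'Purchased']] after label encoding
toy = pd.DataFrame({"Gender": [1, 0, 0, 1, 0, 1],
                    "Purchased": [0, 1, 0, 0, 1, 1]})

# Rows = Purchased value, columns = Gender code, cells = number of users
counts = pd.crosstab(toy["Purchased"], toy["Gender"])
print(counts)
```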
Detecting Outliers
max_threshold=df['Age'].quantile(0.95)

print(max_threshold)

min_threshold=df['Age'].quantile(0.05)

print(min_threshold)

df[df['Age']>max_threshold]
df[df['Age']<min_threshold]

Removing Outliers
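# Keep only rows whose Age lies inside the 5th-95th percentile band (bounds inclusive).
# Note: x and y were built from the full df earlier, so the model below still uses every row.
df_trimmed=df[df['Age'].between(min_threshold,max_threshold)]
df_trimmed.shape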
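The percentile filter can be wrapped in a small reusable helper. This is a sketch under assumptions: the name trim_by_quantile is hypothetical, the bounds are treated as inclusive via Series.between, and the ages are made-up toy values.

```python
import pandas as pd


def trim_by_quantile(frame, column, low=0.05, high=0.95):
    """Hypothetical helper: keep rows whose `column` value lies within
    the [low, high] quantile band, bounds included."""
    lo_val = frame[column].quantile(low)
    hi_val = frame[column].quantile(high)
    return frame[frame[column].between(lo_val, hi_val)]


toy = pd.DataFrame({"Age": list(range(18, 38))})  # hypothetical ages 18..37
trimmed = trim_by_quantile(toy, "Age")
print(trimmed["Age"].min(), trimmed["Age"].max(), len(trimmed))  # 19 36 18
```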
Splitting Data into Train and Test

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=10)
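The split above does not pass stratify, so the share of buyers in the test set can drift from the share in the full data by chance. One optional variation, shown here on hypothetical imbalanced labels, is to pass stratify=y so that train_test_split keeps the class proportions roughly equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows, 14 negatives and 6 positives
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 14 + [1] * 6)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=10, stratify=y)
print(X_tr.shape, X_te.shape)  # (16, 2) (4, 2)
print(np.bincount(y_te))       # class counts in the test part
```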
KNN Classifier Model
from sklearn.neighbors import KNeighborsClassifier

KNN=KNeighborsClassifier(n_neighbors=3)

KNN.fit(x_train,y_train)

Prediction
y_pred=KNN.predict(x_test)

y_pred
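The answer fixes n_neighbors=3. One common way to choose k instead is to compare cross-validated accuracy across several candidate values. The sketch below uses synthetic data from make_classification as a stand-in; in the notebook you would pass x_train and y_train. The range of k values tried is an arbitrary assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the two scaled features (Age, EstimatedSalary)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

scores = {}
for k in range(1, 16, 2):  # odd k avoids tied votes between two classes
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```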

Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

cm=confusion_matrix(y_test,y_pred)

cm
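confusion_matrix counts (actual, predicted) pairs. In scikit-learn the rows are the actual classes and the columns the predicted classes, so for labels 0/1 the layout is [[TN, FP], [FN, TP]]. A hand-rolled version on hypothetical labels makes that layout explicit:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])  # hypothetical actual labels
y_hat = np.array([0, 1, 1, 0, 1, 0, 1, 0])   # hypothetical predictions

# Rows = actual class, columns = predicted class
manual = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_hat):
    manual[t, p] += 1

print(manual)                                             # [[3 1]
                                                          #  [1 3]]
print(np.array_equal(manual, confusion_matrix(y_true, y_hat)))  # True
```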

Accuracy Score
acc=accuracy_score(y_test,y_pred)

print("ACCURACY IS",acc*100,'%')
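Accuracy is the fraction of correct predictions, which equals the sum of the confusion matrix's diagonal divided by the total count. A quick check on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # hypothetical actual labels
y_hat = [0, 1, 1, 0, 1, 0, 1, 0]   # hypothetical predictions

cm_toy = confusion_matrix(y_true, y_hat)
# Correct predictions lie on the diagonal of the confusion matrix
acc_from_cm = np.trace(cm_toy) / cm_toy.sum()
print(acc_from_cm, accuracy_score(y_true, y_hat))  # 0.75 0.75
```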

Classification Report
print(classification_report(y_test,y_pred))
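For the positive class (1 = purchased), the precision, recall and F1 values in the classification report follow directly from the confusion-matrix counts. The sketch below recomputes them by hand on hypothetical labels and compares them with scikit-learn's metric functions.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]  # hypothetical actual labels
y_hat = [0, 1, 1, 0, 1, 0, 1, 0, 0, 1]   # hypothetical predictions

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of users predicted to buy, the share who actually bought
recall = tp / (tp + fn)     # of users who actually bought, the share the model found
f1 = 2 * precision * recall / (precision + recall)

print(precision, precision_score(y_true, y_hat))
print(recall, recall_score(y_true, y_hat))
print(f1, f1_score(y_true, y_hat))
```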
Box Plot
sns.boxplot(x='Purchased',y='Age',data=df)
plt.figure(figsize=(15,6))

sns.boxplot(x='Age',y='EstimatedSalary',data=df)

Visualization: Confusion Matrix with Accuracy Score
plt.figure(figsize=(5,5))

sns.heatmap(cm, annot=True, fmt="d", linewidths=.5, square = True, cmap = 'Blues_r')  # cm holds integer counts, so annotate with "d" rather than ".2f"

plt.ylabel('Actual label')

plt.xlabel('Predicted label')

A=f'Accuracy Score :{acc:.2f}'

plt.title(A)


plt.show()

© goeduhub.com
...