# Logistic Regression Model on Why Employees Leave (HR Analytics) | Predicting Employee Attrition Using Machine Learning

Predict employee retention within an organization: will a given employee leave the company or stay? An organization is only as good as its employees, and these people are the true source of its competitive advantage. The dataset is downloaded from Kaggle: https://www.kaggle.com/giripujar/hr-analytics
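
Because the target is a 0/1 flag (the `left` column used in the answers below: 1 = left, 0 = stayed), a useful first exploration step is the attrition rate, overall and per group. The sketch below does this in plain Python on a few made-up rows standing in for the Kaggle file; `attrition_rates` is a hypothetical helper, not part of any library. With pandas, the same numbers come from `df['left'].mean()` and `df.groupby('salary')['left'].mean()`.

```python
from collections import defaultdict


def attrition_rates(rows, group_key):
    """Return (overall_rate, {group: rate}) for a 0/1 'left' field.

    rows: list of dicts, each with a 'left' value (1 = left, 0 = stayed)
    and a value for group_key (e.g. 'salary').
    """
    totals = defaultdict(int)   # employees per group
    leavers = defaultdict(int)  # employees who left, per group
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        leavers[group] += row["left"]
    overall = sum(leavers.values()) / sum(totals.values())
    by_group = {g: leavers[g] / totals[g] for g in totals}
    return overall, by_group


# Made-up rows for illustration only (not taken from the Kaggle file).
sample = [
    {"salary": "low", "left": 1},
    {"salary": "low", "left": 1},
    {"salary": "low", "left": 0},
    {"salary": "medium", "left": 0},
    {"salary": "medium", "left": 1},
    {"salary": "high", "left": 0},
]

overall, by_salary = attrition_rates(sample, "salary")
print(f"overall: {overall:.2f}")  # overall: 0.50
for band, rate in sorted(by_salary.items()):
    print(f"{band}: {rate:.2f}")
```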

First, explore and visualize the data; then build a logistic regression model in Python to predict employee attrition.
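
To make the modeling step concrete, here is a minimal from-scratch sketch of what a logistic regression learns: a weighted sum of the features passed through the sigmoid, with the weights fitted by batch gradient descent on the mean log-loss. It is an illustration only, not how scikit-learn's `LogisticRegression` works internally (by default it uses the `lbfgs` solver with L2 regularization). The function names and the one-feature toy data are made up for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    # Probability of class 1 ("left") for one feature vector x.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss is (predicted probability - label) * feature.
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data (made up): one feature such as a satisfaction score in [0, 1];
# low satisfaction -> left (1), high satisfaction -> stayed (0).
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y, lr=1.0, epochs=5000)
preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
print(preds)  # [1, 1, 1, 0, 0, 0]
```

The learned weight comes out negative, which is how the model encodes "lower satisfaction means a higher chance of leaving".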

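import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the data (assumes the Kaggle CSV is saved locally as HR_comma_sep.csv)
df = pd.read_csv('HR_comma_sep.csv')

# Logistic regression model
df1 = df[['salary', 'satisfaction_level', 'average_montly_hours',
          'promotion_last_5years', 'left']]

# One-hot encode salary (high/low/medium) and drop 'medium' as the reference category
dummies = pd.get_dummies(df1.salary)
df1 = pd.concat([df1, dummies], axis='columns')
df1 = df1.drop(['salary', 'medium'], axis='columns')

# 'average_montly_hours' is spelled this way in the dataset itself
X = df1[['satisfaction_level', 'average_montly_hours', 'promotion_last_5years', 'high', 'low']]
# test_size=2/3 trains on one third of the rows and tests on the other two thirds
X_train, X_test, y_train, y_test = train_test_split(X, df1.left, test_size=2/3, random_state=1)

model = LogisticRegression()
model.fit(X_train, y_train)

print(model.score(X_test, y_test))  # mean accuracy on the test set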

***************************** O U T P U T *****************************

0.7828
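
The answer above one-hot encodes `salary` with `pd.get_dummies` and then drops the `medium` column. The three dummy columns always sum to 1, so keeping all of them next to the model's intercept would be redundant; dropping one makes `medium` the reference level that the `high` and `low` coefficients are compared against. The plain-Python sketch below shows the same idea; `one_hot` is a hypothetical helper, and its alphabetical category order is assumed to match the columns `pd.get_dummies` produces here (`high`, `low`, `medium`).

```python
def one_hot(values, drop=None):
    """Return (column_names, rows) for a one-hot encoding of `values`.

    Categories are sorted alphabetically; `drop` names a reference
    category whose column is left out (its rows become all zeros).
    """
    categories = sorted(set(values))
    if drop is not None:
        categories.remove(drop)
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


cols, rows = one_hot(["low", "medium", "high", "low"], drop="medium")
print(cols)  # ['high', 'low']
print(rows)  # [[0, 1], [0, 0], [1, 0], [0, 1]]
```

For an unregularized fit, which level you drop only changes how the coefficients are read, not the predictions. Note that `pd.get_dummies(..., drop_first=True)` would drop `high` (the first level alphabetically) rather than `medium`.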

#GO_STP_379
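# In this task we predict whether an employee leaves the company (the 'left' column).
# This is a binary classification problem, so we fit a logistic regression model.
import pandas as pd
# assumes the Kaggle CSV is saved locally as HR_comma_sep.csv
data = pd.read_csv('HR_comma_sep.csv')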
# exploration of data
print("-------exploration of data------------")
data.info()  # info() prints its summary itself and returns None
# label-encode the categorical columns
# (LabelEncoder assigns codes in alphabetical order, e.g. salary: high=0, low=1, medium=2)
from sklearn.preprocessing import LabelEncoder
col = ['Department', 'salary']
label_encoder = LabelEncoder()
for c in col:
    data[c] = label_encoder.fit_transform(data[c])
print("after label encoding:\n", data)
# LogisticRegression of data
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix,accuracy_score
ft=data[['Department','satisfaction_level','salary']]
label=data['left']
xtrain,xtest,ytrain,ytest=train_test_split(ft,label)
my_model=LogisticRegression()
my_model.fit(xtrain,ytrain)
y_pred = my_model.predict(xtest)  # predictions for the test set
cm=confusion_matrix(ytest,y_pred)
print("confusion matrix: ",cm)
print("accuracy score: ", accuracy_score(ytest, y_pred))
print("training score: ", my_model.score(xtrain, ytrain))

# visualization of data
import matplotlib.pyplot as plt
plt.subplot(2,2,1)
plt.scatter(ytest, y_pred, marker='+', label='test set')
plt.xlabel('actual (ytest)')
plt.ylabel('predicted (y_pred)')
plt.legend()
plt.title('Actual vs predicted attrition')
plt.subplot(2,2,2)
plt.scatter(x=data['salary'], y=data['left'],label='salary and left')
plt.xlabel('salary (encoded)')
plt.ylabel('left')
plt.legend()
plt.title('salary and left')
plt.subplot(2,2,3)
plt.scatter(x=data['satisfaction_level'], y=data['left'],label='satisfaction level and left')
plt.xlabel('satisfaction_level')
plt.ylabel('left')
plt.legend()
plt.title('satisfaction level and left')
plt.subplot(2,2,4)
plt.scatter(x=data['time_spend_company'], y=data['left'],label='time_spend_company and left')
plt.xlabel('time_spend_company')
plt.ylabel('left')
plt.title('time_spend_company and left')
plt.legend()
plt.tight_layout()  # keep the four panels' titles and labels from overlapping
plt.show()

# logistic regression model to predict Employee Attrition
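# save the trained model to disk and load it back
import pickle
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use joblib directly

# pickle: save and reload
with open('model_save', 'wb') as file:
    pickle.dump(my_model, file)
with open('model_save', 'rb') as file:
    newmodel = pickle.load(file)
print("coefficients of the reloaded model: ", newmodel.coef_)

# joblib: save and reload
joblib.dump(my_model, 'model_joblib')
jmodel = joblib.load('model_joblib')
print("my model: ", my_model)
print("new model: ", newmodel)
print("joblib model: ", jmodel)
print("file is :", file)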
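
The code above prints `confusion_matrix` and `accuracy_score`. For 0/1 labels, scikit-learn's confusion matrix puts actual classes on the rows and predicted classes on the columns, i.e. `[[TN, FP], [FN, TP]]`. When one class dominates, as is common in attrition data, an accuracy score is best compared with the accuracy of always predicting the majority class. The plain-Python sketch below computes all three; the helper names are hypothetical and the labels are made up.

```python
from collections import Counter


def confusion_matrix_2x2(y_true, y_pred):
    """Rows = actual class, columns = predicted class, labels ordered [0, 1].

    Same layout as sklearn.metrics.confusion_matrix for 0/1 labels:
    [[TN, FP],
     [FN, TP]]
    """
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    # Correct predictions (the diagonal) divided by all predictions.
    total = sum(sum(row) for row in cm)
    return (cm[0][0] + cm[1][1]) / total


def majority_baseline(y_true):
    """Accuracy of always predicting the most common class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Made-up labels for illustration (1 = left, 0 = stayed).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

cm = confusion_matrix_2x2(y_true, y_pred)
print(cm)                         # [[6, 1], [2, 1]]
print(accuracy(cm))               # 0.7
print(majority_baseline(y_true))  # 0.7
```

In this toy case the model's accuracy only matches the "everyone stays" baseline, and it catches just one of the three leavers, which the confusion matrix makes visible and the accuracy score alone hides.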