# Predicting a Startup's Profit/Success Rate Using Multiple Linear Regression in Python on the 50 Startups Dataset


The 50 Startups dataset contains 5 columns: “R&D Spend”, “Administration”, “Marketing Spend”, “State”, and “Profit”.

The first 3 columns give each startup's spending on research and development, administration, and marketing, respectively. State indicates the state in which the startup is based. Profit indicates how much profit the startup earned.

This is a multiple linear regression problem, because there is more than one independent variable.
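To make "more than one independent variable" concrete, here is a minimal sketch of multiple linear regression solved as an ordinary least-squares problem with NumPy. The data is synthetic and noise-free, and the coefficient values are assumptions chosen only for illustration; none of it comes from the 50 Startups file.

```python
import numpy as np

# Synthetic, noise-free data (hypothetical values, not the 50 Startups data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))      # three independent variables
true_coef = np.array([0.8, -0.05, 0.03])   # assumed coefficients
true_intercept = 50.0                      # assumed intercept
y = true_intercept + X @ true_coef         # dependent variable

# Prepend a column of ones so the intercept is estimated with the coefficients
A = np.column_stack([np.ones(len(X)), X])

# Solve min ||A @ beta - y||^2 for beta = [intercept, b1, b2, b3]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = beta[0], beta[1:]

print("intercept:", intercept)
print("coefficients:", coef)
```

Because the synthetic target has no noise, the fitted intercept and coefficients match the assumed ones up to floating-point error. With real data such as the startup profits, they would only be estimates.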

Build a model in Python that predicts profit from the 50_Startups data.

by Goeduhub's Expert (2.2k points)

# Multiple Linear Regression for 50 Startups

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

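# Importing the dataset
# (assumption: the data is saved as '50_Startups.csv' in the working directory)
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1]  # every column except the last one (Profit)
y = dataset.iloc[:, 4]    # the Profit column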

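# Convert the State column into dummy (one-hot) columns;
# drop_first=True drops one category to avoid the dummy variable trap
states = pd.get_dummies(X['State'], drop_first=True)

# Drop the original State column
X = X.drop('State', axis=1)

# Concatenate the dummy variables
X = pd.concat([X, states], axis=1)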

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Fitting Multiple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)

# Evaluating the model with the R^2 score on the test set
from sklearn.metrics import r2_score
score = r2_score(y_test, y_pred)
print(score)
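The dummy-variable step in the answer above can be seen in isolation on a tiny table. This is only a sketch: the four rows and their state names are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical mini table with a categorical State column
toy = pd.DataFrame({
    "R&D Spend": [100.0, 200.0, 300.0, 400.0],
    "State": ["New York", "California", "Florida", "New York"],
})

# One column per state, minus the first category in sorted order.
# Dropping one category avoids perfectly collinear dummy columns.
dummies = pd.get_dummies(toy["State"], drop_first=True)

# Replace the text column with its dummy columns
encoded = pd.concat([toy.drop("State", axis=1), dummies], axis=1)
print(encoded)
```

The dropped category ("California" here) becomes the baseline: a row with zeros (or False) in every remaining state column belongs to it.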
by (342 points)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Load the data (assumption: the file is saved as '50_Startups.csv')
df = pd.read_csv('50_Startups.csv')
df.info()
df.describe()
# Encode the State names as integer labels
from sklearn.preprocessing import LabelEncoder
lab_enc = LabelEncoder()
df['State'] = lab_enc.fit_transform(df['State'])
df['State'].unique()

# Separate the features (x) from the target (y)
x = df.drop(['Profit'], axis=1)
print(x.shape)
y = df['Profit']
print(y.shape)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 1/3, random_state = 45)
model = LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
y_pred
from sklearn.metrics import r2_score
score = r2_score(y_test, y_pred )
score
# Output: 0.9426922836763976
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
mse
# Compare actual and predicted profit side by side
newdf = pd.DataFrame({'Actual': y_test.values, 'Predicted': y_pred})
newdf
plt.scatter(y_test, y_pred, marker = '^')
plt.show()
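For readers wondering what the two metrics used above measure, the following sketch computes R² and mean squared error by hand with NumPy. The helper name `r2_and_mse` and the four sample values are hypothetical and serve only to make the arithmetic easy to follow.

```python
import numpy as np

def r2_and_mse(y_true, y_pred):
    """Return (R^2, MSE) for two equal-length sequences of numbers."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    mse = ss_res / len(y_true)
    return r2, mse

# Hypothetical values: three perfect predictions and one that is off by 1
r2, mse = r2_and_mse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(r2, mse)
```

Here the squared residuals sum to 1 and the total variation is 5, so R² is 1 - 1/5 = 0.8 and the MSE is 1/4 = 0.25. An R² near 1, like the score reported above, means the predictions explain most of the variation in the test-set profits.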