in Python Programming by Goeduhub's Expert (2.2k points)

Task: Predict a startup's profit using multiple linear regression in Python. The data set is the file 50_Startups.csv.

The 50 Startups dataset has 5 columns: “R&D Spend”, “Administration”, “Marketing Spend”, “State”, and “Profit”.

The first three columns give each startup's spending on research and development, administration, and marketing, respectively. State is the state in which the startup is based, and Profit is the profit the startup earned.

This is a multiple linear regression problem: Profit is the dependent variable, and there is more than one independent variable.
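
Written out, the model being fitted has roughly the form below. This is a sketch under stated assumptions: the coefficient symbols, the error term, and the dummy variables D_j (one per State value except a baseline, for k distinct states) are notation introduced here for illustration and do not come from the question.

```latex
\text{Profit} = \beta_0
  + \beta_1\,\text{R\&D Spend}
  + \beta_2\,\text{Administration}
  + \beta_3\,\text{Marketing Spend}
  + \sum_{j=1}^{k-1} \gamma_j D_j
  + \varepsilon
```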

Prepare a model in Python that predicts profit from the 50_Startups data.

4 Answers

by Goeduhub's Expert (2.2k points)
 
Best answer
# Multiple Linear Regression for 50 Startups

# Importing the libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Importing the dataset: every column except the last is a feature,
# and the last column (Profit) is the target
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

# Convert the State column into dummy (indicator) columns;
# drop_first=True drops one level so the dummies are not collinear with the intercept
states = pd.get_dummies(X['State'], drop_first=True, dtype=float)

# Drop the original State column
X = X.drop('State', axis=1)

# Concatenate the dummy variables with the numeric features
X = pd.concat([X, states], axis=1)

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fitting Multiple Linear Regression to the Training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)

# Scoring the predictions with R^2 on the Test set
score = r2_score(y_test, y_pred)
print(score)
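
The answer above passes drop_first=True to pd.get_dummies. A likely reason is the dummy variable trap: with an intercept in the model, a full set of dummy columns always sums to one, so one level is dropped and becomes the baseline. The sketch below illustrates this on invented data. The state labels 'A', 'B', 'C', the spend figures, and the profit rule are all hypothetical and are not taken from 50_Startups.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data; labels and figures are invented for illustration
toy = pd.DataFrame({
    'R&D Spend': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    'State': ['A', 'B', 'C', 'A', 'B', 'C'],
})

# Profit follows a known rule so the fitted coefficients can be checked:
# 5 + 2 * R&D Spend, plus 3 for state B and 7 for state C (A is the baseline)
offsets = toy['State'].map({'A': 0.0, 'B': 3.0, 'C': 7.0})
toy['Profit'] = 5.0 + 2.0 * toy['R&D Spend'] + offsets

# drop_first=True removes the first level ('A'), leaving columns 'B' and 'C'
dummies = pd.get_dummies(toy['State'], drop_first=True, dtype=float)
features = pd.concat([toy[['R&D Spend']], dummies], axis=1)

model = LinearRegression().fit(features, toy['Profit'])

# Each dummy coefficient is the shift in profit relative to baseline state 'A'
for name, coef in zip(features.columns, np.round(model.coef_, 6)):
    print(name, coef)
print('intercept', round(float(model.intercept_), 6))
```

Because the toy profit is an exact linear function of the features, the fit recovers the rule: the coefficients for 'B' and 'C' are the offsets from the baseline state, and the intercept absorbs the baseline itself.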
by (342 points)
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load and inspect the data
df = pd.read_csv('50_Startups.csv')
print(df.head())
df.info()
print(df.describe())

# Encode the State column as integer labels
lab_enc = LabelEncoder()
df['State'] = lab_enc.fit_transform(df['State'])
print(df['State'].unique())

# Separate the features from the target
x = df.drop(['Profit'], axis=1)
print(x.head())
print(x.shape)
y = df['Profit']
print(y.head())
print(y.shape)

# Hold out one third of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=45)

# Fit the model and predict on the test set
model = LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print(y_pred)

# Evaluate the predictions
score = r2_score(y_test, y_pred)
print(score)  # output reported in the original answer: 0.9426922836763976
mse = mean_squared_error(y_test, y_pred)
print(mse)

# Compare actual and predicted profit side by side
newdf = pd.DataFrame({'Actual': y_test.values, 'Predicted': y_pred})
print(newdf)

# Plot predicted profit against actual profit
plt.scatter(y_test, y_pred, marker='^')
plt.xlabel('Actual profit')
plt.ylabel('Predicted profit')
plt.show()
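
One thing to keep in mind with the answer above: LabelEncoder turns State into integer codes, and a linear model then treats those codes as a quantity, as if the step from code 0 to code 1 had the same effect as the step from 1 to 2. Whether that matters for this dataset is not shown here. The sketch below only contrasts the two encodings on a few hypothetical state values (the names are assumptions for illustration, not read from the CSV).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical state values, used only for illustration
states = pd.Series(['New York', 'California', 'Florida', 'California'], name='State')

# LabelEncoder sorts the distinct values and numbers them 0, 1, 2, ...
encoder = LabelEncoder()
codes = encoder.fit_transform(states)
print(encoder.classes_.tolist())  # ['California', 'Florida', 'New York']
print(codes.tolist())             # [2, 0, 1, 0]

# One-hot (dummy) encoding makes one indicator column per state instead;
# drop_first=True drops the first sorted level, here 'California'
one_hot = pd.get_dummies(states, drop_first=True, dtype=int)
print(one_hot)
```

With the integer codes, 'New York' (2) looks like twice 'Florida' (1) to the regression, which is an artifact of alphabetical order. The indicator columns avoid that by giving each non-baseline state its own coefficient.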