Predicting a Startup's Profit/Success Rate Using Multiple Linear Regression in Python on the 50 Startups Dataset

Question

asked Jun 4, 2021 in Python Programming by Sharda Chaudhary Goeduhub's Expert (2.2k points)

Task: Predict a startup's profit/success rate using multiple linear regression in Python on the downloadable 50 Startups data set.

The 50 Startups dataset contains 5 columns: “R&D Spend”, “Administration”, “Marketing Spend”, “State”, and “Profit”.

The first three columns give each startup's spending on research and development, administration, and marketing, respectively. State indicates the state in which the startup is based. Profit is the profit the startup earned.

This is a multiple linear regression problem because there is more than one independent variable.
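
To make "multiple linear regression" concrete: the model predicts the target as an intercept plus a weighted sum of several predictors, and the weights are chosen by ordinary least squares. The following is a minimal sketch on made-up, noise-free numbers (not the 50 Startups values); the names `true_coef`, `true_intercept`, and `X_design` are illustrative assumptions.

```python
import numpy as np

# Made-up data for illustration only (not the 50_Startups values).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(20, 3))    # three numeric predictors
true_coef = np.array([3.0, -1.0, 0.5])   # hypothetical weights
true_intercept = 2.0                     # hypothetical intercept
y = true_intercept + X @ true_coef       # noise-free target

# Prepend a column of ones so the first fitted weight is the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise ||X_design @ w - y||^2.
weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coef = weights[0], weights[1:]
print(intercept, coef)
```

Because the target here has no noise, the fitted intercept and weights should match the hypothetical ones up to floating-point error. On real data they would only be estimates.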

Prepare a prediction model for Profit using the 50_Startups data in Python.

4 Answers

answered Jun 6, 2021 by Sharda Chaudhary Goeduhub's Expert (2.2k points)

Best answer

                        
    # Multiple Linear Regression for 50 Startups

    # Importing the libraries
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    # Importing the dataset
    dataset = pd.read_csv('50_Startups.csv')
    X = dataset.iloc[:, :-1]   # every column except the last (Profit)
    y = dataset.iloc[:, 4]     # Profit

    # Convert the State column into dummy (one-hot) columns,
    # dropping the first category
    states = pd.get_dummies(X['State'], drop_first=True)

    # Drop the original State column
    X = X.drop('State', axis=1)

    # Concatenate the dummy variables
    X = pd.concat([X, states], axis=1)

    # Splitting the dataset into the Training set and Test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Fitting Multiple Linear Regression to the Training set
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = regressor.predict(X_test)

    # Evaluating the predictions with the R^2 score
    score = r2_score(y_test, y_pred)
    print(score)
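
The dummy-variable step in the answer above can be seen on a tiny frame. This is only a sketch with made-up values: the column name `Spend` and the state labels `A`, `B`, `C` are hypothetical, not taken from the dataset. With `drop_first=True`, the first category (in sorted order) gets no column of its own and is represented by all remaining dummy columns being zero, which avoids perfectly collinear columns when the model also fits an intercept.

```python
import pandas as pd

# Toy frame with hypothetical values and state labels.
toy = pd.DataFrame({
    "Spend": [10.0, 20.0, 30.0, 40.0],
    "State": ["A", "B", "C", "A"],
})

# One column per state except the first ("A"), which becomes the baseline.
dummies = pd.get_dummies(toy["State"], drop_first=True)

# Replace the text column with the dummy columns.
encoded = pd.concat([toy.drop("State", axis=1), dummies], axis=1)
print(encoded)
```

Rows for state `A` end up with both the `B` and `C` columns set to zero (or `False`, depending on the pandas version).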

G.Vigneshwaran · Answer 1 · 2021-06-05T05:17:21+0000

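    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    # Load the data and take a first look
    df = pd.read_csv('50_Startups.csv')
    df.head()
    df.info()
    df.describe()

    # Encode State as integer labels.
    # Note: label encoding imposes an arbitrary order on the states;
    # one-hot encoding avoids this.
    from sklearn.preprocessing import LabelEncoder
    lab_enc = LabelEncoder()
    df.State = lab_enc.fit_transform(df.State)
    df.State.unique()

    # Features: every column except Profit
    x = df.drop(['Profit'], axis=1)
    print(x.head())
    print(x.shape)

    # Target: Profit
    y = df['Profit']
    print(y.head())
    print(y.shape)

    # Hold out one third of the rows for testing
    from sklearn.model_selection import train_test_split
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=45)

    # Fit the model and predict on the test set
    model = LinearRegression()
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    y_pred

    # R^2 score on the test set
    from sklearn.metrics import r2_score
    score = r2_score(y_test, y_pred)
    score
    # Output: 0.9426922836763976

    # Mean squared error on the test set
    from sklearn.metrics import mean_squared_error
    mse = mean_squared_error(y_test, y_pred)
    mse

    # Actual and predicted profit side by side
    newdf = pd.DataFrame({'Actual': y_test.values, 'Predicted': y_pred})
    newdf

    # Scatter plot of actual vs. predicted profit
    plt.scatter(y_test, y_pred, marker='^')
    plt.xlabel('Actual profit')
    plt.ylabel('Predicted profit')
    plt.show()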
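
For readers unsure what `r2_score` and `mean_squared_error` in the answer above measure, here is a small sketch that computes both by hand and compares them with scikit-learn. The arrays `y_true` and `y_hat` are made-up numbers for illustration, not results from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values (illustration only).
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_hat = np.array([110.0, 140.0, 190.0, 260.0])

# MSE: mean of the squared residuals.
mse_manual = np.mean((y_true - y_hat) ** 2)

# RMSE: square root of MSE, in the same units as the target.
rmse = np.sqrt(mse_manual)

# R^2: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, rmse)  # 100.0 10.0

# The hand-computed values agree with scikit-learn's.
print(np.isclose(mse_manual, mean_squared_error(y_true, y_hat)))
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))
```

An R^2 close to 1 means the predictions explain most of the variance in the target. RMSE is often easier to interpret than MSE because it is expressed in the target's own units.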
