in Python Programming by Goeduhub's Expert (2.1k points)

Assignment/Task 5

Pandas - Data Analysis of IMDB movies data

Now that we have a basic understanding of the different data structures in Pandas, let's explore the fun and interesting 'IMDB-movies-dataset' and get our hands dirty with practical data analysis on real data.

It is an open-source dataset and you can download it from this link.

We will read the data from the .csv file and perform the following basic operations on the movies data:

  1. Load and read the IMDb dataset
  2. View the dataset
  3. Understand some basic information about the dataset and inspect the dataframe's columns, shape, variable types etc.
  4. Data Selection – Indexing and Slicing data
  5. Data Selection – Based on Conditional filtering
  6. Groupby operations
  7. Sorting operation
  8. Dealing with missing values
  9. Dropping columns and null values
  10. apply() functions

3 Answers

Best answer
1. Load and read the IMDb dataset

import numpy as np

import pandas as pd

df = pd.read_csv('IMDB-Movie-Data.csv')
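If the file is not at hand, the loading step can be tried on an in-memory CSV. This is only a minimal sketch: the three rows are made-up sample values, and only the column names (Rank, Title, Genre, Rating, Votes, Metascore) are taken from the code in this answer.

```python
import io

import pandas as pd

# Hypothetical sample standing in for IMDB-Movie-Data.csv (values are made up).
csv_text = """Rank,Title,Genre,Rating,Votes,Metascore
1,Movie A,Action,8.1,1000,76
2,Movie B,Drama,7.0,500,
3,Movie C,Action,6.5,250,40
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                        # (rows, columns)
print(sample["Metascore"].isnull().sum())  # empty CSV fields are read as NaN
```

Reading from a string keeps the example independent of where the real file is stored.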

2. View the dataset

df.head(10)

3. Understand some basic information about the dataset and inspect the dataframe's columns, shape, variable types etc.

df.columns

type(df)

df.dtypes

df.shape

df.size

df.ndim

df.values

df1 = df.values

type(df1)

df.describe()
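How the inspection attributes relate to each other is easier to see on a tiny frame. A small sketch, assuming a hypothetical three-row frame named toy with made-up values:

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Rating": [8.1, 7.0, 6.5],
    "Votes": [1000, 500, 250],
})

rows, cols = toy.shape          # shape is a (rows, columns) tuple
print(toy.size == rows * cols)  # size counts every cell
print(toy.ndim)                 # a DataFrame has 2 dimensions
print(type(toy.values))         # .values exposes the data as a NumPy array

# describe() summarises only the numeric columns by default.
print(list(toy.describe().columns))
```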

4. Data Selection – Indexing and Slicing data

df.iloc[0]

df[1:20]

df[['Rating','Votes']].agg(['min','max','mean'])
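Position-based and label-based selection are easy to mix up when the index is the default 0, 1, 2, ... A sketch on a hypothetical frame whose row labels start at 10 (all values made up) makes the difference visible:

```python
import pandas as pd

# Hypothetical toy frame; non-default labels separate position from label.
toy = pd.DataFrame(
    {"Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
     "Rating": [8.1, 7.0, 6.5, 9.0]},
    index=[10, 11, 12, 13],
)

first_row = toy.iloc[0]            # by position: the first row
same_row = toy.loc[10]             # by label: the row labelled 10
by_position = toy[1:3]             # plain slice is positional, end excluded
by_label = toy.loc[11:12]          # label slice includes both ends
one_cell = toy.loc[13, "Rating"]   # row label plus column name

print(by_position.index.tolist())
print(by_label.index.tolist())
```

Both slices return the rows labelled 11 and 12, but for different reasons: one counts positions, the other matches labels.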

5. Data Selection – Based on Conditional filtering

df.filter(items=['Rank', 'Votes'])

df['Rating']>7

top_rank = df[df["Rating"] > 8.0]["Title"].count()

print(top_rank)
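The comparison df['Rating']>7 produces a boolean mask, and indexing with that mask keeps the matching rows. A short sketch on a hypothetical toy frame (made-up values) that also combines two conditions:

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Genre": ["Action", "Drama", "Action", "Drama"],
    "Rating": [8.1, 7.0, 6.5, 9.0],
})

mask = toy["Rating"] > 7     # boolean Series, one value per row
high_rated = toy[mask]       # keeps the rows where the mask is True

# Combine conditions with & or | and wrap each one in parentheses.
high_drama = toy[(toy["Rating"] > 7) & (toy["Genre"] == "Drama")]

print(int(mask.sum()))       # number of rows matching the mask
print(high_drama["Title"].tolist())
```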

6. Groupby operations

df2 = df.groupby('Genre')

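# numeric_only=True skips text columns such as Title, which recent pandas versions refuse to average
df2.mean(numeric_only=True)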

df[df['Rating']>7].groupby('Genre')[['Rating']].count()

top_movie = df[df["Rating"] > 8.0]

top_movie.groupby(["Title"])["Votes"].mean()
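When only a few statistics per group are needed, selecting the columns before aggregating keeps the result small. A sketch on a hypothetical toy frame (made-up values); the output names movies and best_rating are illustrative choices, not part of the dataset:

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "Genre": ["Action", "Drama", "Action", "Drama"],
    "Rating": [8.0, 7.0, 6.0, 9.0],
    "Votes": [1000, 500, 250, 750],
})

# Select numeric columns first so text columns never reach mean().
genre_mean = toy.groupby("Genre")[["Rating", "Votes"]].mean()

# Named aggregation: one output column per (column, function) pair.
summary = toy.groupby("Genre").agg(
    movies=("Rating", "count"),
    best_rating=("Rating", "max"),
)

print(genre_mean)
print(summary)
```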

7. Sorting operation

x = df.sort_values(by='Rating')

x.head(10)

most_votes = df.groupby(["Votes"]).mean(numeric_only=True)
most_votes.sort_values(by=["Votes"], ascending=False).head()
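Sorting can also be done directly on the frame, without grouping first. A sketch on a hypothetical toy frame (made-up values) showing a descending sort with a tie-breaker and the nlargest shortcut:

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Rating": [8.1, 7.0, 8.1, 9.0],
    "Votes": [1000, 500, 250, 750],
})

# Highest rating first; ties are broken by vote count, also descending.
ranked = toy.sort_values(by=["Rating", "Votes"], ascending=[False, False])

# nlargest is a shortcut for "sort descending, then take the first n rows".
top_votes = toy.nlargest(2, "Votes")

print(ranked["Title"].tolist())
print(top_votes["Title"].tolist())
```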

8. Dealing with missing values

df.isnull().sum()
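Counting missing values is usually followed by a decision about what to do with them. Filling is one option that this answer does not use; the sketch below assumes a hypothetical toy frame with made-up scores and fills the gap with the column mean:

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Metascore": [80.0, None, 40.0],
})

missing_per_column = toy.isnull().sum()   # count of NaN per column

# mean() skips NaN, so the fill value is the mean of the known scores.
filled = toy.fillna({"Metascore": toy["Metascore"].mean()})

print(missing_per_column)
print(filled)
```

fillna returns a new frame, so toy still contains the missing value afterwards.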

9. Dropping columns and null values

df.dropna()

# inplace=True modifies df directly and returns None, so there is nothing to assign
df.drop(['Metascore'], axis='columns', inplace=True)
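Whether dropna() and drop() change the original frame depends on the inplace argument. A sketch on a hypothetical toy frame (made-up values) comparing the two styles:

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Rating": [8.1, 7.0, 6.5],
    "Metascore": [80.0, None, 40.0],
})

complete_rows = toy.dropna()                     # new frame; toy is unchanged
rated_only = toy.dropna(subset=["Rating"])       # only checks the Rating column
without_score = toy.drop(columns=["Metascore"])  # new frame without the column

# With inplace=True the method mutates the frame and returns None.
scratch = toy.copy()
result = scratch.drop(columns=["Metascore"], inplace=True)

print(len(complete_rows), len(rated_only))
print(result)
```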

10. apply() functions

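# apply the function to the numeric Rank column only; on the whole frame, n*5 would also repeat every text value five times
rank = df['Rank'].apply(lambda n: n * 5)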

print(rank.head())
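apply() can work on a single column or on whole rows. A sketch on a hypothetical toy frame (made-up values); rating_band and the 8.0 threshold are illustrative assumptions, not part of the task:

```python
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Rating": [8.1, 7.0, 6.5],
    "Votes": [1000, 500, 250],
})


def rating_band(rating):
    """Hypothetical helper: label a rating as 'high' or 'other'."""
    return "high" if rating >= 8.0 else "other"


# Series.apply calls the function once per value.
toy["Band"] = toy["Rating"].apply(rating_band)

# DataFrame.apply with axis=1 passes each row to the function as a Series.
toy["Label"] = toy.apply(lambda row: f"{row['Title']} ({row['Band']})", axis=1)

print(toy[["Title", "Band", "Label"]])
```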


GO_STP_6734

Hello Goeduhub,

Here is the answer for Task 5: https://www.linkedin.com/pulse/assignment-task-5-aswin-kumar
