Some basic statistics formulas for machine learning (data science) - Introduction

In this article we discuss some statistics topics and concepts that are vital in data science. Statistics is a large subject in its own right, but machine learning and data science both depend on it. In this tutorial we look at some basic concepts that work behind the scenes in machine learning and data science.

Central Tendency - Mean, Median, Mode.

Dispersion - Range, Interquartile Range (IQR), Standard Deviation, Variance.

Correlation, Frequencies, Proportion, Hypothesis and Inference. It will be helpful if you have basic knowledge of probability and algebra (vectors).

by Goeduhub's Expert (3.1k points)

Statistics: Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

Nowadays statistics plays a vital role in the social, economic and industrial sectors. Statistical data is used to analyze people's interests, health, education and various other factors.

In machine learning, statistics is used to clean and prepare data for modeling. It is also used to choose the analysis and visualization methods best suited to the data.

Types of statistics

Descriptive statistics: Descriptive statistics can be defined as organizing and summarizing data using graphs and tables. The tools used in descriptive statistics include the mean, median, mode, variance, standard deviation, etc.

Inferential statistics: Making conclusions or predictions about a population using sample data drawn from it.
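As a minimal sketch of the descriptive tools named above, the following uses Python's standard `statistics` module. The `ages` list is made-up data for illustration only, and the population forms of variance and standard deviation are used on the assumption that the list is the whole group of interest rather than a sample.

```python
import statistics

# Hypothetical data: ages of nine people (invented for illustration).
ages = [23, 25, 25, 29, 31, 34, 35, 41, 45]

# Measures of central tendency
mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequent value

# Measures of dispersion
value_range = max(ages) - min(ages)         # largest minus smallest
variance_age = statistics.pvariance(ages)   # population variance
std_dev_age = statistics.pstdev(ages)       # population standard deviation

print("mean:", mean_age)
print("median:", median_age)
print("mode:", mode_age)
print("range:", value_range)
print("variance:", round(variance_age, 2))
print("standard deviation:", round(std_dev_age, 2))
```

If the list were only a sample from a larger group, `statistics.variance` and `statistics.stdev` (the sample versions) would usually be the better choice.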

Let's try to understand it with an example:

Suppose the total population of a city ABC is 10,000. Using descriptive statistics we can represent various attributes of this population (gender, age, income, etc.), for example how many females and males there are, and we can also show these attributes on a graph.

In this scenario we can use the whole population (10,000).
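The counting described above can be sketched with `collections.Counter`. The `residents` list is a tiny made-up stand-in for the city's records, not real data; with the full population the same code would simply run over 10,000 entries.

```python
from collections import Counter

# Hypothetical records: one gender label per resident (invented data).
residents = [
    "female", "male", "female", "female", "male",
    "male", "female", "male", "female", "female",
]

# Frequency: how many residents fall into each category.
counts = Counter(residents)

# Proportion: each category's share of the whole group.
proportions = {gender: n / len(residents) for gender, n in counts.items()}

for gender in counts:
    print(gender, counts[gender], proportions[gender])
```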

But now there is a question: how many people in city ABC like apples?

It is not practical to ask everyone (10,000 people) whether they like apples. In this case we can take a sample from the population, for example a sample of 100 people, and ask each of them.

Suppose 40 people out of 100 like apples. From this we can estimate that about 40% of the total population of city ABC likes apples.

Here we are drawing a conclusion, or inference, from sample data, so this falls under inferential statistics. With inferential statistics, you take data from samples and make generalizations about a population.

As you can see, inferential statistics is not always exact, because the result depends on the sample. We use a margin of error to express this uncertainty, for example 40% ± 3%, and we can take a larger sample for a more accurate result.
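One common way to put a number on this margin is the normal approximation for a proportion at roughly 95% confidence: margin = z * sqrt(p * (1 - p) / n), with z = 1.96. The sketch below assumes a simple random sample; the function name `proportion_with_margin` and the larger 1,000-person sample are hypothetical additions for illustration, since the example in the text only gives the 40-out-of-100 figure.

```python
import math

def proportion_with_margin(successes, n, z=1.96):
    """Return the sample proportion and its approximate margin of error.

    Uses the normal approximation; z=1.96 corresponds to about 95% confidence.
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin

# Sample from the text: 40 of 100 people like apples.
p_small, moe_small = proportion_with_margin(40, 100)

# Hypothetical larger sample with the same proportion.
p_large, moe_large = proportion_with_margin(400, 1000)

print(f"n=100:  {p_small:.0%} +/- {moe_small:.1%}")
print(f"n=1000: {p_large:.0%} +/- {moe_large:.1%}")
```

Under this approximation the 100-person sample gives a margin of roughly ±10 percentage points, while a sample of 1,000 with the same proportion narrows it to about ±3 points, which is why a larger sample gives a more precise estimate.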

Data in statistics:

Data: In statistics, data is a collection of observations and events from experiments.

There are three types of data in statistics: 1. Qualitative 2. Quantitative 3. Ranked

Qualitative: Qualitative data is data that describes a characteristic or category. Qualitative data in statistics is also known as categorical data.

Quantitative: Quantitative data is numeric data that expresses a quantity. Quantitative data in statistics is also known as numeric data.

Ranked: Ranked data gives the position of events or values in an experiment or data set (ordinal data).

Scales of measurement: Nominal, Ordinal, Interval/Ratio.
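As a rough illustration of why the scale matters, a common rule of thumb is to summarize nominal data with the mode, ordinal data with the median, and interval/ratio data with the mean. The sketch below follows that rule; all values, and the `low`/`medium`/`high` ordering, are invented for the example.

```python
import statistics

# Nominal (qualitative): categories with no order, so only the mode makes sense.
blood_groups = ["A", "O", "B", "O", "AB", "O"]
most_common_group = statistics.mode(blood_groups)

# Ordinal (ranked): categories with an order. Map them to ranks first,
# because sorting the raw strings would order them alphabetically.
levels = ["low", "medium", "high"]
ratings = ["low", "high", "medium", "medium", "high", "low", "medium"]
rank = {level: i for i, level in enumerate(levels)}
median_rank = statistics.median_low([rank[r] for r in ratings])
median_rating = levels[median_rank]

# Interval/ratio (quantitative): numeric values, so the mean is meaningful.
heights_cm = [152, 160, 171, 165]
mean_height = statistics.mean(heights_cm)

print("nominal mode:", most_common_group)
print("ordinal median:", median_rating)
print("ratio mean:", mean_height)
```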

Now you can distinguish between the types of data, and knowing the type of data helps you choose a suitable analysis.