# What Are Correlation and Covariance in Statistics (Machine Learning/Data Science)?


In this article, we will discuss what correlation and covariance are in statistics. Both concepts are also important in machine learning and data science. In machine learning we often use scatter plots to represent the relationship between variables, so in this article we will see how a scatter plot is formed and what the correlation coefficient and covariance are.

by Goeduhub's Expert

### Correlation:

In statistics, correlation is a statistical relationship between two random variables. The word correlation is made of co- (together) and relation.

For example, your IQ and wealth in relation to those of your parents. In machine learning, we look at the correlation among predictors, and between predictors and target variables.

Types of correlation:

1. Positive relation: When high values of random variable X go with high values of random variable Y, and low values of X go with low values of Y, the variables have a positive relation.

For example, see the table below:

| Distance (X), km | Time (Y), minutes |
|---|---|
| 0.5 | 5 |
| 1.0 | 10 |
| 1.5 | 15 |
| 2.0 | 20 |

In the table you can see that relatively high values of distance (X) go with relatively high values of time (Y), meaning it is a positive relation.

2. Negative relation: If high values of X go with low values of Y, and vice versa, the variables are negatively correlated.

For example, consider speed and time: for a fixed distance, increasing speed reduces time. Fixed distance: 10 km.

| Speed (X), km/minute | Time (Y), minutes |
|---|---|
| 5 | 2 |
| 2 | 5 |
| 1 | 10 |

In the table you can see that relatively low values of speed (X) go with relatively high values of time (Y), meaning it is a negative relation.

3. Little or no relation: When there is no regular pattern relating the two variables, or the pattern provides little information about how they are related, the variables have little or no relation.
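The three types above can be checked numerically. This is a minimal sketch using NumPy's `np.corrcoef`, which computes the Pearson correlation coefficient discussed later in this article. It assumes the values from the two tables, plus two independently generated random samples (the seed and sample size are arbitrary choices) standing in for "no relation".

```python
import numpy as np

# Positive relation: distance (km) and time (minutes) from the first table
distance = np.array([0.5, 1.0, 1.5, 2.0])
time_walk = np.array([5.0, 10.0, 15.0, 20.0])

# Negative relation: speed (km/minute) and time (minutes) for a fixed 10 km
speed = np.array([5.0, 2.0, 1.0])
time_fixed = 10.0 / speed            # [2, 5, 10], as in the second table

# Little or no relation: two independently generated random variables
rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible
a = rng.normal(size=1000)
b = rng.normal(size=1000)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation
r_pos = np.corrcoef(distance, time_walk)[0, 1]
r_neg = np.corrcoef(speed, time_fixed)[0, 1]
r_none = np.corrcoef(a, b)[0, 1]

print(f"distance vs time: r = {r_pos:.2f}")   # 1.00
print(f"speed vs time:    r = {r_neg:.2f}")   # -0.91
print(f"independent data: r = {r_none:.2f}")  # close to 0
```

The speed/time coefficient is negative but not exactly -1, because time = 10 / speed is a curve rather than a straight line. For a finite random sample, r is close to 0 but rarely exactly 0.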

Scatterplot: A plot where the x-axis represents one variable and the y-axis represents another. The graph contains a cluster of dots, each representing one pair of values, and the shape of the cluster shows how the variables are related.

[Figure: scatterplots illustrating correlation between pairs of variables]

In the diagram:

In the first case, weight increases with height, which is true in practice. In the second case, people who smoke more have a lower life expectancy than people who smoke less or not at all, which is also true. Both cases show a clear relation between the variables. If a dot cluster approximates a straight line, it reflects a linear relationship; if it approximates a bent or curved line, it reflects a curvilinear relationship.

In the third case, there is no relation between the variables, since a person's height has no bearing on their life expectancy.
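The linear/curvilinear distinction matters when you compute the Pearson correlation coefficient, because Pearson's r only measures linear association. In this minimal sketch with made-up `x` values, a straight line gives r = 1, while a parabola that is symmetric around zero gives r = 0 even though `y` is completely determined by `x`.

```python
import numpy as np

# Made-up x values, symmetric around 0
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

y_linear = 3 * x + 1   # points lie on a straight line (linear relationship)
y_curved = x ** 2      # points lie on a parabola (curvilinear relationship)

r_linear = np.corrcoef(x, y_linear)[0, 1]
r_curved = np.corrcoef(x, y_curved)[0, 1]

print(f"linear: r = {r_linear:.2f}")  # 1.00
# r_curved is (numerically) 0: Pearson's r misses this non-linear dependence
print(f"curved: r = {r_curved:.2f}")
```

So a value of r near 0 means "no linear relation", which is not always the same as "no relation": it is worth looking at the scatterplot as well.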

### Correlation Coefficient (r):

A correlation coefficient is a numerical value between -1 and +1 that describes the relationship between two variables. In programming, you don't have to worry about the formula: you can calculate the correlation coefficient directly with NumPy (for example, `numpy.corrcoef`). You just have to import and call the function for the coefficient you want.
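When you have several variables (for example, several predictors and a target), `np.corrcoef` can return all pairwise coefficients at once. This is a minimal sketch; the data values below are hypothetical and purely illustrative, not real measurements.

```python
import numpy as np

# Hypothetical illustrative data: each row is one variable, each column one person
data = np.array([
    [150.0, 160.0, 170.0, 180.0, 190.0],  # height (cm)
    [52.0, 58.0, 69.0, 77.0, 86.0],       # weight (kg)
    [30.0, 10.0, 25.0, 5.0, 20.0],        # cigarettes per day
])

# 3x3 matrix: entry [i, j] is the correlation between variable i and variable j
matrix = np.corrcoef(data)
print(np.round(matrix, 2))

r_height_weight = matrix[0, 1]
print(f"r(height, weight) = {r_height_weight:.2f}")
```

By default `np.corrcoef` treats each row as a variable; if your variables are stored as columns (as in a typical data table), pass `rowvar=False`.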

There are many types of correlation coefficient, one of which is the Pearson correlation coefficient.
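For reference, the Pearson correlation coefficient of paired values $x_i$, $y_i$ is usually defined as

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$

The sketch below implements that definition in a hypothetical helper, `pearson_r`, and checks it against `np.corrcoef` using the speed/time values from the negative-relation table.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r: sum of products of deviations, divided by the
    square roots of the two sums of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations of x from its mean
    dy = y - y.mean()   # deviations of y from its mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

speed = [5.0, 2.0, 1.0]       # km/minute
time_min = [2.0, 5.0, 10.0]   # minutes for a fixed 10 km

print(pearson_r(speed, time_min))          # about -0.911
print(np.corrcoef(speed, time_min)[0, 1])  # the same value from NumPy
```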

The sign of r indicates the type of linear relation:

- positive (+): positive linear relation
- negative (-): negative linear relation
- zero (0): no linear correlation

The magnitude of r (its absolute value, ignoring the sign) indicates how strong the relation is: values close to 1 indicate a strong linear relation, and values close to 0 a weak one.
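The two rules above (sign gives direction, absolute value gives strength) can be written as a small function. This is a sketch with a hypothetical helper, `describe_r`; the cut-offs of 0.7 for "strong" and 0.3 for "moderate" are assumptions chosen for illustration, since different fields use different conventions.

```python
def describe_r(r, strong=0.7, weak=0.3):
    """Describe a correlation coefficient by its sign (direction)
    and its absolute value (strength).

    The strong/weak cut-offs are illustrative assumptions, not fixed rules.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        return "no linear correlation"

    size = abs(r)  # strength ignores the sign
    if size >= strong:
        strength = "strong"
    elif size >= weak:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} linear correlation"

print(describe_r(0.95))    # strong positive linear correlation
print(describe_r(-0.91))   # strong negative linear correlation
print(describe_r(0.1))     # weak positive linear correlation
```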

### Covariance:

In mathematics and statistics, covariance is a measure of the relationship between two random variables: it measures how much, and to what extent, two variables change together.

In other words, while variance measures how a single variable deviates from its mean, covariance measures how two variables vary in tandem from their means.
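One way to see the link between variance and covariance is that the variance of a variable is its covariance with itself. In this minimal sketch using the distance/time table, `np.cov` returns a covariance matrix whose diagonal holds the variances and whose off-diagonal entry holds the covariance. Note that `np.cov` uses the sample divisor n - 1 by default, so it is compared with `np.var(..., ddof=1)`.

```python
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0])     # distance (km) from the first table
y = np.array([5.0, 10.0, 15.0, 20.0])  # time (minutes)

# Covariance matrix: variances on the diagonal, cov(x, y) off the diagonal
c = np.cov(x, y)

print(c[0, 0], np.var(x, ddof=1))  # variance of x, computed two ways
print(c[1, 1], np.var(y, ddof=1))  # variance of y, computed two ways
print(c[0, 1])                     # covariance of x and y
```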

Note: unlike correlation, covariance has units (the product of the units of the two variables), so its size depends on the scale in which the data is measured.
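A quick way to see the difference in units: in this sketch, the distances from the first table are converted from kilometres to metres. The covariance is multiplied by 1000, while the correlation coefficient stays the same.

```python
import numpy as np

distance_km = np.array([0.5, 1.0, 1.5, 2.0])
time_min = np.array([5.0, 10.0, 15.0, 20.0])
distance_m = distance_km * 1000   # the same distances, in metres

# Covariance carries units (km*min vs m*min), so it changes with the unit
cov_km = np.cov(distance_km, time_min)[0, 1]
cov_m = np.cov(distance_m, time_min)[0, 1]

# Correlation is unit-free, so it does not change
r_km = np.corrcoef(distance_km, time_min)[0, 1]
r_m = np.corrcoef(distance_m, time_min)[0, 1]

print(cov_km, cov_m)   # cov_m is 1000 times cov_km
print(r_km, r_m)       # both (numerically) 1.0
```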

Positive covariance: the two variables tend to move in the same direction.

Negative covariance: the two variables tend to move in opposite directions.

As with correlation, you don't have to worry about the covariance formula (written for two variables, X and Z); with NumPy you can calculate covariance directly using `numpy.cov`.
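For reference, one common form is the sample covariance of two variables X and Z with n paired observations:

$$\operatorname{cov}(X, Z) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z})$$

(the population form divides by n instead). The sketch below implements this sample form in a hypothetical helper, `covariance`, and applies it to the two tables from the correlation section: the distance/time pairs give a positive covariance and the speed/time pairs a negative one.

```python
import numpy as np

def covariance(x, z):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(x)
    return np.sum((x - x.mean()) * (z - z.mean())) / (n - 1)

distance = [0.5, 1.0, 1.5, 2.0]     # km
time_min = [5.0, 10.0, 15.0, 20.0]  # minutes
speed = [5.0, 2.0, 1.0]             # km/minute
time_fixed = [2.0, 5.0, 10.0]       # minutes to cover 10 km

print(covariance(distance, time_min))   # positive: the variables move together
print(covariance(speed, time_fixed))    # negative: they move in opposite directions
print(np.cov(speed, time_fixed)[0, 1])  # NumPy's default (n - 1) gives the same value
```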

You may ask: correlation and covariance both describe the relationship and measure the dependency between two random variables, so what is the difference between them?

Answer: Difference between correlation and covariance
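One standard identity connects the two measures: the Pearson correlation coefficient is the covariance divided by the product of the two standard deviations. This is why correlation is unit-free and always lies between -1 and +1. The sketch below checks the identity on the speed/time table, using the n - 1 divisor consistently for both the covariance and the standard deviations.

```python
import numpy as np

speed = np.array([5.0, 2.0, 1.0])       # km/minute
time_min = np.array([2.0, 5.0, 10.0])   # minutes for a fixed 10 km

cov = np.cov(speed, time_min)[0, 1]     # sample covariance (divisor n - 1)
s_speed = np.std(speed, ddof=1)         # sample standard deviations,
s_time = np.std(time_min, ddof=1)       # also with divisor n - 1

r_from_cov = cov / (s_speed * s_time)
r_numpy = np.corrcoef(speed, time_min)[0, 1]
print(r_from_cov, r_numpy)  # the same value, about -0.911
```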