In statistics, correlation is a statistical relationship between two random variables. The word correlation is made of co- (together) and relation.
For example, your IQ and wealth in relation to those of your parents. Correlation can also be examined among predictors, and between predictors and target variables.
Types of correlation:
1. Positive Relation: When high values of random variable X go with high values of random variable Y, and low values of X go with low values of Y, the variables have a positive relation.
For example, see the table below.
|Distance (X), km|Time (Y), minutes|
In the table you can see that relatively high values of distance (X) go with relatively high values of time (Y), meaning it is a positive relation.
Negative Relation: If high values of X go with low values of Y, and vice versa, the variables are negatively correlated.
For example, consider the relation between speed and time: for a fixed distance, increasing speed reduces time. Fixed distance: 10 km.
|Speed (X), km/minute|Time (Y), minutes|
|5|2|
In the table you can see that relatively high values of speed (X) go with relatively low values of time (Y), meaning it is a negative relation.
Little or No Relation: When there is no regularity in the relation between two variables, or knowing one variable provides little information about the other, the variables have little or no relation.
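The three types above can be sketched in a few lines of plain Python. This is only an illustration: `relation_direction` is a hypothetical helper, all data are made up except the pair (5, 2) from the speed table, and testing for an exact zero is a simplification that works only on tidy toy data.

```python
from statistics import mean

def relation_direction(xs, ys):
    """Classify a relation from the signs of paired deviations from the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    # A pair adds a positive term when both values sit on the same side of
    # their means, and a negative term when they sit on opposite sides.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "little or none"

# Hypothetical data: distance (km) and travel time (minutes) at a steady pace.
distance = [2, 4, 6, 8]
time_for_distance = [3, 6, 9, 12]

# Fixed distance of 10 km: time = 10 / speed (the pair 5, 2 is from the table).
speed = [1, 2, 5, 10]
time_for_speed = [10 / s for s in speed]

# Hypothetical data with no regular pattern.
x = [1, 2, 3, 4]
y = [2, 1, 1, 2]

print(relation_direction(distance, time_for_distance))  # positive
print(relation_direction(speed, time_for_speed))        # negative
print(relation_direction(x, y))                         # little or none
```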
Scatterplot: A plot where the x-axis represents one variable and the y-axis represents another. The graph contains a cluster of dots, each dot representing one pair of values of the two variables.
See the diagram below to understand scatterplots and correlation.
In the above diagram:
In the first case, weight increases with height, which is true in practical life. In the second case, people who smoke more have a lower life expectancy than people who don't smoke or who smoke less, which is also true. Both of these are linear relations between variables. A dot cluster that approximates a straight line reflects a linear relationship, while a dot cluster that approximates a bent or curved line reflects a curvilinear relationship.
In the third case, the plot does not show any relation between the variables, since a person's height has no bearing on their life expectancy.
Correlation Coefficient (r):
A correlation coefficient is a numerical value between -1 and +1 that describes the relationship between variables.
You don't have to worry about the formula: in programming, we can calculate the correlation coefficient directly using numpy. You just have to import the function for whichever correlation coefficient you want.
There are many types of correlation coefficient, one of which is the Pearson correlation coefficient.
The sign of r indicates the type of linear relation: positive (+) means a positive linear relation, negative (-) means a negative linear relation, and zero (0) means no correlation.
The absolute value of r indicates how strong the relation is (without regard to sign).
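As a minimal sketch of reading the sign and strength of r, the snippet below uses numpy's `np.corrcoef`, which computes the Pearson coefficient. The speeds are hypothetical apart from the pair (5, 2) from the speed table, and times are derived as 10 km divided by speed.

```python
import numpy as np

# Fixed distance of 10 km: time = 10 / speed. Speeds are hypothetical
# except for the pair (5, 2) taken from the table.
speed = np.array([1.0, 2.0, 5.0, 10.0])
time = 10.0 / speed

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r for the pair.
r = np.corrcoef(speed, time)[0, 1]

direction = "positive" if r > 0 else "negative" if r < 0 else "none"
strength = abs(r)  # strength ignores the sign

print(f"r = {r:.3f}, direction = {direction}, strength = {strength:.3f}")
```

Here r comes out at about -0.837 rather than exactly -1, because time falls along a curve (10 / speed) rather than a straight line as speed increases.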
In mathematics and statistics, covariance is a measure of the relationship between two random variables. That is, covariance measures how much, and in which direction, two variables change together; a covariance matrix holds this value for each pair of variables.
In other words, while variance measures how a single variable deviates from its mean, covariance measures how two variables vary in tandem from their means.
Note: Covariance carries units (the product of the units of the two variables), unlike correlation, which is unitless.
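One way to see the effect of units is to express the same measurements in a different unit and recompute both quantities. The sketch below assumes hypothetical distance and time values and uses numpy's `np.cov` and `np.corrcoef`.

```python
import numpy as np

# Hypothetical measurements: distance in km and time in minutes.
distance_km = np.array([2.0, 4.0, 6.0, 8.0])
time_min = np.array([3.0, 6.0, 9.0, 12.0])
distance_m = distance_km * 1000.0  # the same distances expressed in metres

# np.cov returns a 2x2 matrix; entry [0, 1] is the covariance of the pair.
cov_km = np.cov(distance_km, time_min)[0, 1]  # units: km * minutes
cov_m = np.cov(distance_m, time_min)[0, 1]    # units: m * minutes

r_km = np.corrcoef(distance_km, time_min)[0, 1]
r_m = np.corrcoef(distance_m, time_min)[0, 1]

print(cov_km, cov_m)  # covariance grows by the unit conversion factor
print(r_km, r_m)      # correlation is the same in both units
```

Changing km to metres multiplies the covariance by 1000, while the correlation coefficient stays the same.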
Positive Covariance: the two variables tend to move in the same direction.
Negative Covariance: the two variables tend to move in opposite directions.
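Both cases can be checked numerically with numpy's `np.cov`. This is a sketch on hypothetical data: a distance/time pair that rises together, and a speed/time pair for a fixed 10 km distance.

```python
import numpy as np

# Hypothetical data: distance (km) and time (minutes) at a steady pace.
distance = np.array([2.0, 4.0, 6.0, 8.0])
time_for_distance = np.array([3.0, 6.0, 9.0, 12.0])

# Fixed distance of 10 km: time = 10 / speed.
speed = np.array([1.0, 2.0, 5.0, 10.0])
time_for_speed = 10.0 / speed

# np.cov returns a 2x2 covariance matrix for a pair of variables:
# the diagonal holds each variable's variance, and the off-diagonal
# entries hold the covariance between the two.
cov_pos = np.cov(distance, time_for_distance)
cov_neg = np.cov(speed, time_for_speed)

print(cov_pos[0, 1])  # positive: the variables move in the same direction
print(cov_neg[0, 1])  # negative: the variables move in opposite directions

# The diagonal matches the sample variance of each variable (ddof=1).
print(np.isclose(cov_neg[0, 0], np.var(speed, ddof=1)))
```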
Covariance has a formula in which X and Z are the two variables, but as with correlation you don't have to work through it by hand: with numpy we can calculate covariance directly.
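For reference, one standard way to write covariance for variables X and Z is the sample form below. This is an assumption about which variant is intended: the population form divides by n instead of n - 1. Here x_i and z_i are the paired observations, x̄ and z̄ are their means, and n is the number of pairs.

```latex
\operatorname{Cov}(X, Z) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(z_i - \bar{z})
```

numpy's `np.cov` divides by n - 1 by default, which matches this sample form.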
You may ask a question here: correlation and covariance both describe the relationship and measure the dependency between two random variables, so what is the difference between the two?
Answer: Difference between correlation and covariance