in Artificial Intelligence(AI) & Machine Learning by Goeduhub's Expert (3.1k points)

In this article we discuss what reinforcement learning is, how it works, and where it is applied in practice. We also introduce Markov decision processes and Q-learning, two important learning models in reinforcement learning.


1 Answer

by Goeduhub's Expert (3.1k points)
Best answer

Machine learning has three main types of learning, that is, three ways in which a machine learns:

Supervised learning: The machine learns under supervision from labeled data. Example: predicting values for new inputs after training on labeled examples.

Unsupervised learning: The machine learns without supervision from unlabeled data, by recognizing patterns in the data. Example: clustering similar items into groups.

Reinforcement learning:

Reinforcement learning is a type of machine learning in which the machine learns differently from both supervised and unsupervised learning: there are no labeled examples, only feedback on the actions it takes.

In reinforcement learning, an agent learns continuously by interacting with its environment. Each action earns the agent a positive or negative reward, and this feedback improves the agent's understanding of the environment and the problem, and so its performance.

Example: a self-driving car.


Environment: The space in which the agent operates and learns; it is generally random (stochastic).

Reward: The feedback the environment gives the agent for an action.

Agent: The entity that explores the environment and acts in it.

State: The current situation of the agent, as returned by the environment.

Action: A move taken by the agent, chosen based on what it has learned from the environment.
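The five terms above fit together in a simple interaction loop. The following is a minimal sketch, not a definitive implementation: the `LineWorld` environment, its reward values, and the two example policies are hypothetical and chosen only for illustration.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4 on a line, goal at 4."""

    def __init__(self):
        self.state = 0

    def reset(self):
        # Put the agent back at the start and return the initial state.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to [0, 4].
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else -0.1  # assumed: small penalty per move, bonus at goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the goal or max_steps; return the total reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks an action from the state
        state, reward, done = env.step(action)  # environment returns new state and reward
        total += reward
        if done:
            break
    return total


def random_policy(state):
    # An agent that knows nothing yet: moves left or right at random.
    return random.choice([-1, 1])


def always_right(state):
    # An agent that has learned the goal lies to the right.
    return 1
```

Calling `run_episode(LineWorld(), always_right)` reaches the goal in four moves, while `random_policy` usually collects more step penalties along the way.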

Reinforcement learning is based on trial and error (hit and trial): the agent is told nothing about the environment or about which actions to take. It learns only from the feedback the environment returns, that is, the reward.

For example, the agent of a self-driving car receives a negative reward if the car has an accident (gets hit) and a positive reward if it reaches its goals without hitting anything.

To build an optimal policy that keeps the self-driving car from getting hit, the agent has to explore more and more states, yet it also has to maximize its rewards by exploiting the actions it already knows to be good. This is called the exploration vs exploitation trade-off. An agent has to balance both to obtain the most reward (value).
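One common way to balance exploration and exploitation is an epsilon-greedy rule. The sketch below assumes the agent keeps a list of estimated values, one per action; the name `epsilon_greedy` and its parameters are illustrative, not taken from the article.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: try a uniformly random action.
        return rng.randrange(len(q_values))
    # Exploit: take the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the agent always exploits; with `epsilon=1.0` it always explores. Values in between trade one against the other.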

Policy: The strategy the agent uses to choose its next action based on the current state.

Value: The expected long-term reward the agent will receive from a state, with future rewards weighted by a discount factor, as opposed to the short-term (immediate) reward.
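The role of the discount factor can be made concrete with a short calculation. This is a sketch assuming the usual discounted sum, reward_0 + γ·reward_1 + γ²·reward_2 + …; the function name `discounted_return` is illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum a reward sequence, weighting the reward at step t by gamma**t."""
    g = 0.0
    # Work backwards so each step is: reward now + gamma * (everything after).
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For three rewards of 1 and γ = 0.5, this gives 1 + 0.5 + 0.25 = 1.75: later rewards count for less than immediate ones.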

Applications of reinforcement learning: 

  • In robotics for industrial automation.
  • Game playing 
  • In business to make decisions 
  • Traffic signal control 
  • Robotics control

Approaches to implement reinforcement learning 

There are three ways to implement reinforcement learning:

1. Value Based: The value-based approach seeks the optimal value function, that is, the maximum value achievable at a state under any policy. The value is the long-term return the agent expects from the current state.

2. Policy Based: In the policy-based approach, the agent tries to find a policy that gains the maximum future reward directly, without using a value function.

Two types of policy

  1. Deterministic: The policy produces the same action every time it is in a given state.
  2. Stochastic: The policy assigns a probability to each action, and the action is drawn from that distribution.

3. Model Based:

In this approach, a virtual model is created for each environment, and the agent explores that model to learn the environment.
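The two policy types listed under the policy-based approach can be sketched as follows. The states, actions, and probabilities are hypothetical values for a self-driving car, invented for illustration.

```python
import random

# Deterministic policy: a fixed lookup from state to action.
deterministic_policy = {"red_light": "brake", "green_light": "accelerate"}

# Stochastic policy: each state maps to a probability distribution over actions.
stochastic_policy = {
    "red_light": {"brake": 0.95, "accelerate": 0.05},
    "green_light": {"brake": 0.10, "accelerate": 0.90},
}


def act_deterministic(state):
    # The same state always yields the same action.
    return deterministic_policy[state]


def act_stochastic(state, rng=random):
    # The action is sampled according to the probabilities for this state.
    actions = list(stochastic_policy[state])
    weights = [stochastic_policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Repeated calls to `act_deterministic("red_light")` always return `"brake"`, whereas `act_stochastic("red_light")` occasionally returns `"accelerate"`.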

Types of Reinforcement learning

Positive Reinforcement: It has a positive impact on the behavior of the agent and increases the strength and frequency of that behavior.

Negative Reinforcement: The counterpart of positive reinforcement. It also increases the tendency for a specific behavior to occur again, but does so because the behavior avoids a negative condition. Depending on the situation, it can be more effective than positive reinforcement.

Reinforcement learning algorithms

There are two important learning models in reinforcement learning.

Markov Decision Process

In a Markov decision process, the agent constantly interacts with the environment and performs actions. For each action, the environment responds by generating a new state and a reward as feedback to the agent.

When the environment is fully observable, it can be described formally as a Markov decision process (MDP).

Markov decision processes are used to describe the environment in reinforcement learning, and almost all RL problems can be formalized as MDPs.


A Markov decision process needs to satisfy the Markov property.

What is the Markov Property?

It says that the future is independent of the past given the present. If the agent is in the current state S1, performs an action A1, and moves to the state S2, then the transition from S1 to S2 depends only on the current state S1 and the action A1. Future states and rewards do not depend on earlier actions, rewards, or states.

For example, in a game of chess a player focuses only on the current state of the board and the next move, not on past moves and states.

Markov Process / Markov Chain: A Markov process is a memoryless process consisting of a sequence of random states S1, S2, S3, … with the Markov property.

A Markov process (Markov chain) is a tuple (S, P), where

  • S: Finite set of states
  • P: State transition probability
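A Markov chain (S, P) can be simulated directly from its transition probabilities. The two-state weather chain below is a hypothetical example; the names `S`, `P`, and `sample_chain` and the probabilities are assumptions made for illustration.

```python
import random

# Hypothetical two-state chain; each row of P sums to 1.
S = ["sunny", "rainy"]
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def sample_chain(start, n_steps, rng=random):
    """Sample a state sequence; each step looks only at the current state."""
    state, path = start, [start]
    for _ in range(n_steps):
        candidates = list(P[state])
        weights = [P[state][s] for s in candidates]
        state = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(state)
    return path
```

The loop body never reads `path`, only `state`, which is exactly the Markov property: the next state depends on the present state and not on the history.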

Markov Reward Process: A Markov reward process is a Markov chain with reward values.

A Markov reward process is a tuple (S, P, R, γ), where

  • S: Finite set of states
  • P: State transition probability
  • R: Reward
  • γ: Discount factor
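Given (S, P, R, γ), the value of each state can be estimated by repeatedly applying v(s) = R(s) + γ · Σ P(s, s') · v(s'). The sketch below assumes that the reward depends only on the current state and reuses a hypothetical two-state weather chain; the names and numbers are illustrative.

```python
# Hypothetical Markov reward process: transition probabilities and per-state rewards.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
R = {"sunny": 1.0, "rainy": 0.0}


def evaluate_mrp(P, R, gamma, sweeps=1000):
    """Estimate state values by iterating v(s) = R(s) + gamma * sum_s' P(s, s') * v(s')."""
    v = {s: 0.0 for s in P}
    for _ in range(sweeps):
        # Build the new estimates from the previous ones (synchronous update).
        v = {
            s: R[s] + gamma * sum(p * v[s2] for s2, p in P[s].items())
            for s in P
        }
    return v


values = evaluate_mrp(P, R, gamma=0.9)
```

With γ = 0.9 the estimates settle at about 7.19 for `"sunny"` and 5.63 for `"rainy"`: even the state with no immediate reward has value, because it can lead to rewarding states later.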

In conclusion, a Markov decision process provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.

Source: Wikipedia, UCL Lecture

Q-learning Algorithm in Reinforcement Learning 
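As a minimal sketch of the Q-learning model named in the heading above, assuming the standard tabular update rule Q(s, a) ← Q(s, a) + α · [r + γ · max over a' of Q(s', a') − Q(s, a)]. The function name `q_update`, the table layout, and the default values of α and γ are illustrative choices.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Best value the agent currently believes it can get from the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    # Target: immediate reward plus the discounted best future value.
    td_target = reward + gamma * best_next
    # Move the old estimate a fraction alpha of the way toward the target.
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# The Q-table starts at 0.0 for every (state, action) pair.
Q = defaultdict(float)
```

Each call uses only the current state, the action, the reward, and the next state, which matches the Markov decision process setting described earlier.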
