What is Q-Learning: Everything you Need to Know | Simplilearn (2023)

What do you do when a dog or child misbehaves? You scold them so that they do not repeat the bad behavior. On the other hand, you reward them when they do something good, to encourage that behavior. Believe it or not, this system of positive and negative reinforcement is also used to train machines. It is called reinforcement learning, and it can produce novel solutions to problems. Q-learning is a model-free type of reinforcement learning.

In this article, we will talk about what Q-learning is and how to go about implementing it.



What Is Reinforcement Learning?

In machine learning, a common drawback is the vast amount of data that models need for training. The more complex a model, the more data it may require. Even then, the data may not be reliable: it may have false or missing values or may be collected from untrustworthy sources.

Reinforcement learning sidesteps much of the data acquisition problem. Instead of relying on a large prepared dataset, the agent generates its own experience by interacting with an environment.

Reinforcement learning is a branch of machine learning that trains a model (the agent) to reach an optimal solution to a problem by making decisions on its own.

It consists of:

  • An Environment, which the agent interacts with to learn to reach a goal or perform an action.
  • A Reward when the agent's action brings it closer to, or reaches, the goal. This steers learning in the right direction.
  • A negative reward when the agent performs an action that does not lead to the goal, which discourages it from learning in the wrong direction.

Reinforcement learning requires a machine learning model to learn from the problem and find the optimal solution by itself. As a result, it can arrive at novel solutions that the programmer might not even have thought of.

Consider Figure 1 below. You can see a dog in a room that has to perform an action, which is fetching. The dog is the agent, the room is the environment it has to work in, and the action to be performed is fetching.


Figure 1: Agent, Action, and Environment

If the correct action is performed, we reward the agent. If it performs the wrong action, we give it either no reward or a negative reward, such as a scolding.


Figure 2: Agent performing an action

What Is Q-Learning?

Q-Learning is a reinforcement learning algorithm that finds the next best action, given the current state. While learning, the agent sometimes chooses actions at random to explore the environment, and it aims to maximize the total reward it collects.
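The article does not specify how random choice is balanced against using what the agent has already learned. A common scheme, assumed here rather than taken from the article, is epsilon-greedy selection. With probability `epsilon` the agent explores by picking a random action; otherwise it exploits the action with the highest current Q-value. The following is a minimal sketch that assumes one row of the Q-table is a plain list of floats. The function name `choose_action` is illustrative.

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """Pick an action index from one Q-table row using epsilon-greedy selection."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: pick the action with the highest current Q-value.
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

Setting `epsilon` to 0 gives purely greedy behavior, and setting it to 1 gives purely random behavior. Many implementations start with a high value and decrease it as training progresses.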


Figure 3: Components of Q-Learning


Q-learning is a model-free, off-policy reinforcement learning algorithm that finds the best course of action, given the current state of the agent. Depending on where the agent is in the environment, it decides which action to take next.

The objective of the model is to find the best course of action given its current state. Q-learning is called off-policy because the policy it learns about (always taking the action with the highest Q-value) can differ from the policy the agent actually follows while gathering experience, which may include random, exploratory actions.

Model-free means that the agent does not build or use a model of how the environment responds to its actions (which state comes next, and what reward follows). Instead, it learns directly from the rewards it receives, by trial and error.

One application of Q-learning is an advertisement recommendation system. In a conventional ad recommendation system, the ads you see are based on your previous purchases or the websites you have visited. If you've bought a TV, you will be recommended TVs from different brands.


Figure 4: Ad Recommendation System

Using Q-learning, we can optimize the ad recommendation system to recommend products that are frequently bought together. The agent receives a reward when the user clicks on the suggested product.


Figure 5: Ad Recommendation System with Q-Learning

Important Terms in Q-Learning

  1. States: The State, S, represents the current position of an agent in an environment.
  2. Action: The Action, A, is the step taken by the agent when it is in a particular state.
  3. Rewards: For every action, the agent will get a positive or negative reward.
  4. Episodes: An episode is one complete run of the agent, ending when it reaches a terminal state where it can take no further action.
  5. Q-Values: Used to determine how good an Action, A, taken at a particular state, S, is, written Q(S, A).
  6. Temporal Difference: The rule used to update a Q-value. It compares the current estimate Q(S, A) with a new estimate built from the reward received and the value of the next state, and moves Q(S, A) toward that new estimate.
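A single temporal-difference update can be sketched as follows. The sketch assumes the Q-table is a list of lists indexed as `q[state][action]`, with `alpha` as the learning rate and `gamma` as the discount factor. The function name `q_update` is illustrative and not from the article.

```python
def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning (temporal-difference) update in place.

    q is a list of lists indexed as q[state][action]. Returns the TD error.
    """
    # TD target: reward received plus the discounted value of the best
    # action available in the next state.
    target = reward + gamma * max(q[next_state])
    # TD error: how far the current estimate is from that target.
    td_error = target - q[state][action]
    # Move the estimate a fraction alpha of the way toward the target.
    q[state][action] += alpha * td_error
    return td_error
```

Consider `q = [[0.0, 0.0], [2.0, 4.0]]` with `alpha = 0.5` and `gamma = 0.5`. Taking action 1 in state 0, receiving a reward of 1, and landing in state 1 gives a target of 1 + 0.5 × 4 = 3 and a TD error of 3. The new `q[0][1]` is 1.5.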

What Is The Bellman Equation?

The Bellman Equation is used to determine the value of a particular state, or of taking a particular action in that state, which tells us how good that choice is. The best choice is the one with the highest value.

The equation is given below. It takes the current Q-value of a state-action pair, the reward received for taking that action, and the maximum expected future reward from the next state. A discount rate weights that future reward and sets how much future rewards count relative to immediate ones. The equation combines these into a new Q-value for the current state and action. The learning rate determines how fast or slowly the model learns, that is, how far each update moves the old value toward the new estimate.


Figure 6: Bellman Equation
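The equation in Figure 6 is not reproduced in this text. For reference, the Q-learning update described above is conventionally written as follows, where α is the learning rate and γ is the discount factor. The figure's own notation may differ.

```latex
Q^{\text{new}}(S_t, A_t) = Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]
```

Take Q(S_t, A_t) = 0, R_{t+1} = 1, max_a Q(S_{t+1}, a) = 4, and α = γ = 0.5. The new value is then 0 + 0.5 × (1 + 0.5 × 4 − 0) = 1.5.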

How to Make a Q-Table?

While running our algorithm, we will come across various solutions and the agent will take multiple paths. How do we find out the best among them? This is done by tabulating our findings in a table called a Q-Table.

A Q-Table helps us find the best action for each state in the environment. At each state, we use the Bellman Equation to calculate the expected future reward of each action. We save it in the table so it can be compared with other states and actions.
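One simple way to hold a Q-table in code is a dictionary that maps each state to a dictionary of action values. This is an illustrative choice; the article's figures show the table as a grid. Reading off the best action for a state is then an argmax over that state's row. The state and action names below are hypothetical.

```python
def init_q_table(states, actions):
    """Create a Q-table with every Q(state, action) set to 0."""
    return {state: {action: 0.0 for action in actions} for state in states}


def best_action(q_table, state):
    """Return the action with the highest Q-value in the given state."""
    row = q_table[state]
    return max(row, key=row.get)


def greedy_policy(q_table):
    """Map every state to its current best action."""
    return {state: best_action(q_table, state) for state in q_table}


# Hypothetical states and actions for the dog example.
q = init_q_table(["start", "idle"], ["run", "fetch", "sit"])
q["start"]["fetch"] = 0.7
q["idle"]["sit"] = 0.4
print(greedy_policy(q))  # {'start': 'fetch', 'idle': 'sit'}
```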

Let us create a Q-Table for an agent that has to learn to run, fetch, and sit on command. The steps to construct a Q-Table are:

Step 1: Create an initial Q-Table with all values initialized to 0

When we start, all the Q-values in the table are 0. Consider the Q-Table shown below, which shows a dog simulator learning to perform actions:


Figure 7: Initial Q-Table

Step 2: Choose an action and perform it. Update values in the table

This is the starting point; no action has been performed yet. Let us say that we want the agent to sit first, which it does. The table changes to:


Figure 8: Q-Table after performing an action

Step 3: Get the value of the reward and calculate the Q-Value using the Bellman Equation

For the action performed, we need to calculate the actual reward and the Q(S, A) value:


Figure 9: Updating Q-Table with Bellman Equation

Step 4: Continue the same until the table is filled or an episode ends

The agent continues taking actions. For each action, the reward and Q-value are calculated and the table is updated.
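Steps 1 to 4 can be tied together in a short loop. This is a toy sketch that rests on assumptions the article does not make:

- The hypothetical state is the command the dog hears.
- The reward is +1 for obeying and -1 otherwise.
- Each episode ends after one action, so there is no next-state term.
- Actions are chosen epsilon-greedily.

The names `train_dog`, `ACTIONS`, and `COMMANDS` are illustrative.

```python
import random

ACTIONS = ["run", "fetch", "sit"]
COMMANDS = ["run", "fetch", "sit"]  # hypothetical states: the command given


def train_dog(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Step 1: start with every Q-value at 0.
    q = {c: {a: 0.0 for a in ACTIONS} for c in COMMANDS}
    for _ in range(episodes):
        command = rng.choice(COMMANDS)
        # Step 2: choose an action (explore with probability epsilon,
        # otherwise take the best-known action) and perform it.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=q[command].get)
        # Step 3: get the reward and update Q. The episode ends after one
        # action, so the target is just the reward (no discounted future term).
        reward = 1.0 if action == command else -1.0
        q[command][action] += alpha * (reward - q[command][action])
        # Step 4: repeat for the next episode.
    return q


q = train_dog()
for command in COMMANDS:
    print(command, "->", max(ACTIONS, key=q[command].get))
```

After training, each command should map to the matching action. Because every episode is a single step, this example omits the discount factor. With longer episodes, the target would also include the discounted best Q-value of the next state.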


Figure 10: Final Q-Table at end of an episode


Q-Learning With Python

Let's use Q-Learning to find the shortest path between two points. We have a group of nodes and we want the model to automatically find the shortest way to travel from one node to another. We start by importing the necessary modules:


Figure 11: Import necessary modules

Next, we define all possible actions, which here are the points (nodes) that exist.


Figure 12: Define the actions

We define the rewards array for every action.


Figure 13: Define the rewards

We define our environment by mapping each state to a location, and we set the discount factor and learning rate:


Figure 14: Create Environment and set variables
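The article's code appears only as figures, so the following is an illustrative sketch of an environment of this kind, not the article's code. Everything in it is assumed: the nine locations L1 to L9, the connections between them, and the values of the discount factor (`gamma`) and learning rate (`alpha`). A reward of 1 marks a direct connection between two locations. An action is simply the index of the location to move to.

```python
# Hypothetical layout: nine locations, L1 to L9, mapped to state indices 0 to 8.
location_to_state = {f"L{i}": i - 1 for i in range(1, 10)}
state_to_location = {state: loc for loc, state in location_to_state.items()}

# Moving to a location is the same as choosing that state index.
actions = list(range(9))

# rewards[s][a] == 1 when locations s and a are directly connected, else 0.
rewards = [
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
]

gamma = 0.75  # discount factor (assumed value)
alpha = 0.9   # learning rate (assumed value)


def neighbours(location):
    """List the locations reachable in one move from the given location."""
    state = location_to_state[location]
    return [state_to_location[a] for a in actions if rewards[state][a] == 1]
```

For example, `neighbours("L8")` returns `["L5", "L7", "L9"]`.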

We then define our agent class and set its attributes.


Figure 15: Define Agent

We then define its methods. The first is the training method, which trains the robot in the environment.


Figure 16: Define a method for how the agent interacts with the environment
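A training routine in the same spirit can be sketched as a plain function instead of an agent class. This is a simplification for illustration; the article uses a class whose code is not reproduced here. The sketch makes several assumptions:

- The input is a square 0/1 connection matrix.
- The goal gets a large self-reward of 999, an arbitrary value, so that routes into it become valuable.
- The Q-learning update is applied repeatedly from randomly chosen states and moves.

The name `train_q_table` is hypothetical.

```python
import random


def train_q_table(rewards, goal, gamma=0.75, alpha=0.9, iterations=1000, seed=0):
    """Learn a Q-table for moving between connected nodes toward `goal`.

    rewards[s][a] is 1 if node a can be reached directly from node s, else 0.
    Taking action a moves the agent to node a.
    """
    rng = random.Random(seed)
    n = len(rewards)
    # Copy so the caller's matrix is unchanged, then make the goal attractive
    # with a large self-reward (assumed value).
    r = [row[:] for row in rewards]
    r[goal][goal] = 999
    q = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        state = rng.randrange(n)
        # Only moves to directly connected nodes (or staying at the goal) are allowed.
        playable = [a for a in range(n) if r[state][a] > 0]
        if not playable:
            continue
        action = rng.choice(playable)
        next_state = action
        # Temporal-difference update (the Bellman-based Q-learning rule).
        td = r[state][action] + gamma * max(q[next_state]) - q[state][action]
        q[state][action] += alpha * td
    return q
```

Picking states and moves uniformly at random is the simplest exploration strategy. It works here because the graph is tiny, but larger problems usually need a smarter scheme such as epsilon-greedy selection.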

We then define a method to select the optimal route for the next state.


Figure 17: Define a method to get optimal route
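Once the Q-table is trained, the route can be read off greedily. Starting from the start node, the agent repeatedly moves to the node with the highest Q-value until it reaches the end node. The sketch below assumes a list-of-lists Q-table `q[state][action]`, where an action is the index of the next node. It also adds a step limit, which is an assumption, so that an undertrained table cannot loop forever. The name `optimal_route` is illustrative.

```python
def optimal_route(q, start, end, max_steps=100):
    """Follow the greedy policy from start until end is reached."""
    route = [start]
    state = start
    while state != end:
        if len(route) > max_steps:
            raise ValueError("greedy policy did not reach the end state")
        row = q[state]
        # Greedy step: move to the node (action) with the highest Q-value.
        state = max(range(len(row)), key=row.__getitem__)
        route.append(state)
    return route


# Hand-made Q-table for three nodes, for illustration only.
q = [[0, 5, 0], [2, 0, 9], [0, 3, 10]]
print(optimal_route(q, 0, 2))  # [0, 1, 2]
```

Location names can be recovered by mapping each state index back to its label, for example with a dictionary from indices to names such as "L1" to "L9".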

Now, let's call our agent and check the shortest route between points L9 and L1:


Figure 18: Find the shortest route between two points

As we can see, the model has found the shortest path between points L9 and L1 by traversing through points L5 and L8.

Conclusion

In this article, we first looked at a sub-branch of machine learning called Reinforcement Learning. We then answered the question ‘What is Q-Learning?’: it is a type of model-free reinforcement learning. We introduced the key terms associated with Q-Learning and looked at the Bellman Equation, which is used to update the Q-values that guide our agent's choice of action. We went through the steps required to build a Q-Table and, finally, saw how to implement Q-Learning in Python with a demo.

We hope this article answered the question which was burning in the back of your mind: ‘What is Q-Learning?’.


Article information

Author: Kerri Lueilwitz

Last Updated: 01/27/2023

