Cart-Pole

Demo video: Cartpole.mp4

About the Project

A cart pole balancing agent powered by Q-Learning. Utilizes Python 3 and Gymnasium (formerly OpenAI Gym).

My first attempt at a solution used a static learning rate, which did not perform well (20,000 episodes: 140 mean score). This implementation can be seen in cartpole_static.py.

To improve performance, I transitioned to a variable learning rate that decays as the number of training episodes increases. This led to drastically improved results (20,000 episodes: 1,246 mean score; see the code and trained Q Table). Note: a mean score over 200 is considered a successful solution.
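As a rough illustration of the idea (not necessarily the exact schedule or parameter names used in cartpole.py), a decaying learning rate can be as simple as an exponential falloff with a floor:

```python
def learning_rate(episode, initial=1.0, min_rate=0.01, decay=0.999):
    """Illustrative schedule: decay the learning rate exponentially with the
    episode index, but never let it drop below min_rate. Early episodes make
    large updates; later episodes make smaller, more stable ones."""
    return max(min_rate, initial * decay ** episode)
```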

Please feel free to play with and adjust my Q Learning implementation. If you achieve a better result, please let me know; I'd love to understand the changes you made and why. Thank you so much for reading!

A few notes on Q Learning and my implementation:

High Level Overview of Q Learning

Given a state in an environment, Q Learning explores the different actions it can take in that state and observes the rewards associated with those actions. Based on these observed rewards, the algorithm updates its estimate of the quality of each state-action pair. After sufficient exploration, Q Learning produces a trained model that chooses the 'optimal' action in any given state, leading to the greatest possible reward. 'Optimal' is in quotes here because how close the result comes to truly optimal depends heavily on the specific implementation of Q Learning.

Q Learning accomplishes this with no prior assumptions or knowledge of the environment. Hence, Q Learning is referred to as a model-free Reinforcement Learning algorithm.
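Concretely, the algorithm maintains a table Q of state-action values and repeatedly nudges each entry toward the observed reward plus the discounted value of the best action available in the next state. The sketch below shows that update in isolation; it is illustrative and not copied from cartpole.py.

```python
import numpy as np

def update_q(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One Q-Learning update: move Q[state][action] toward the observed
    reward plus the discounted value of the best next action.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults."""
    best_next = np.max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```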

In the Cart Pole scenario, a given state is represented by: the position of the cart, the cart's velocity, the angle of the pole, and the pole's angular velocity. The potential actions are simply: move the cart left or move the cart right.

For more information on Q Learning, please see: https://en.wikipedia.org/wiki/Q-learning

Epsilon Greedy

In Q Learning, we want to first explore the environment, and then apply our observed understanding of the environment to arrive at an optimal policy (that is, the action we should take in any given state to maximize our reward). To do this, we must balance the competing desires to explore the environment and to exploit it (i.e., act greedily to maximize our rewards). Epsilon Greedy is a framework for striking such a balance and efficiently converging to an optimal policy.
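A minimal sketch of epsilon-greedy action selection (illustrative only; cartpole.py may structure this differently): with probability epsilon the agent explores by taking a random action, otherwise it exploits by taking the highest-valued action. Epsilon typically starts near 1 and decays over training so the agent gradually shifts from exploring to exploiting.

```python
import numpy as np

def choose_action(Q, state, epsilon, n_actions=2):
    """Epsilon-greedy selection over the Cart Pole actions (0 = left, 1 = right)."""
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)   # explore: random action
    return int(np.argmax(Q[state]))           # exploit: best known action
```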

Binning

Q Learning represents a given environment as a finite number of state-action pairs, but the Cart Pole environment has a continuous state space (i.e., an infinite number of states). Thus, to apply Q Learning to the Cart Pole environment, we must first discretize or bin the Cart Pole state space. This is accomplished with the cart_pos_space, cart_velo_space, pole_ang_space and pole_velo_space parameters at the top of cartpole.py and the Bin function seen in helper.py.
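As a rough sketch of the idea (with illustrative bin edges, not the exact values defined in cartpole.py), each continuous observation can be mapped to a tuple of bin indices that serves as a Q-table key:

```python
import numpy as np

# Illustrative bin edges; cartpole.py defines its own cart_pos_space,
# cart_velo_space, pole_ang_space and pole_velo_space.
cart_pos_space  = np.linspace(-2.4, 2.4, 10)
cart_velo_space = np.linspace(-4.0, 4.0, 10)
pole_ang_space  = np.linspace(-0.21, 0.21, 10)
pole_velo_space = np.linspace(-4.0, 4.0, 10)

def bin_state(observation):
    """Convert a continuous (position, velocity, angle, angular velocity)
    observation into a tuple of discrete bin indices."""
    pos, velo, ang, ang_velo = observation
    return (int(np.digitize(pos, cart_pos_space)),
            int(np.digitize(velo, cart_velo_space)),
            int(np.digitize(ang, pole_ang_space)),
            int(np.digitize(ang_velo, pole_velo_space)))
```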
