This code implements a reinforcement learning algorithm for the CliffWalking-v0 environment from the OpenAI Gym library using Keras. The agent is a Deep Q-Network (DQN) with experience replay and an epsilon-greedy exploration strategy.
The Agent class is used to train the agent on the CliffWalking-v0 environment. The code sets the hyperparameters. Then, it runs a loop for a specified number of episodes and within each episode, it takes actions according to the agent's epsilon-greedy policy, updates the replay memory, and trains the neural network model using experience replay. At the end of each episode, the code saves the rewards obtained, the number of steps taken, and the accuracy of the agent in reaching the goal state.
The code also plots the rewards obtained, the epsilon value over time, the number of steps taken per episode, and the average accuracy of the agent in reaching the goal state per 10 steps. The plots are used to evaluate the performance of the agent and to fine-tune the hyperparameters of the algorithm.
The Cliff Walking environment is a standard reinforcement learning problem that involves navigating an agent through a grid-world environment with a cliff at the bottom. The environment consists of a grid of cells, with the agent starting in the top-left cell and a goal cell in the bottom-right cell. The agent can move in four directions - up, down, left, and right - and receives a negative reward for each step taken.
The catch in this environment is that if the agent steps off the grid, it falls off the cliff and receives a very large negative reward. Therefore, the agent must learn to navigate the environment without falling off the cliff in order to reach the goal cell and receive a positive reward.
The Cliff Walking environment is often used as a benchmark for testing and comparing reinforcement learning algorithms, as it is a simple and well-defined problem that requires the agent to learn a balance between exploration and exploitation to maximize its reward.
The above code implements the Q-learning algorithm on the CliffWalking environment of OpenAI Gym using a Deep Neural Network. The agent's goal is to navigate through the grid world from a starting position to the goal (position 47) while avoiding the cliff (positions 37 to 46).
The agent is trained for 600 episodes, and each episode can have a maximum of 300 steps. The agent receives a reward of -1 for each step taken, -100 for falling off the cliff, and +10 for reaching the goal.
The agent's performance is evaluated by running 100 test episodes, and the average reward and steps taken in these episodes are plotted.
The training and testing hyperparameters are set as batch size=32, learning rate=0.001, discount factor=0.9, and epsilon value (for the epsilon-greedy policy) is decayed from 1 to 0.01 over the training episodes.
The agent's Deep Neural Network model consists of three Dense layers with 32 nodes in each hidden layer and ReLU activation. The output layer has four nodes (equal to the number of possible actions in the CliffWalking environment) and linear activation.
The training results show that the agent improves its performance over time, as seen from the decreasing trend of average rewards and increasing trend of average accuracy. However, the agent still faces difficulty in reaching the goal quickly and avoiding the cliff. The agent's performance can be improved by tweaking the hyperparameters and using advanced algorithms like Double Q-learning or prioritized experience replay