# A2C
An implementation of `Synchronous Advantage Actor Critic (A2C)` in TensorFlow. A2C is a variant of advantage actor critic introduced by [OpenAI in their published baselines](https://github.com/openai/baselines). However, these baselines are difficult to understand and modify. So, I implemented A2C based on their implementation, but in a clearer and simpler way.

### What's new compared to the OpenAI Baselines?
1. Support for Tensorboard visualization per running agent in an environment.
2. Easier support for different policy networks.
3. Easy support for environments other than OpenAI gym.
4. Support for video generation of an agent acting in the environment.
5. Simple and easy code to modify and begin experimenting. All you need to do is plug and play!

## Asynchronous vs Synchronous Advantage Actor Critic
Asynchronous advantage actor critic was introduced in [Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1602.01783.pdf). The difference between the two methods is that in asynchronous AC, each parallel agent updates the global network on its own. So, at any given time, the weights used by one agent may differ from the weights used by another, which means each agent plays with a different policy and explores more of the environment. In synchronous AC, however, the updates from all parallel agents are collected before the global network is updated. To encourage exploration, stochastic noise is added to the probability distribution of the actions predicted by each agent.
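
To make the contrast concrete, here is an illustrative sketch of a single synchronous update (the `rollout`, `compute_gradients`, and `apply_gradients` helpers are hypothetical placeholders, not functions from this repository):

```python
import numpy as np


def synchronous_a2c_update(envs, policy, optimizer, n_steps=5):
    """Illustrative only: every parallel agent acts with the SAME weights,
    and all of their experience is batched into ONE global update."""
    obs_batch, action_batch, return_batch = [], [], []
    for env in envs:  # all agents share the current global weights
        obs, actions, returns = policy.rollout(env, n_steps)  # hypothetical helper
        obs_batch.append(obs)
        action_batch.append(actions)
        return_batch.append(returns)
    # one gradient step computed on the concatenated batch from all agents
    grads = policy.compute_gradients(np.concatenate(obs_batch),
                                     np.concatenate(action_batch),
                                     np.concatenate(return_batch))
    optimizer.apply_gradients(grads)
```
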
### Environments Supported
This implementation allows for using different environments. It's not restricted to OpenAI gym environments. If you want to attach the project to an environment other than the ones provided by gym, all you have to do is inherit from the base class `BaseEnv` in `envs/base_env.py` and implement all of its methods in a plug and play fashion (see the gym environment example class). You also have to add the name of the new environment class in the `A2C.py\env_name_parser()` method.

The methods that should be implemented in a new environment class are listed below (a minimal sketch follows the list):
1. `make()` for creating the environment and returning a reference to it.
2. `step()` for taking a step in the environment and returning a tuple (observation images, reward float value, done boolean, any other info).
3. `reset()` for resetting the environment to the initial state.
4. `get_observation_space()` for returning an object with a tuple attribute `shape` representing the shape of the observation space.
5. `get_action_space()` for returning an object with an attribute `n` representing the number of possible actions in the environment.
6. `render()` for rendering the environment if appropriate.
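
As a rough illustration (the class name, the namedtuple helpers, the shapes, and the import path are made up for this sketch, and the exact `BaseEnv` interface may differ), a new environment could look like this:

```python
from collections import namedtuple

import numpy as np

from envs.base_env import BaseEnv  # import path assumed from envs/base_env.py

# Lightweight stand-ins for the space objects described in items 4 and 5 above.
ObservationSpace = namedtuple('ObservationSpace', 'shape')
ActionSpace = namedtuple('ActionSpace', 'n')


class MyToyEnv(BaseEnv):
    """Hypothetical plug-and-play environment; names and shapes are illustrative.
    Remember to also register the class name in A2C.py\env_name_parser()."""

    obs_shape = (84, 84, 4)
    num_actions = 4

    def make(self):
        self.steps = 0
        return self  # this toy env is its own simulator

    def step(self, action):
        self.steps += 1
        observation = np.zeros(self.obs_shape, dtype=np.uint8)
        reward = 0.0
        done = self.steps >= 100  # episode ends after 100 steps
        return observation, reward, done, {}

    def reset(self):
        self.steps = 0
        return np.zeros(self.obs_shape, dtype=np.uint8)

    def get_observation_space(self):
        return ObservationSpace(shape=self.obs_shape)

    def get_action_space(self):
        return ActionSpace(n=self.num_actions)

    def render(self):
        pass  # nothing to draw for this toy example
```
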

### Policy Networks Supported
This implementation comes with the basic CNN policy network from the OpenAI baselines. However, it supports using different policy networks. All you have to do is inherit from the base class `BasePolicy` in `models\base_policy.py` and implement all of its methods in a plug and play fashion again :D (see the CNNPolicy example class). You also have to add the name of the new policy network class in the `models\model.py\policy_name_parser()` method.
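
As a hedged sketch only (the constructor signature and the attribute names `policy_logits` / `value_function` below are assumptions, not the actual `BasePolicy` interface), a simple fully-connected policy could be written along these lines:

```python
import tensorflow as tf

from models.base_policy import BasePolicy  # import path assumed from the text above


class MLPPolicy(BasePolicy):
    """Hypothetical fully-connected policy; the real `BasePolicy` interface
    may require different methods and constructor arguments."""

    def __init__(self, sess, input_shape, num_actions, reuse=False):
        # Base-class initialization is skipped here because its signature
        # is not shown in this README.
        self.sess = sess
        with tf.variable_scope('mlp_policy', reuse=reuse):
            self.observations = tf.placeholder(tf.float32, [None] + list(input_shape))
            flattened = tf.layers.flatten(self.observations)
            hidden = tf.layers.dense(flattened, 128, activation=tf.nn.relu)
            # actor head: unnormalized log-probabilities over the discrete actions
            self.policy_logits = tf.layers.dense(hidden, num_actions)
            # critic head: scalar state-value estimate
            self.value_function = tf.layers.dense(hidden, 1)
```
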

### Tensorboard Visualization
This implementation supports Tensorboard visualization. It displays time plots, per running agent, of the two most important signals in reinforcement learning: episode length and total reward per episode. All you have to do is launch Tensorboard from your experiment directory located in `experiments/`:
`tensorboard --logdir=experiments/my_experiment/summaries`
<div align="center">
<img src="https://github.com/MG2033/A2C/blob/master/figures/plot.png"><br><br>
</div>

### Video Generation
During training, you can generate videos of the trained agent acting (playing) in the environment. This is done by changing `record_video_every` in the configuration file from -1 to the number of episodes between two generated videos. The generated videos are saved in your experiment directory.

During testing, videos are generated automatically if the optional `monitor` method is implemented in the environment. For the included gym environment, it is already implemented.
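
For a gym-style environment, such a method could be a thin wrapper around `gym.wrappers.Monitor` (available in gym releases of that era). The method name `monitor` comes from the text above, while the surrounding class, the arguments, and the environment id are assumptions made for this sketch:

```python
import gym
from gym import wrappers


class GymLikeEnv:
    """Stand-in for an environment class as described earlier in this README."""

    def __init__(self, env_name='Breakout-v0'):
        self.env = gym.make(env_name)

    def monitor(self, output_dir, record_video_every=1):
        # Wrap the underlying gym env so that test episodes are recorded to disk.
        self.env = wrappers.Monitor(
            self.env, output_dir, force=True,
            video_callable=lambda episode_id: episode_id % record_video_every == 0)
        return self.env
```
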

## Usage
### Main Dependencies
In the project, two configuration files are provided as examples for training on

## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

## Reference Repository
[OpenAI Baselines](https://github.com/openai/baselines)
