Implementation of Advantage-Actor-Critic (A2C) with entropy regularization in PyTorch for OpenAI Gym environments.
The policy gradient in Advantage-Actor-Critic differs from the classical REINFORCE policy gradient by subtracting a baseline to reduce variance. This baseline is an approximation of the state-value function (the critic). Since the baseline does not depend on the action, it does not introduce bias. A minimal sketch of the resulting losses is shown below.
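The following sketch illustrates the actor and critic losses described above, assuming a rollout where `log_probs`, `values`, and `returns` have already been collected; the function name and tensor shapes are illustrative, not the exact code of this repository.

```python
import torch
import torch.nn.functional as F

def a2c_losses(log_probs, values, returns):
    """Sketch of the A2C actor and critic losses for one rollout.

    log_probs: log pi(a_t | s_t) for the actions taken, shape [T]
    values:    critic estimates V(s_t), shape [T]
    returns:   discounted returns (targets for the critic), shape [T]
    """
    # Advantage = return - baseline; detach the baseline so the critic
    # is not updated through the actor loss.
    advantages = returns - values.detach()

    # Policy-gradient (actor) loss with the state-value baseline.
    actor_loss = -(log_probs * advantages).mean()

    # The critic is regressed towards the returns.
    critic_loss = F.mse_loss(values, returns)

    return actor_loss, critic_loss
```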
For more detailed information I would recommend reading this article.
In order to encourage exploration, we add the entropy of the policy distribution to the loss. This pushes the actor to keep probability mass on as many actions as possible while still maximizing the reward, as sketched below.
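A sketch of the combined objective with the entropy bonus, assuming the actions were sampled from a `torch.distributions` object (e.g. `Categorical`); the coefficients `value_coef` and `entropy_coef` are assumed hyperparameters, not values taken from this repository.

```python
import torch
import torch.nn.functional as F

def a2c_total_loss(dist, log_probs, values, returns,
                   value_coef=0.5, entropy_coef=0.01):
    """Sketch of the full A2C objective with an entropy bonus."""
    advantages = returns - values.detach()
    actor_loss = -(log_probs * advantages).mean()
    critic_loss = F.mse_loss(values, returns)

    # Subtracting the entropy term (i.e. adding an entropy bonus)
    # rewards policies that spread probability over several actions.
    entropy = dist.entropy().mean()

    return actor_loss + value_coef * critic_loss - entropy_coef * entropy
```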