This repository provides a simple demonstration framework for people studying deep reinforcement learning.
The framework is built on TensorFlow, and the basic models are implemented in the example_model directory. To use your own model, refer to the models provided in the example_model directory.
We provide tutorials for training an agent in each environment, organized by action type and input shape as follows.
Environment
Continuous Action MLP - BipedalWalker, Pendulum
Discrete Action MLP - LunarLander
Discrete Action CNN - Breakout
Algorithms
Continuous Action MLP - DDPG, TD3, PPO, PPO2
Discrete Action MLP - Vanilla PG, A2C, PPO, DQN, QRDQN, IQN
Discrete Action CNN - Vanilla PG, A2C, PPO, DQN, QRDQN, IQN
The tutorials run in the Gym environments provided by OpenAI, so you need to install OpenAI Gym and Box2D to run the tutorial code.
From the git repository (run pip from the root of a local clone)
https://github.com/RLOpensource/tensorflow_RL
pip install .
CPU version
pip install tensorflow-rl[tf-cpu]
GPU version
pip install tensorflow-rl[tf-gpu]
If you install the package with only
pip install tensorflow-rl
TensorFlow itself is not installed, so use one of the extras above or install TensorFlow separately.
tensorflow
box2d
gym
numpy
tensorboardX
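The dependencies listed above can be checked before running a tutorial script. The following is a minimal sketch and not part of this repository: the helper name `missing_modules` is hypothetical, and the import names (notably `Box2D` for the box2d package) are assumptions that may differ between package versions.

```python
import importlib.util

# Assumed import names for the dependencies listed above.
# The box2d package is usually imported as "Box2D"; adjust if yours differs.
REQUIRED_MODULES = ["tensorflow", "Box2D", "gym", "numpy", "tensorboardX"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only looks the modules up without importing them, so it stays fast even when TensorFlow is installed.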
- Vanilla Policy Gradient
- Advantage Actor Critic
- Proximal Policy Optimization
- Deep Deterministic Policy Gradient
- Value based Reinforcement Learning
- Soft Actor Critic
- LSTM train Algorithm
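As one illustration of the algorithms listed above, the clipped surrogate objective used by Proximal Policy Optimization can be sketched in plain Python. This is a simplified sketch of the standard formulation and not the code used in this repository: the function name `ppo_clip_objective` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to be maximized).

    ratios     -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages -- advantage estimates, one per sample
    clip_eps   -- clipping range; 0.2 is an assumed, commonly used value
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

Taking the minimum of the two terms removes the incentive to move the new policy far from the old one in a single update, which is the main difference from a vanilla policy gradient step.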
- Script : bipedalwalker_td3.py, bipedalwalker_ddpg.py, bipedalwalker_ppo.py, bipedalwalker_ppo2.py
- Environment : BipedalWalker-v2
- Orange : td3, Blue : ddpg, SkyBlue : ppo, Pink : ppo2
- Episode : 600
- Image : td3
- Script : pendulum_td3.py, pendulum_ddpg.py
- Environment : Pendulum-v0
- Orange : ddpg, Blue : td3
- Episode : 300
- Image : td3
- Script : breakout_rollout_a2c.py, breakout_rollout_ppo.py, breakout_rollout_vpg.py
- Environment : BreakoutDeterministic-v4 with Multi-processing
- Blue : ppo, Orange : a2c, Red : vpg
- Episode : 600
- Image : PPO
- Script : lunarLander_rollout_a2c.py, lunarLander_rollout_ppo.py, lunarLander_rollout_vpg.py
- Environment : LunarLander-v2 with Multi-processing
- Blue : ppo, Orange : a2c, Red : vpg
- Episode : 350
- Image : PPO
- Script : breakout_value_dqn.py, breakout_value_qrdqn.py, breakout_value_iqn.py
- Environment : BreakoutDeterministic-v4 with Multi-processing
- Green : IQN, Blue : QRDQN, Pink : DQN
- Episode : 280
- Image : IQN
- Script : lunarLander_value_dqn.py, lunarLander_value_qrdqn.py, lunarLander_value_iqn.py
- Environment : LunarLander-v2 with Multi-processing
- Orange : IQN, Blue : QRDQN, Red : DQN
- Episode : 250
- Image : IQN
- Script : breakout_rollout_ppo_1stack_lstm.py, breakout_rollout_ppo_1stack.py
- Environment : BreakoutDeterministic-v4 with Multi-processing
- Orange : PPOLSTM, Blue : PPO-1stack
- Episode : 1000
- Image : PPOLSTM
We do not have the copyright to this repository.
Feel free to use this code; we only ask that you reference the repository URL in some form.
[1] mario_rl
[2] Proximal Policy Optimization
[3] Efficient Parallel Methods for Deep Reinforcement Learning
[4] High-Dimensional Continuous Control Using Generalized Advantage Estimation
[5] Asynchronous Methods for Deep Reinforcement Learning
[6] Continuous Control With Deep Reinforcement Learning
[8] Deep Recurrent Q-Learning for Partially Observable MDPs
[9] Playing Atari with Deep Reinforcement Learning
[10] Distributional Reinforcement Learning with Quantile Regression
[11] Implicit Quantile Networks for Distributional Reinforcement Learning
[12] OpenAI Spinningup
[13] Reinforcement Learning Korea PG Travel
[14] Medipixel Reinforcement Learning Repository
Please fork this repository and contribute to strengthen the TensorFlow reinforcement learning ecosystem.
Contact us at [email protected]