Reinforcement Learning

Reinforcement learning examples applied to a variety of environments (currently being ported to PyTorch).

[Figure: Breakout played with DQN (Nature2015)]

1. Q-Learning / SARSA

2. Q-Network (Action-Value Function Approximation)

3. DQN

DQN (NIPS2013) uses Experience Replay Memory and a CNN.

DQN (Nature2015) uses Experience Replay Memory, a Target Network, and a CNN.

5. Vanilla Policy Gradient(REINFORCE)

6. Advantage Actor Critic

7. Deep Deterministic Policy Gradient

8. Parallel Advantage Actor Critic (called 'A2C' by OpenAI)

9. C51(Distributional RL)

10. PPO(Proximal Policy Optimization)
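
As a rough illustration of the two DQN ingredients named under item 3, the sketch below shows an experience replay memory and the TD target computed from sampled transitions. It is not the code from this repository: `ReplayMemory`, `td_targets`, and `q_values_fn` are hypothetical names chosen for this example, and plain Python stands in for the CNN so that the snippet runs without PyTorch.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, q_values_fn, gamma=0.99):
    """Compute y = r + gamma * max_a Q(s', a); terminal transitions get y = r.

    q_values_fn is a hypothetical callable mapping a state to a list of
    action values (in practice, a forward pass through a network).
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_values_fn(next_state))
        targets.append(reward + bootstrap)
    return targets
```

In the NIPS2013 setup, `q_values_fn` would be the online network itself. In the Nature2015 setup, it would be a separate target network whose weights are copied from the online network only every fixed number of steps, which keeps the targets from shifting on every update.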