Skip to content

schinger/pong_actor-critic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 

Repository files navigation

pong_actor-critic

Train an agent with (Stochastic) Policy Gradients (Actor-Critic) on Pong. Use OpenAI Gym.

(Run success in Aug 2023. gym 0.26.2, python 3.7.4, windows 11)

  • background intro. http://karpathy.github.io/2016/05/31/rl/
  • the original REINFORCE counterpart https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5
  • within 200,000 round can beat the build-in AI player. when training finished, you can turn on resume and render to see the playing animation
  • one hundred lines (~160) python, only need numpy, manually implement the forward and backward pass
  • Actor and Critic both use simple feedforward neural network with one hidden layer
  • k-step TD error as the advantage, the policy gradient is modulated by the advantage (line 130)
  • update target Value Network by moving average (line 149)

About

Trains an agent with (stochastic) Policy Gradients(actor-critic) on Pong. Uses OpenAI Gym.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages