Example VPG implementation with ReLAx

This repository contains an implementation of vanilla policy gradient (VPG) with ReLAx.
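For readers unfamiliar with VPG, the core computation can be sketched in a few lines of plain Python. This is an illustration only and does not use or reproduce the ReLAx API: the function names `rewards_to_go` and `vpg_surrogate_loss` are hypothetical, and the discount factor `gamma=0.99` is an assumed value, not one taken from this repository.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Discounted return from each time step to the end of one episode.

    gamma=0.99 is an assumed default, not a value from this repository.
    """
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def vpg_surrogate_loss(log_probs, returns):
    """Negative mean of log pi(a_t | s_t) * R_t over the sampled steps.

    Minimizing this value with respect to the policy parameters follows
    the vanilla policy gradient estimate.
    """
    assert len(log_probs) == len(returns)
    weighted = [lp * r for lp, r in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)


# Toy episode with three steps (made-up numbers, for illustration only).
returns = rewards_to_go([1.0, 1.0, 1.0], gamma=0.5)
loss = vpg_surrogate_loss([-1.0, -2.0, -0.5], returns)
```

In a real training loop the log-probabilities would come from a differentiable policy network, and an autodiff framework would compute the gradient of this loss; the sketch only shows the quantities involved.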

The VPG actor was trained on the LunarLander-v2 Gym environment for 4M environment steps.

The graph of average return versus training step is shown below (batch_size=40000):

vpg_training
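To put the x-axis of the graph in context, the stated numbers imply roughly how many policy updates the run contains. The short calculation below assumes that `batch_size` counts environment steps collected per update, which the README does not state explicitly.

```python
# Values quoted in this README.
total_env_steps = 4_000_000
batch_size = 40_000  # assumed to be measured in environment steps per update

# Number of full batches (policy updates) that fit into the training budget.
num_updates, leftover = divmod(total_env_steps, batch_size)
print(num_updates, leftover)  # prints "100 0"
```

Under that assumption the curve consists of about 100 training steps.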

Resulting Policy:

vpg_run.mp4