SimplyPPO replicates PPO (Proximal Policy Optimization) in a minimal amount of code (~250 lines) in clean, readable PyTorch style, while using as few additional tricks and hyper-parameters as possible. Beyond the original algorithm, it only adds:
- Advantage and state normalization.
- Gradient clipping.
- Entropy bonus.
- Tanh squashing to enforce action bounds, plus log_std clamping (as in SAC).
That's it! Everything else follows the original paper. A sketch of these tricks is given below.
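For illustration, here is a minimal sketch of what these tricks typically look like in PyTorch. It is not excerpted from learn.py: the names (`TanhGaussianPolicy`, `ppo_step`, `ent_coef`, ...) and the default coefficients are hypothetical, and the log_std clamp range follows the SAC convention.

```python
import torch
import torch.nn as nn

LOG_STD_MIN, LOG_STD_MAX = -20, 2  # log_std clamp range (SAC convention)

class TanhGaussianPolicy(nn.Module):
    """Hypothetical Gaussian policy; tanh squashing keeps actions in (-1, 1)."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.trunk(obs)
        mu = self.mu_head(h)
        log_std = self.log_std_head(h).clamp(LOG_STD_MIN, LOG_STD_MAX)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()            # pre-squash Gaussian sample
        a = torch.tanh(u)             # bounded action
        # change-of-variables correction for the tanh squashing
        log_prob = (dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)).sum(-1)
        return a, log_prob

def ppo_step(optimizer, params, new_logp, old_logp, adv, entropy, value_loss,
             clip_eps=0.2, ent_coef=0.01, vf_coef=0.5):
    """One hypothetical PPO update showing the remaining tricks."""
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)        # advantage normalization
    ratio = (new_logp - old_logp).exp()
    clipped = torch.min(ratio * adv,
                        ratio.clamp(1 - clip_eps, 1 + clip_eps) * adv)
    loss = -(clipped.mean() + ent_coef * entropy.mean()) + vf_coef * value_loss
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(params, max_norm=0.5)       # gradient clipping
    optimizer.step()
```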
Also check out SimplySAC, a minimal Soft Actor-Critic PyTorch implementation.
## Note
This is a single-threaded PPO implementation for continuous-control tasks. The state-normalization implementation is adapted from here, where various other tricks are also discussed.
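State normalization is commonly implemented as a running estimate of the observation mean and variance. The sketch below assumes a Welford-style batched update; the class name `RunningMeanStd` and the clip value are illustrative, not taken from the linked code.

```python
import numpy as np

class RunningMeanStd:
    """Hypothetical running estimate of per-dimension observation statistics."""

    def __init__(self, shape):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 1e-4  # avoids division by zero on the first update

    def update(self, x):
        # Welford-style merge of batch statistics into the running estimate
        batch_mean, batch_var, n = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        m2 = self.var * self.count + batch_var * n + delta**2 * self.count * n / total
        self.var = m2 / total
        self.count = total

def normalize(rms, obs, clip=10.0):
    """Normalize an observation with the running statistics, then clip."""
    return np.clip((obs - rms.mean) / np.sqrt(rms.var + 1e-8), -clip, clip)
```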
You can find the performance of Stable Baselines3 here as a reference.
These figures are produced with the following protocol (a code sketch follows the list):
- One evaluation episode every 1e4 steps.
- 5 random seeds; the solid line shows the mean return and the shaded area the max/min return.
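A minimal sketch of this evaluation schedule, assuming the old Gym step API (a 4-tuple) that matches the pybullet version below; `evaluate_one_episode` and the surrounding names are hypothetical:

```python
EVAL_EVERY = 10_000  # one evaluation episode every 1e4 environment steps

def evaluate_one_episode(env, policy):
    """Hypothetical helper: roll out one episode, return the episodic return."""
    obs, ep_return, done = env.reset(), 0.0, False
    while not done:
        action = policy.act(obs)            # assumed deterministic action method
        obs, reward, done, _ = env.step(action)
        ep_return += reward
    return ep_return

# Inside the training loop:
# if step % EVAL_EVERY == 0:
#     eval_returns.append(evaluate_one_episode(eval_env, policy))
```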
To execute a single run:

```
python learn.py -g [gpu_id] -e [env_id] -l [log_id]
```
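For example, a hypothetical invocation (the env id is just one registered PyBullet environment; the gpu and log ids are arbitrary):

```
python learn.py -g 0 -e HopperBulletEnv-v0 -l 0
```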
Experiments use `pybullet==3.0.8`.