Prioritized DDQN with ReLAx
Example Prioritized DDQN implementation with ReLAx
This repository contains an example implementation of Prioritized Experience Replay (PER) with the DDQN algorithm.
Performance versus vanilla DDQN is measured by averaging learning curves (on a separate evaluation environment) over 4 experiments with random environment seeds.
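For orientation, here is a minimal sketch of the proportional variant of prioritized replay that the example builds on. This is illustrative only and is not the buffer implementation shipped with ReLAx: the class and method names are assumptions, and it uses plain O(N) sampling rather than the sum-tree of the original PER paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Illustrative proportional PER buffer (not the ReLAx implementation)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                      # how strongly priorities shape sampling
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0                            # next write index (ring buffer)
        self.size = 0

    def add(self, transition):
        # New transitions get the current max priority so they are sampled at least once.
        max_prio = self.priorities[:self.size].max() if self.size > 0 else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, beta=0.4):
        # Sampling probability P(i) is proportional to priority^alpha.
        prios = self.priorities[:self.size] ** self.alpha
        probs = prios / prios.sum()
        idxs = np.random.choice(self.size, batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (self.size * probs[idxs]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # Priority is the absolute TD error (plus eps so no transition is starved).
        self.priorities[idxs] = np.abs(td_errors) + eps
```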
The results are summarized in the following plot (DDQN is run only for 1.5m envsteps to save time):
The only differences in hyper-parameter settings between PER-DDQN and vanilla DDQN are the use of prioritized experience replay and a four-fold lower learning rate for PER compared to the uniform-sampling case. We can see that on this task PER-DDQN performs worse than the vanilla uniform version. However, this may hold only for this training horizon (1.5m steps instead of 200m).
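To make the two differences concrete, below is a hedged PyTorch-style sketch of a single PER-DDQN update step. It is not ReLAx code: `q_net`, `target_net`, `optimizer`, and `buffer` are assumed to be passed in (the buffer following the `sample`/`update_priorities` interface sketched above), the four-fold lower learning rate would be set when constructing the optimizer, and none of the numeric values reproduce the repo's actual settings.

```python
import numpy as np
import torch

def ddqn_per_update(q_net, target_net, optimizer, buffer,
                    batch_size=32, gamma=0.99, beta=0.4):
    """One illustrative PER-DDQN gradient step (not the ReLAx update)."""
    batch, idxs, weights = buffer.sample(batch_size, beta=beta)
    obs, acts, rews, next_obs, dones = zip(*batch)
    obs = torch.as_tensor(np.array(obs), dtype=torch.float32)
    acts = torch.as_tensor(acts, dtype=torch.int64)
    rews = torch.as_tensor(rews, dtype=torch.float32)
    next_obs = torch.as_tensor(np.array(next_obs), dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)
    weights = torch.as_tensor(weights, dtype=torch.float32)

    with torch.no_grad():
        # Double DQN target: the online net selects the greedy action,
        # the target net evaluates it.
        next_acts = q_net(next_obs).argmax(dim=1, keepdim=True)
        next_q = target_net(next_obs).gather(1, next_acts).squeeze(1)
        targets = rews + gamma * (1.0 - dones) * next_q

    q = q_net(obs).gather(1, acts.unsqueeze(1)).squeeze(1)
    td_errors = targets - q
    # Importance-sampling weights down-weight frequently sampled transitions.
    loss = (weights * td_errors.pow(2)).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Refresh priorities with the latest absolute TD errors.
    buffer.update_priorities(idxs, td_errors.detach().abs().cpu().numpy())
    return loss.item()
```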
Resulting Policy