
# Prioritized DDQN with ReLAx

Example Prioritized DDQN implementation with ReLAx

This repository contains an example implementation of Prioritized Experience Replay (PER) combined with the DDQN algorithm.
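For reference, below is a minimal sketch of proportional prioritized experience replay (Schaul et al., 2015), the scheme PER is based on. The class and method names are illustrative only and are not the ReLAx API.

```python
# Minimal proportional PER sketch -- illustrative, not the ReLAx API.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.storage = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # guaranteed to be sampled at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.storage) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idxs]
        return batch, idxs, weights.astype(np.float32)

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the TD error.
        self.priorities[idxs] = np.abs(td_errors) + eps
```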

Performance relative to vanilla DDQN is measured by averaging learning curves (collected in a separate evaluation environment) over 4 experiments with random environment seeds.
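The averaging step amounts to aligning the per-seed evaluation curves on the environment-step axis and taking their mean, roughly as in the sketch below (file names and array shapes are assumed for illustration):

```python
# Average evaluation returns over 4 seeded runs; logging format is assumed.
import numpy as np

# each file holds a (n_eval_points,) array of returns logged at the same env steps
curves = [np.load(f"eval_returns_seed{seed}.npy") for seed in range(4)]
mean_curve = np.mean(curves, axis=0)
std_curve = np.std(curves, axis=0)
```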

The results are summarized in the following plot (DDQN is run for only 1.5M environment steps to save time):

*Figure: per_benchmark — averaged evaluation learning curves for PER-DDQN vs. vanilla DDQN*

The hyper-parameter settings of PER-DDQN and vanilla DDQN differ in two ways: the presence of prioritized experience replay, and a learning rate decreased four-fold for PER compared to the uniform-sampling case. On this task, PER-DDQN performs worse than the vanilla uniform version; however, this may hold only for this training horizon (1.5M steps instead of 200M).
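The sketch below illustrates how the two runs are assumed to differ during the update step: PER-DDQN weights the per-sample TD loss by the importance-sampling weights returned by the buffer and uses a learning rate four times smaller than the uniform-replay baseline. The learning-rate values, function and tensor names are illustrative and not the repository's exact configuration.

```python
# Illustrative PER-DDQN loss with importance-sampling weights (PyTorch).
import torch
import torch.nn.functional as F

BASE_LR = 1e-3            # hypothetical uniform-DDQN learning rate
PER_LR = BASE_LR / 4      # PER run uses a 4x smaller learning rate

def per_ddqn_loss(q_net, target_net, batch, weights, gamma=0.99):
    obs, actions, rewards, next_obs, dones = batch
    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN: online net selects the action, target net evaluates it.
        next_actions = q_net(next_obs).argmax(dim=1, keepdim=True)
        next_q = target_net(next_obs).gather(1, next_actions).squeeze(1)
        target = rewards + gamma * (1.0 - dones) * next_q
    td_error = q - target
    # Per-sample Huber loss, reweighted by the IS weights from the buffer.
    loss = (weights * F.smooth_l1_loss(q, target, reduction="none")).mean()
    return loss, td_error.detach()  # TD errors feed the priority update
```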

## Resulting Policy

per_ddqn.mp4