
# Prioritized DDQN with ReLAx

Example Prioritized DDQN implementation with ReLAx

This repository contains an example implementation of Prioritized Experience Replay (PER) combined with the DDQN algorithm.
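For reference, below is a minimal sketch of proportional prioritized experience replay (Schaul et al., 2015), the scheme PER is based on. The class and method names are illustrative only and are not the ReLAx API.

```python
# Minimal proportional PER sketch -- illustrative, not the ReLAx API.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.storage = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # guaranteed to be sampled at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.storage) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idxs]
        return batch, idxs, weights.astype(np.float32)

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the TD error.
        self.priorities[idxs] = np.abs(td_errors) + eps
```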

Performance relative to vanilla DDQN is measured by averaging learning curves (collected in a separate evaluation environment) over 4 experiments with random environment seeds.
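The averaging step amounts to aligning the per-seed evaluation curves on the environment-step axis and taking their mean, roughly as in the sketch below (file names and array shapes are assumed for illustration):

```python
# Average evaluation returns over 4 seeded runs; logging format is assumed.
import numpy as np

# each file holds a (n_eval_points,) array of returns logged at the same env steps
curves = [np.load(f"eval_returns_seed{seed}.npy") for seed in range(4)]
mean_curve = np.mean(curves, axis=0)
std_curve = np.std(curves, axis=0)
```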

The results are summarized in the following plot (DDQN is run for only 1.5M environment steps to save time):

*Figure: per_benchmark — averaged evaluation learning curves for PER-DDQN vs. vanilla DDQN*

The hyper-parameter settings of PER-DDQN and vanilla DDQN differ in two ways: the presence of prioritized experience replay, and a learning rate decreased four-fold for PER compared to the uniform-sampling case. On this task, PER-DDQN performs worse than the vanilla uniform version; however, this may hold only for this training horizon (1.5M steps instead of 200M).
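The sketch below illustrates how the two runs are assumed to differ during the update step: PER-DDQN weights the per-sample TD loss by the importance-sampling weights returned by the buffer and uses a learning rate four times smaller than the uniform-replay baseline. The learning-rate values, function and tensor names are illustrative and not the repository's exact configuration.

```python
# Illustrative PER-DDQN loss with importance-sampling weights (PyTorch).
import torch
import torch.nn.functional as F

BASE_LR = 1e-3            # hypothetical uniform-DDQN learning rate
PER_LR = BASE_LR / 4      # PER run uses a 4x smaller learning rate

def per_ddqn_loss(q_net, target_net, batch, weights, gamma=0.99):
    obs, actions, rewards, next_obs, dones = batch
    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN: online net selects the action, target net evaluates it.
        next_actions = q_net(next_obs).argmax(dim=1, keepdim=True)
        next_q = target_net(next_obs).gather(1, next_actions).squeeze(1)
        target = rewards + gamma * (1.0 - dones) * next_q
    td_error = q - target
    # Per-sample Huber loss, reweighted by the IS weights from the buffer.
    loss = (weights * F.smooth_l1_loss(q, target, reduction="none")).mean()
    return loss, td_error.detach()  # TD errors feed the priority update
```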

## Resulting Policy

per_ddqn.mp4