MBPO with ReLAx

Example MBPO-SAC implementation with ReLAx

This repository contains an implementation of the MBPO algorithm with a SAC actor, built with the ReLAx package.

Performance relative to vanilla SAC is measured by averaging learning curves (recorded on a separate evaluation environment) over 4 experiments with random environment seeds.
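The averaging step described above can be sketched as follows. This is a minimal illustration, not the repository's evaluation code: the function name `average_learning_curves` and the example returns are hypothetical, and it assumes every seed was evaluated at the same environment steps.

```python
import numpy as np


def average_learning_curves(curves):
    """Average per-seed evaluation curves point by point.

    curves: one equal-length sequence of evaluation returns per seed.
    Returns the mean and standard deviation across seeds at each evaluation point.
    """
    stacked = np.asarray(curves, dtype=float)  # shape: (n_seeds, n_eval_points)
    if stacked.ndim != 2:
        raise ValueError("expected one equal-length curve per seed")
    return stacked.mean(axis=0), stacked.std(axis=0)


# Made-up returns for 4 seeds and 3 evaluation points, purely for illustration.
example_curves = [
    [0.0, 10.0, 20.0],
    [2.0, 12.0, 22.0],
    [4.0, 14.0, 24.0],
    [6.0, 16.0, 26.0],
]
mean_curve, std_curve = average_learning_curves(example_curves)
```

Plotting `mean_curve` with a band of `std_curve` around it is one common way to compare two algorithms across seeds.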

The results are summarized in the following plot (MBPO is run for only 175k environment steps to save training time):

mbpo_training

The only difference in hyperparameter settings between MBPO-SAC and vanilla SAC is the presence of model-based acceleration. The averaged curves show a substantial advantage for MBPO in terms of training speed.
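For readers unfamiliar with the model-based acceleration mentioned above, the general MBPO idea is to roll a learned dynamics model forward for a few steps from states seen in the real environment, and to use the resulting synthetic transitions as extra training data for the SAC agent. The sketch below illustrates only that rollout step. It does not use the ReLAx API; `toy_model_step` and `toy_policy` are hypothetical stand-ins for a learned model and the SAC actor, and the action dimension is assumed equal to the state dimension to keep the toy dynamics simple.

```python
import numpy as np

rng = np.random.default_rng(0)


def toy_model_step(states, actions):
    """Hypothetical stand-in for a learned dynamics model."""
    next_states = 0.9 * states + 0.1 * actions
    rewards = -np.sum(next_states ** 2, axis=1)
    return next_states, rewards


def toy_policy(states):
    """Hypothetical stand-in for the SAC actor: uniform random actions in [-1, 1]."""
    return rng.uniform(-1.0, 1.0, size=states.shape)


def branched_rollouts(real_states, model_step, policy, horizon):
    """Roll the model forward `horizon` steps starting from real states.

    Returns a list of synthetic (state, action, reward, next_state) tuples.
    """
    transitions = []
    states = real_states
    for _ in range(horizon):
        actions = policy(states)
        next_states, rewards = model_step(states, actions)
        transitions.extend(zip(states, actions, rewards, next_states))
        states = next_states  # continue each rollout from the predicted state
    return transitions


# 8 made-up "real" states of dimension 3, each rolled out for 2 model steps.
real_states = rng.normal(size=(8, 3))
model_buffer = branched_rollouts(real_states, toy_model_step, toy_policy, horizon=2)
```

Keeping the horizon short limits how far model errors can compound, at the cost of generating fewer synthetic transitions per real state.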

Resulting Policy

mbpo_sac.mp4