MBPO with ReLAx

Example MBPO-SAC implementation with ReLAx

This repository contains an implementation of the MBPO algorithm with a SAC actor, built with the ReLAx package.

Performance is compared against vanilla SAC by averaging learning curves, recorded on a separate evaluation environment, over 4 experiments with random environment seeds.
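
The averaging step can be sketched as follows. This is a minimal illustration and not code from this repository: the helper names `interpolate` and `average_learning_curves` are hypothetical, the logged numbers are made up, and it assumes each run logs `(steps, returns)` pairs that may fall on different step values, so every curve is first interpolated onto a shared grid.

```python
from bisect import bisect_right
from statistics import mean


def interpolate(steps, returns, x):
    """Linearly interpolate one curve at step x, clamping outside the logged range."""
    if x <= steps[0]:
        return returns[0]
    if x >= steps[-1]:
        return returns[-1]
    hi = bisect_right(steps, x)  # first logged step strictly greater than x
    lo = hi - 1
    frac = (x - steps[lo]) / (steps[hi] - steps[lo])
    return returns[lo] + frac * (returns[hi] - returns[lo])


def average_learning_curves(curves, grid):
    """Average several (steps, returns) curves, one per seed, on a shared grid."""
    return [mean([interpolate(s, r, x) for s, r in curves]) for x in grid]


# Hypothetical evaluation logs for 4 seeds (illustrative values only).
grid = [0, 1000, 2000]
curves = [
    ([0, 1000, 2000], [0.0, 10.0, 20.0]),
    ([0, 1000, 2000], [2.0, 12.0, 22.0]),
    ([0, 2000], [1.0, 21.0]),
    ([0, 2000], [1.0, 21.0]),
]
mean_curve = average_learning_curves(curves, grid)
```

Interpolating before averaging keeps runs with different evaluation frequencies comparable; if all runs log at identical steps, the interpolation is a no-op.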

The results are summarized in the following plot (MBPO is run for only 175k environment steps to save training time):

The only difference in hyperparameter settings between MBPO-SAC and vanilla SAC is the presence of model-based acceleration. The averaged curves show a substantial advantage for MBPO in training speed.
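
For readers unfamiliar with the idea, model-based acceleration in MBPO generally means generating extra training data with short rollouts of a learned dynamics model, branched from states that were actually visited. The sketch below illustrates only that general idea; it does not use the ReLAx API, and `branched_rollouts`, `toy_model`, and `toy_policy` are hypothetical stand-ins (the "model" here is a fixed linear map rather than a learned network).

```python
import random


def branched_rollouts(real_states, model, policy, horizon, rng):
    """Roll the model forward for up to `horizon` steps from each real state
    and collect the resulting synthetic transitions."""
    synthetic = []
    for state in real_states:
        for _ in range(horizon):
            action = policy(state, rng)
            next_state, reward, done = model(state, action)
            synthetic.append((state, action, reward, next_state, done))
            if done:
                break  # stop this branch once the model predicts termination
            state = next_state
    return synthetic


# Toy stand-ins (assumptions for illustration): 1-D state and action.
def toy_model(state, action):
    next_state = 0.9 * state + action
    reward = -abs(next_state)
    done = abs(next_state) > 10.0
    return next_state, reward, done


def toy_policy(state, rng):
    return rng.uniform(-1.0, 1.0)


rng = random.Random(0)
real_states = [0.0, 1.0, -2.0]  # would be sampled from the real replay buffer
model_buffer = branched_rollouts(real_states, toy_model, toy_policy, horizon=5, rng=rng)
```

In a full setup the synthetic transitions would be mixed with real ones when updating the SAC actor and critics, which is where the sample-efficiency gain is expected to come from.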

Resulting Policy

mbpo_sac.mp4
