RL-Zoo3 v1.8.0: New Documentation, Open RL Benchmark, Multi-Env HerReplayBuffer
Release 1.8.0 (2023-04-07)
We have run a massive, open-source benchmark of all algorithms on all environments from the RL Zoo: the Open RL Benchmark
New documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/
Warning
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
You can find a migration guide here.
If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
Breaking Changes
- Upgraded to SB3 >= 1.8.0
- Upgraded to the new `HerReplayBuffer` implementation that supports multiple envs (see the usage sketch after this list)
- Removed `TimeFeatureWrapper` for Panda and Fetch envs, as the new replay buffer should handle timeouts
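As a rough illustration of the new usage, here is a minimal sketch (not the Zoo's tuned configuration): the env id `PandaReach-v2` and the `panda_gym` import are assumptions that depend on your installed packages, and the hyperparameters are placeholders.

```python
import panda_gym  # noqa: F401  # assumption: installing/importing panda-gym registers the Panda envs

from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.env_util import make_vec_env

# Several parallel envs now work with HerReplayBuffer
# (previously limited to a single env).
vec_env = make_vec_env("PandaReach-v2", n_envs=4)

model = SAC(
    "MultiInputPolicy",
    vec_env,
    replay_buffer_class=HerReplayBuffer,
    # online_sampling / max_episode_length are gone, and no TimeFeatureWrapper
    # is needed: the new buffer handles timeouts internally.
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,
        goal_selection_strategy="future",
    ),
    verbose=1,
)
model.learn(total_timesteps=10_000)
```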
New Features
- Tuned hyperparameters for RecurrentPPO on Swimmer
- Documentation is now built using Sphinx and hosted on Read the Docs
- Open RL Benchmark
Bug Fixes
- Set `highway-env` version to 1.5 and `setuptools` to v65.5 for the CI
- Removed `use_auth_token` for push to hub util
- Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the MuJoCo refactoring (see openai/gym#1304)
- Fixed `gym-minigrid` policy (from `MlpPolicy` to `MultiInputPolicy`)
Documentation
- Documentation is now built using Sphinx and hosted on Read the Docs: https://rl-baselines3-zoo.readthedocs.io/en/master/
Other
- Added support for `ruff` (fast alternative to flake8) in the Makefile
- Removed Gitlab CI file
- Replaced deprecated `optuna.suggest_loguniform(...)` by `optuna.suggest_float(..., log=True)` (see the sketch after this list)
- Switched to `ruff` and `pyproject.toml`
- Removed `online_sampling` and `max_episode_length` arguments when using `HerReplayBuffer`
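For the Optuna change above, the replacement is a one-liner. A minimal sketch follows; the hyperparameter name `learning_rate` and its range are illustrative, not the Zoo's actual sampler code.

```python
import optuna


def sample_params(trial: optuna.Trial) -> dict:
    # Deprecated: trial.suggest_loguniform("learning_rate", 1e-5, 1e-2)
    # Replacement: suggest_float with log=True samples on a log scale.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    return {"learning_rate": learning_rate}
```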