SB3 v1.6.0: Huggingface hub integration, Recurrent PPO (PPO LSTM)

Release 1.6.0 (2022-08-05)

Breaking Changes

  • Change the default number of hyperparameter optimization trials from 10 to 500. (@ernestum)
  • Derive the number of intermediate pruning evaluations from the number of time steps (one evaluation per 100k time steps). (@ernestum)
  • Updated default --eval-freq from 10k to 25k steps
  • Update default horizon to 2 for the HistoryWrapper
  • Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
  • Upgrade to sb3-contrib >= 1.6.0

New Features

  • Support setting PyTorch's device with the --device flag (@Gregwar)
  • Add --max-total-trials parameter to help with distributed optimization. (@ernestum)
  • Added vec_env_wrapper support in the config (works the same as env_wrapper)
  • Added Huggingface hub integration
  • Added RecurrentPPO support (aka ppo_lstm); a minimal training sketch follows this list
  • Added autodownload for "official" sb3 models from the hub (a loading sketch also follows this list)
  • Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
  • Added MsPacman models
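
As a quick illustration of the new ppo_lstm algorithm, here is a minimal training sketch using sb3-contrib directly; the environment (CartPole-v1), timestep budget, and save path are arbitrary examples, not part of this release:

```python
from sb3_contrib import RecurrentPPO

# Recurrent PPO (ppo_lstm) uses an LSTM policy, so the policy class name
# differs from plain PPO's "MlpPolicy".
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
model.save("ppo_lstm_cartpole")  # illustrative path
```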
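
And a sketch of loading one of the pretrained models from the Hugging Face Hub with the huggingface_sb3 helper. The repo id and filename below follow the sb3/<algo>-<env> naming convention used by the official models and are assumptions here; adjust them to the model you actually want:

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C

# Download the checkpoint from the Hub (repo id / filename are illustrative)
# and load it back into the matching SB3 algorithm class.
checkpoint = load_from_hub(repo_id="sb3/a2c-Ant-v3", filename="a2c-Ant-v3.zip")
model = A2C.load(checkpoint)
```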

Bug fixes

  • Fix Reacher-v3 name in PPO hyperparameter file
  • Pinned ale-py==0.7.4 until a new SB3 version is released
  • Fix enjoy / record videos with LSTM policy
  • Fix bug with environments that have a slash in their name (@ernestum)
  • Changed optimize_memory_usage to False for DQN/QR-DQN on Atari games;
    if you want to save RAM, deactivate handle_timeout_termination
    in the replay_buffer_kwargs instead (see the sketch after this list)
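
For reference, a minimal sketch of the RAM-saving configuration mentioned above, written against the plain SB3 API; the environment id and buffer size are illustrative assumptions, not values from this release:

```python
from stable_baselines3 import DQN

# The memory-efficient replay buffer conflicts with handle_timeout_termination,
# so to re-enable optimize_memory_usage the timeout handling must be turned off.
model = DQN(
    "CnnPolicy",
    "PongNoFrameskip-v4",  # assumes ale-py and the Atari ROMs are installed
    buffer_size=100_000,
    optimize_memory_usage=True,
    replay_buffer_kwargs=dict(handle_timeout_termination=False),
)
```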

Documentation

Other

  • When pruner is set to "none", use NopPruner instead of a diverted MedianPruner (@qgallouedec)
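
A small sketch of what this change amounts to in plain Optuna terms; the MedianPruner settings below are only a guess at how one might divert it, not the exact previous code:

```python
import optuna

# Previously: a MedianPruner delayed so long that it never fires in practice
# (illustrative settings only).
diverted = optuna.pruners.MedianPruner(n_startup_trials=10**9)

# Now: the dedicated no-op pruner, which expresses "no pruning" directly.
no_pruning = optuna.pruners.NopPruner()

study = optuna.create_study(pruner=no_pruning)
```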