This repository has been archived by the owner on Dec 18, 2024. It is now read-only.

Distributed SAC Utilities

[Figure: architecture diagram]

Utilities for training reinforcement learning policies with the Soft Actor-Critic (SAC) algorithm, built on TensorFlow Agents (TF-Agents). Features:

  • Following the TF-Agents distributed training example, the framework is divided into separate programs:
    • Experience collection workers (each with their own environment)
    • Replay buffer implemented with deepmind/reverb
    • SAC policy trainer
  • Optional seeding of the replay buffer with experience collected by a random policy, to encourage exploration
  • Fine-grained control over the number of CPUs allocated to each program
  • Checkpointing and TensorBoard logging
  • "Supervision" of the training with daemontools/supervise: if any program crashes, training automatically resumes from the last checkpoint, which is useful when running on a compute cluster
  • SLURM compute cluster support
  • JSON configuration of the environment hyperparameters and their curriculum
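
The division of roles described above (collection workers, replay buffer, trainer, and seeding with a random policy) can be illustrated with a toy, single-process sketch. It does not use TF-Agents or Reverb and is not this repository's code: `ReplayBuffer`, `random_policy`, `collect`, `train_step`, and the 1-D environment are all hypothetical stand-ins, and the "trainer" only averages sampled rewards rather than performing a SAC update.

```python
import random
from collections import deque


class ReplayBuffer:
    """Toy in-memory stand-in for the replay buffer (hypothetical, not Reverb)."""

    def __init__(self, capacity):
        # Oldest transitions are dropped once capacity is reached.
        self._items = deque(maxlen=capacity)

    def add(self, transition):
        self._items.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement.
        return rng.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


def random_policy(observation, rng):
    """Ignores the observation and returns a uniformly random action."""
    return rng.uniform(-1.0, 1.0)


def collect(buffer, policy, num_steps, rng):
    """Stand-in for one collection worker with its own toy 1-D environment."""
    observation = 0.0
    for _ in range(num_steps):
        action = policy(observation, rng)
        next_observation = observation + action
        reward = -abs(next_observation)  # assumed reward: stay near zero
        buffer.add((observation, action, reward, next_observation))
        observation = next_observation


def train_step(buffer, batch_size, rng):
    """Stand-in for the trainer: samples a minibatch, returns its mean reward."""
    batch = buffer.sample(batch_size, rng)
    return sum(transition[2] for transition in batch) / len(batch)


rng = random.Random(0)
buffer = ReplayBuffer(capacity=1000)

# Seed the buffer with random-policy experience before any training step.
collect(buffer, random_policy, num_steps=200, rng=rng)
mean_reward = train_step(buffer, batch_size=32, rng=rng)
```

In the actual framework these roles run as separate programs that communicate through the replay buffer; the sketch keeps them in one process only to show the data flow.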