ICLR 2024 Spotlight
Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman
Welcome to the codebase for "Privileged Sensing Scaffolds Reinforcement Learning". Here, you will find:
- Scaffolder, a model-based RL method that uses privileged observations to better train policies.
- Sensory Scaffolding Suite (S3), a privileged POMDP benchmark of 10 diverse tasks containing locomotion, dexterous manipulation, piano playing, and more.
If you find our paper or code useful, please cite us:
@inproceedings{
  hu2024privileged,
  title={Privileged Sensing Scaffolds Reinforcement Learning},
  author={Edward S. Hu and James Springer and Oleh Rybkin and Dinesh Jayaraman},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=EpVe8jAjdx}
}
To learn more about Scaffolder, see our paper at the OpenReview link above.
To use Scaffolder, install the Scaffolder algorithm and any environments you would like to run. Scaffolder builds on top of the DreamerV3 codebase and expects environments to follow the Gymnasium API. Each environment should return an observation dictionary, and Scaffolder's configuration specifies which observation keys are privileged and which are the target observations available to the policy.
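For illustration, an S3 observation dictionary might look like the sketch below. The key names here are hypothetical; the actual keys, and which of them are treated as privileged, are declared per task in Scaffolder's configuration.

```python
# Illustrative only: the shape of a Gymnasium-style S3 observation dict.
# All key names below are hypothetical; the real keys, and which ones are
# marked privileged vs. target, are specified per task in Scaffolder's config.
import numpy as np

obs = {
    "proprio": np.zeros(10, dtype=np.float32),        # target: available at deployment
    "wrist_rgb": np.zeros((64, 64, 3), np.uint8),     # target: policy input
    "object_pose": np.zeros(7, dtype=np.float32),     # privileged: training-time only
    "touch_sensors": np.zeros(16, dtype=np.float32),  # privileged: training-time only
}
```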
After installing Scaffolder and the environments, the folder structure should look like this:
projects/ # Your project folder
|- scaffolder/ # code for Scaffolder algorithm
|- gymnasium_robotics/ # code for 7/10 S3 tasks
|- gymnasium/ # code for Blind Locomotion task
|- robopianist/ # code for Blind Deaf Piano task
|- brachiation/ # code for Noisy Monkey task
Scaffolder runs on both Ubuntu and macOS. We recommend macOS for fast local development and Ubuntu for actual GPU training.
conda create -n scaffolder python=3.8
Python 3.9 or later should also work; we developed with 3.8.
- First, install JAX following their installation instructions.
- Then, clone Scaffolder's codebase, which extends DreamerV3.
git clone git@github.com:penn-pal-lab/scaffolder.git
cd scaffolder
# install dependencies
pip install -r requirements.txt
# install scaffolder as a local python package
pip install -e .
You will likely run into versioning errors when installing the packages in requirements.txt, since they are not pinned to specific versions. You can work around these by commenting out the troublesome packages in the file and installing them manually with pip install <your package>==<desired version>.
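As a quick check that the editable install succeeded, you can try importing the package. This is a minimal sketch; we assume the top-level module is named `dreamerv3`, matching the `dreamerv3/train.py` entry point used in the commands below.

```python
# Minimal import check for the editable install of the Scaffolder repo.
# Assumption: `pip install -e .` exposes the `dreamerv3` package, matching
# the dreamerv3/train.py entry point used later in this README.
import dreamerv3

# Should print a path inside your cloned scaffolder/ directory.
print(dreamerv3.__file__)
```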
The Sensory Scaffolding Suite (S3) is a collection of 10 tasks with predefined privileged and target observation spaces. The tasks are implemented across several repositories, so for a given task you must install the corresponding repo. If you just want to get started, we recommend installing Gymnasium Robotics first, since it covers 7 of the 10 S3 tasks.
| Repository | Task |
|---|---|
| Gymnasium Robotics | Blind Pick, Wrist Pick-Place, Occluded Pick-Place, Blind Numb Cube, Blind Numb Pen, RGB Cube, RGB Pen |
| Gymnasium | Blind Locomotion |
| Robopianist | Blind Deaf Piano |
| Brachiation | Noisy Monkey |
Gymnasium Robotics Installation
Clone our custom fork of Gymnasium Robotics, change to the correct branch, and install it as a local Python package.
git clone git@github.com:edwhu/Gymnasium-Robotics.git gymnasium_robotics
cd gymnasium_robotics
git checkout v0.1
pip install -e . # install this library as a local python package
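To verify the environments are available, you can list the registered IDs. This is a sketch under the assumption that importing `gymnasium_robotics` registers the S3 task variants on import, as upstream Gymnasium-Robotics does; consult the fork for the exact environment IDs rather than relying on the filter below.

```python
# Quick check that the fork installed and its environments registered.
import gymnasium_robotics  # noqa: F401  # import triggers env registration
from gymnasium.envs.registration import registry

# Print registered IDs; the "Pick" filter is only a convenience for spotting
# some of the S3 manipulation tasks.
print(sorted(env_id for env_id in registry if "Pick" in env_id))
```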
Gymnasium Installation (TODO)
Robopianist Installation (TODO)
Brachiation Installation (TODO)
As a sanity check, let's see if the core Scaffolder training loop runs with very small networks and low-dimensional state inputs.
python -u dreamerv3/train.py --logdir ~/logdir/test_blindpick_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindpick,sanity_check
It should take a couple of minutes to run. After it's done, check the TensorBoard outputs.
cd ~/logdir/
tensorboard --logdir .
You should see scalars and GIFs in TensorBoard. This means the training loop is working and the installation was successful.
In general, you need to specify the environment and model-size configurations. You can also override individual configuration values at the command line. Below, we provide example commands for running Scaffolder on each S3 task.
Blind Pick
python -u dreamerv3/train.py --logdir ~/logdir/blindpick_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindpick,small
Blind Locomotion
python -u dreamerv3/train.py --logdir ~/logdir/blindlocomotion_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindlocomotion,small
Blind Deaf Piano
python -u dreamerv3/train.py --logdir ~/logdir/blinddeafpiano_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blinddeafpiano,large
Blind Numb Cube
python -u dreamerv3/train.py --logdir ~/logdir/blindnumbcube_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindnumbcube,small
Blind Numb Pen
python -u dreamerv3/train.py --logdir ~/logdir/blindnumbpen_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindnumbpen,small
Noisy Monkey
python -u dreamerv3/train.py --logdir ~/logdir/noisymonkey_$(date "+%Y%m%d-%H%M%S") --configs gym_noisymonkey,small
Wrist Pick Place
python -u dreamerv3/train.py --logdir ~/logdir/wristpickplace_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_wristpickplace,small
Occluded Pick Place
python -u dreamerv3/train.py --logdir ~/logdir/occludedpickplace_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_occludedpickplace,small
RGB Cube
python -u dreamerv3/train.py --logdir ~/logdir/rgbcube_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_rgbcube,large
RGB Pen
python -u dreamerv3/train.py --logdir ~/logdir/rgbpen_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_rgbpen,large
Check out the slurm folder for example scripts for running experiments on a cluster.
This project would not be possible without the work of others. We thank them for their contributions, and we hope others will build on Scaffolder as we have built on these works.
- DreamerV3, the MBRL codebase we build upon.
- The various environment codebases we used to define S3.
- The baseline implementations we used for the evaluations:
  - Informed Dreamer
  - The Rapid Motor Adaptation implementation from the HORA project
  - Asymmetric Actor-Critic, implemented via CleanRL