ICLR 2024 Spotlight
Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman
Welcome to the codebase for "Privileged Sensing Scaffolds Reinforcement Learning". Here, you will find:
- Scaffolder, a model-based RL method that uses privileged observations to better train policies.
- Sensory Scaffolding Suite (S3), a privileged POMDP benchmark of 10 diverse tasks containing locomotion, dexterous manipulation, piano playing, and more.
If you find our paper or code useful, please cite us:
@inproceedings{
  hu2024privileged,
  title={Privileged Sensing Scaffolds Reinforcement Learning},
  author={Edward S. Hu and James Springer and Oleh Rybkin and Dinesh Jayaraman},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=EpVe8jAjdx}
}
To learn more about Scaffolder, see our paper at the OpenReview link above.
To use Scaffolder, install the Scaffolder algorithm and any environments you would like to run. Scaffolder builds on top of the DreamerV3 codebase and expects environments to follow the Gymnasium API. Each environment should return an observation dictionary, and Scaffolder's configuration specifies which observation keys are privileged and which are the target observations available to the policy.
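For illustration, an S3 observation dictionary might look like the sketch below. The key names here are hypothetical; the actual keys, and which of them are treated as privileged, are declared per task in Scaffolder's configuration.

```python
# Illustrative only: the shape of a Gymnasium-style S3 observation dict.
# All key names below are hypothetical; the real keys, and which ones are
# marked privileged vs. target, are specified per task in Scaffolder's config.
import numpy as np

obs = {
    "proprio": np.zeros(10, dtype=np.float32),        # target: available at deployment
    "wrist_rgb": np.zeros((64, 64, 3), np.uint8),     # target: policy input
    "object_pose": np.zeros(7, dtype=np.float32),     # privileged: training-time only
    "touch_sensors": np.zeros(16, dtype=np.float32),  # privileged: training-time only
}
```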
After installing Scaffolder and the environments, the folder structure should look like this:
projects/ # Your project folder
|- scaffolder/ # code for Scaffolder algorithm
|- gymnasium_robotics/ # code for 7/10 S3 tasks
|- gymnasium/ # code for Blind Locomotion task
|- robopianist/ # code for Blind Deaf Piano task
|- brachiation/ # code for Noisy Monkey task
Scaffolder runs on both Ubuntu and macOS. We recommend macOS for fast local development and Ubuntu for actual GPU training.
conda create -n scaffolder python=3.8
Python 3.9 or later should also work; we developed with 3.8.
- First, install JAX following their installation instructions.
- Then, clone Scaffolder's codebase, which extends DreamerV3.
git clone git@github.com:penn-pal-lab/scaffolder.git
cd scaffolder
# install dependencies
pip install -r requirements.txt
# install scaffolder as a local python package
pip install -e .
You will likely run into versioning errors when installing the packages in requirements.txt, since they are not pinned to specific versions. You can work around these by commenting out the troublesome packages in the file and installing them manually with pip install <your package>==<desired version>.
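As a quick check that the editable install succeeded, you can try importing the package. This is a minimal sketch; we assume the top-level module is named `dreamerv3`, matching the `dreamerv3/train.py` entry point used in the commands below.

```python
# Minimal import check for the editable install of the Scaffolder repo.
# Assumption: `pip install -e .` exposes the `dreamerv3` package, matching
# the dreamerv3/train.py entry point used later in this README.
import dreamerv3

# Should print a path inside your cloned scaffolder/ directory.
print(dreamerv3.__file__)
```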
The Sensory Scaffolding Suite (S3) is a collection of 10 tasks with predefined privileged and target observation spaces. The tasks are implemented across several repositories, so for a given task you must install the corresponding repo. If you just want to get started, we recommend installing Gymnasium Robotics first, since it covers 7 of the 10 S3 tasks.
| Repository | Task |
|---|---|
| Gymnasium Robotics | Blind Pick, Wrist Pick-Place, Occluded Pick-Place, Blind Numb Cube, Blind Numb Pen, RGB Cube, RGB Pen |
| Gymnasium | Blind Locomotion |
| Robopianist | Blind Deaf Piano |
| Brachiation | Noisy Monkey |
Gymnasium Robotics Installation
Clone our custom fork of Gymnasium Robotics, change to the correct branch, and install it as a local Python package.
git clone git@github.com:edwhu/Gymnasium-Robotics.git gymnasium_robotics
cd gymnasium_robotics
git checkout v0.1
pip install -e . # install this library as a local python package
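To verify the environments are available, you can list the registered IDs. This is a sketch under the assumption that importing `gymnasium_robotics` registers the S3 task variants on import, as upstream Gymnasium-Robotics does; consult the fork for the exact environment IDs rather than relying on the filter below.

```python
# Quick check that the fork installed and its environments registered.
import gymnasium_robotics  # noqa: F401  # import triggers env registration
from gymnasium.envs.registration import registry

# Print registered IDs; the "Pick" filter is only a convenience for spotting
# some of the S3 manipulation tasks.
print(sorted(env_id for env_id in registry if "Pick" in env_id))
```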
Gymnasium Installation (TODO)
Robopianist Installation (TODO)
Brachiation Installation (TODO)
As a sanity check, let's see if the core Scaffolder training loop runs with very small networks and low-dimensional state inputs.
python -u dreamerv3/train.py --logdir ~/logdir/test_blindpick_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindpick,sanity_check
It should take a couple of minutes to run. After it's done, check the TensorBoard outputs.
cd ~/logdir/
tensorboard --logdir .
You should see scalars and GIFs in TensorBoard. This means the training loop is working and the installation was successful.
In general, you need to specify the environment and model-size configurations. You can also override individual configuration values at the command line. Below, we provide example commands for running Scaffolder on each S3 task.
Blind Pick
python -u dreamerv3/train.py --logdir ~/logdir/blindpick_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindpick,small
Blind Locomotion
python -u dreamerv3/train.py --logdir ~/logdir/blindlocomotion_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindlocomotion,small
Blind Deaf Piano
python -u dreamerv3/train.py --logdir ~/logdir/blinddeafpiano_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blinddeafpiano,large
Blind Numb Cube
python -u dreamerv3/train.py --logdir ~/logdir/blindnumbcube_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindnumbcube,small
Blind Numb Pen
python -u dreamerv3/train.py --logdir ~/logdir/blindnumbpen_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindnumbpen,small
Noisy Monkey
python -u dreamerv3/train.py --logdir ~/logdir/noisymonkey_$(date "+%Y%m%d-%H%M%S") --configs gym_noisymonkey,small
Wrist Pick Place
python -u dreamerv3/train.py --logdir ~/logdir/wristpickplace_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_wristpickplace,small
Occluded Pick Place
python -u dreamerv3/train.py --logdir ~/logdir/occludedpickplace_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_occludedpickplace,small
RGB Cube
python -u dreamerv3/train.py --logdir ~/logdir/rgbcube_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_rgbcube,large
RGB Pen
python -u dreamerv3/train.py --logdir ~/logdir/rgbpen_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_rgbpen,large
Check out the slurm folder for example scripts for running experiments on a cluster.
This project would not be possible without the work of others. We thank them for their contributions, and we hope others will build on Scaffolder as we have built on these works.
- DreamerV3, the MBRL codebase we build upon.
- The various environment codebases we used to define S3.
- The baseline implementations we used for the evaluations:
  - Informed Dreamer
  - The Rapid Motor Adaptation implementation from the HORA project
  - Asymmetric Actor-Critic, implemented via CleanRL