URLS provides a set of unsupervised reinforcement learning algorithms and experiments for researching how unsupervised RL can be applied across a variety of paradigms.
The codebase is based upon URLB and ExORL. Further details are provided in the following papers:
- URLB: Unsupervised Reinforcement Learning Benchmark
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning
URLS is intended as a successor to URLB, allowing for a larger number of experiments and RL paradigms.
Install MuJoCo if it is not already installed:
- Download the MuJoCo binaries here.
- Unzip the downloaded archive into `~/.mujoco/`.
- Append the MuJoCo subdirectory `bin` path to the `LD_LIBRARY_PATH` environment variable.
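A minimal shell sketch of these steps, assuming the Linux archive is named `mujoco200_linux.zip` (substitute the file and version directory you actually downloaded):

```sh
# Sketch only: the archive name and version directory are assumptions.
mkdir -p ~/.mujoco
unzip mujoco200_linux.zip -d ~/.mujoco/
# Make the MuJoCo shared libraries visible to the dynamic linker
# (append this line to ~/.bashrc to make it permanent).
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200_linux/bin
```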
Install the following libraries:
```sh
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip
```
Install dependencies:
```sh
conda env create -f conda_env.yml
conda activate urls-env
```
We provide the following workflows:
Pre-training: learn from the agent's intrinsic reward on a specific domain.
```sh
python pretrain.py agent=UNSUPERVISED_AGENT domain=DOMAIN
```
Fine-tuning: fine-tune the pre-trained agent on a specific task; the task-specific extrinsic reward is now used for the agent.
```sh
python finetune.py pretrained_agent=UNSUPERVISED_AGENT task=TASK snapshot_ts=TS obs_type=OBS_TYPE
```
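As a concrete illustration of this pre-train/fine-tune workflow, the run below uses example values (ICM on the `walker` domain, `walker_walk` task, a `100000`-step snapshot, state observations); none of these are required choices:

```sh
# Example values only: any agent/domain/task/snapshot combination from this README works.
python pretrain.py agent=icm domain=walker
python finetune.py pretrained_agent=icm task=walker_walk snapshot_ts=100000 obs_type=states
```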
Pre-training: learn from the agent's intrinsic reward on a specific domain.
```sh
python pretrain.py agent=UNSUPERVISED_AGENT domain=DOMAIN
```
Sampling: sample demonstrations from the agent's replay buffer on a specific task.
```sh
python sampling.py agent=UNSUPERVISED_AGENT task=TASK samples=SAMPLES snapshot_ts=TS obs_type=OBS_TYPE
```
Offline learning: learn a policy using the offline data collected on the specific task.
```sh
python train_offline.py agent=OFFLINE_AGENT expl_agent=UNSUPERVISED_AGENT task=TASK
```
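For instance, the offline workflow might be run end to end as follows (agent, task, sample count and snapshot timestep are illustrative values):

```sh
# Example values only: pre-train ICM, sample demonstrations, then learn offline with TD3+BC.
python pretrain.py agent=icm domain=walker
python sampling.py agent=icm task=walker_walk samples=10000 snapshot_ts=100000 obs_type=states
python train_offline.py agent=td3_bc expl_agent=icm task=walker_walk
```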
Pre-training: learn from the agent's intrinsic reward on a specific domain.
```sh
python pretrain.py agent=UNSUPERVISED_AGENT domain=DOMAIN
```
Sampling: sample demonstrations from the agent's replay buffer with constraints and images.
```sh
python sampling.py agent=UNSUPERVISED_AGENT task=TASK samples=SAMPLES snapshot_ts=TS obs_type=OBS_TYPE
```
Trajectories to images: create an image dataset from the sampled trajectories.
```sh
python data_to_images.py --env=DOMAIN
```
Train VAE: train a variational autoencoder (VAE) on the image dataset.
```sh
python train_encoder.py --env=DOMAIN
```
Train MPC: train the LS3 safe model predictive controller on a specific domain.
```sh
python train_mpc.py --env=DOMAIN
```
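Putting the safe-control pipeline together, an end-to-end run could look like the sketch below; the `SimplePointBot` domain and `goal` task come from the environment table further down, while the remaining values are illustrative:

```sh
# Example values only: collect data, build the image dataset, train the VAE, then the LS3 MPC.
python pretrain.py agent=icm domain=SimplePointBot
python sampling.py agent=icm task=goal samples=10000 snapshot_ts=100000 obs_type=states
python data_to_images.py --env=SimplePointBot
python train_encoder.py --env=SimplePointBot
python train_mpc.py --env=SimplePointBot
```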
The following unsupervised reinforcement learning agents are available. Replace `UNSUPERVISED_AGENT` with the agent's command; for example, to use DIAYN, set `UNSUPERVISED_AGENT=diayn`.
Agent | Command | Type | Implementation Author(s) | Paper | Intrinsic Reward |
---|---|---|---|---|---|
ICM | `icm` | Knowledge | Denis | paper | $\Vert g(\mathbf{z}_{t+1} \mid \mathbf{z}_{t}, \mathbf{a}_{t}) - \mathbf{z}_{t+1} \Vert^{2}$ |
Disagreement | `disagreement` | Knowledge | Catherine | paper | $\mathrm{Var}\{ g_{i}(\mathbf{z}_{t+1} \mid \mathbf{z}_{t}, \mathbf{a}_{t}) \}$ |
RND | `rnd` | Knowledge | Kevin | paper | $\Vert g(\mathbf{z}_{t}, \mathbf{a}_{t}) - \tilde{g}(\mathbf{z}_{t}, \mathbf{a}_{t}) \Vert_{2}^{2}$ |
APT(ICM) | `icm_apt` | Data | Hao, Kimin | paper | $\sum_{j \in \mathrm{random}} \log \Vert \mathbf{z}_{t} - \mathbf{z}_{j} \Vert$ |
APT(Ind) | `ind_apt` | Data | Hao, Kimin | paper | $\sum_{j \in \mathrm{random}} \log \Vert \mathbf{z}_{t} - \mathbf{z}_{j} \Vert$ |
ProtoRL | `proto` | Data | Denis | paper | $\sum_{j \in \mathrm{random}} \log \Vert \mathbf{z}_{t} - \mathbf{z}_{j} \Vert$ |
DIAYN | `diayn` | Competence | Misha | paper | |
APS | `aps` | Competence | Hao, Kimin | paper | |
SMM | `smm` | Competence | Albert | paper | |
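For instance, to pre-train with DIAYN using its command from the table (the `walker` domain is just an example value):

```sh
python pretrain.py agent=diayn domain=walker
```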
The following 5 RL procedures are available to learn a policy offline from the unsupervised data. Replace `OFFLINE_AGENT` with the procedure's command; for example, to use behavior cloning, set `OFFLINE_AGENT=bc`.
Offline RL Procedure | Command | Paper |
---|---|---|
Behavior Cloning | `bc` | paper |
CQL | `cql` | paper |
CRR | `crr` | paper |
TD3+BC | `td3_bc` | paper |
TD3 | `td3` | paper |
The following environments with specific domains and tasks are provided. We also provide a wrapper to convert Gym environments to DMC extended time-step types based on DeepMind's acme wrapper.
Environment Type | Domain | Task |
---|---|---|
DeepMind Control | `walker` | `stand`, `walk`, `run`, `flip` |
DeepMind Control | `quadruped` | `walk`, `run`, `stand`, `jump` |
DeepMind Control | `jaco` | `reach_top_left`, `reach_top_right`, `reach_bottom_left`, `reach_bottom_right` |
DeepMind Control | `cheetah` | `run` |
Gym Box2D | `BipedalWalker-v3` | `walk` |
Gym Box2D | `CarRacing-v1` | `race` |
Gym Classic Control | `MountainCarContinuous-v0` | `goal` |
Safe Control | `SimplePointBot` | `goal` |
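Assuming the Gym and safe-control domains are selected the same way as the DMC ones (an assumption, not confirmed by the scripts above), pre-training on a Box2D environment would look something like:

```sh
# Assumed invocation: the Gym environment ID is passed as the domain.
python pretrain.py agent=icm domain=BipedalWalker-v3
```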
The majority of URLS, including the ExORL- and URLB-based code, is licensed under the MIT license; however, portions of the project are available under separate license terms: DeepMind code is licensed under the Apache 2.0 license.