ReLAx

ReLAx - Reinforcement Learning Applications

ReLAx is an object-oriented library for deep reinforcement learning built on top of PyTorch.

Implemented Algorithms

Model-Free:
- On-Policy
  - VPG: example
  - A2C: example
  - TRPO: example
  - PPO: example
- Off-Policy
  - DQN: example
  - Double DQN: example
  - Dueling DQN: example
  - Noisy DQN: example
  - Categorical DQN: example
  - RAINBOW: example
  - DDPG: example
  - TD3: example
  - SAC: example
Model-Based:
- Random Shooting: example
- Cross Entropy Method (CEM): example
- Filtering & Reward Weighted Refinement (PDDM): example
Hybrid MB-MF:
- MBPO: example
- DYNA-Q: example

Special Features

ReLAx offers a set of special features:

- Simple interface for lagging environment observations: Recurrent Policies for Handling Partially Observable Environments
- Sampling from parallel environments: Speeding Up PPO with Parallel Sampling
- Flexible hyperparameter scheduling: Scheduling TRPO's KL Divergence Constraint
- N-step bootstrapping for all off-policy value-based algorithms: Multistep TD3 for Locomotion
- Prioritized Experience Replay for all off-policy value-based algorithms: Prioritised DDQN for Atari-2600
- Simple interface for model-based acceleration: DYNA Model-Based Acceleration with TD3 / MBPO with SAC

And other options for building non-standard RL architectures:

- Training PPO with DQN as a critic
- Multi-tasking with model-based RL
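
To make the N-step bootstrapping feature listed above more concrete, here is a minimal, library-independent sketch of how an N-step TD target is typically formed. The function name n_step_target and its arguments are illustrative assumptions, not part of the ReLAx API, and ReLAx's own implementation may differ.

```python
def n_step_target(rewards, bootstrap_value, gamma, dones=None):
    """Illustrative N-step TD target (hypothetical helper, not ReLAx code).

    Computes r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.
    If a terminal step is hit, accumulation stops and nothing is bootstrapped.
    """
    if dones is None:
        dones = [False] * len(rewards)
    target = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        discount *= gamma
        if done:
            # Episode ended inside the window: no value to bootstrap from
            return target
    return target + discount * bootstrap_value


# Three-step target with gamma = 0.5 and a bootstrap estimate of 8.0:
# 1 + 0.5*1 + 0.25*1 + 0.125*8 = 2.75
print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5))
```

With a single reward this reduces to the usual one-step target r + gamma * V(s').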

Usage With Custom Environments

Some examples of how to write custom user-defined environments and use them with ReLAx:

- Playing 2048 with RAINBOW

Minimal Examples

On Policy

import torch
import gym

from relax.rl.actors import VPG
from relax.zoo.policies import CategoricalMLP
from relax.data.sampling import Sampler

# Create training and eval envs
env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")

# Wrap them into Sampler
sampler = Sampler(env)
eval_sampler = Sampler(eval_env)

# Define Vanilla Policy Gradient actor
actor = VPG(
    # Use the GPU when available, otherwise fall back to the CPU
    device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
    policy_net=CategoricalMLP(acs_dim=2, obs_dim=4,
                              nlayers=2, nunits=64),
    learning_rate=0.01
)

# Run training loop:
for i in range(100):
    
    # Sample training data
    train_batch = sampler.sample(n_transitions=1000,
                                 actor=actor,
                                 train_sampling=True)
    
    # Update VPG actor
    actor.update(train_batch)
    
    # Collect evaluation episodes
    eval_batch = eval_sampler.sample_n_episodes(n_episodes=5,
                                                actor=actor,
                                                train_sampling=False)
    
    # Print average return per iteration
    print(f"Iter: {i}, eval score: {eval_batch.create_logs()['avg_return']}")
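
As background for the on-policy example above: policy-gradient methods such as VPG commonly weight each action's log-probability by the discounted return that follows it (the "reward-to-go"). The sketch below shows that computation in plain Python. The helper name rewards_to_go is hypothetical, and whether ReLAx's VPG uses exactly this estimator is an assumption not confirmed by this README.

```python
def rewards_to_go(rewards, gamma):
    """Discounted reward-to-go for one episode (illustrative helper, not ReLAx code).

    out[t] = rewards[t] + gamma * rewards[t+1] + gamma^2 * rewards[t+2] + ...
    """
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the sum of later steps
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


print(rewards_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```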

Off policy

import torch
import gym

from relax.rl.actors import ArgmaxQValue
from relax.rl.critics import DQN

from relax.exploration import EpsilonGreedy
from relax.schedules import PiecewiseSchedule
from relax.zoo.critics import DiscQMLP

from relax.data.sampling import Sampler
from relax.data.replay_buffer import ReplayBuffer

# Create training and eval envs
env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")

# Wrap them into Sampler
sampler = Sampler(env)
eval_sampler = Sampler(eval_env)

# Define schedules
# For the first 5k steps there is no learning (lr = 0) and only random sampling (eps = 1)
lr_schedule = PiecewiseSchedule({0: 5000}, 5e-5)
eps_schedule = PiecewiseSchedule({1: 5000}, 1e-3)

# Define actor
actor = ArgmaxQValue(
    exploration=EpsilonGreedy(eps=eps_schedule)
)

# Define critic
critic = DQN(
    # Use the GPU when available, otherwise fall back to the CPU
    device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
    critic_net=DiscQMLP(obs_dim=4, acs_dim=2, 
                        nlayers=2, nunits=64),
    learning_rate=lr_schedule,
    batch_size=100,
    target_updates_freq=3000
)

# Provide actor with critic
actor.set_critic(critic)

# Run q-iteration training loop:
print_every = 1000
replay_buffer = ReplayBuffer(100000)

for i in range(100000):
    
    # Sample training data (one transition)
    train_batch = sampler.sample(n_transitions=1,
                                 actor=actor,
                                 train_sampling=True)
                                 
    # Add it to buffer                             
    replay_buffer.add_paths(train_batch)
    
    # Update DQN critic
    critic.update(replay_buffer)
    
    # Update ArgmaxQValue actor (only to step schedules)
    actor.update()
    
    if i > 0 and i % print_every == 0:
        # Collect evaluation episodes
        eval_batch = eval_sampler.sample_n_episodes(n_episodes=5,
                                                    actor=actor,
                                                    train_sampling=False)

        # Print average evaluation and replay buffer returns
        print(f"Iter: {i}, eval score: "
              f"{eval_batch.create_logs()['avg_return']}, "
              "buffer score: "
              f"{replay_buffer.create_logs()['avg_return']}")
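
The schedules in the off-policy example above hold one value for the first 5000 steps and then switch to a final value. The sketch below illustrates that behaviour together with epsilon-greedy action selection, in plain Python. Both helpers (piecewise_value, epsilon_greedy) are hypothetical, and the reading of the dictionary as "value: number of steps to hold it" is an assumption inferred from the comment in the example, not from the PiecewiseSchedule source.

```python
import random


def piecewise_value(step, pieces, final_value):
    """Hypothetical stand-in for a piecewise-constant schedule.

    `pieces` maps a value to the number of steps it is held (assumed semantics);
    once all pieces are exhausted, `final_value` is returned.
    """
    boundary = 0
    for value, n_steps in pieces.items():
        boundary += n_steps
        if step < boundary:
            return value
    return final_value


def epsilon_greedy(q_values, eps, rng=random):
    """With probability eps pick a random action, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Around the 5000-step boundary the learning rate and epsilon both switch
for step in (0, 4999, 5000):
    lr = piecewise_value(step, {0: 5000}, 5e-5)
    eps = piecewise_value(step, {1: 5000}, 1e-3)
    action = epsilon_greedy([0.2, 0.7], eps)
    print(step, lr, eps, action)
```

Under these assumptions, the agent acts fully at random with a zero learning rate while the replay buffer fills, then switches to near-greedy actions once learning starts.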

Installation

Building from GitHub Source

Installing into a separate virtual environment:

git clone https://github.com/nslyubaykin/relax
cd relax
conda create -n relax python=3.6
conda activate relax
pip install -r requirements.txt
pip install -e .

Mujoco

To install MuJoCo, follow these steps:

mkdir ~/.mujoco
cd ~/.mujoco
wget http://www.roboti.us/download/mujoco200_linux.zip
unzip mujoco200_linux.zip
mv mujoco200_linux mujoco200
rm mujoco200_linux.zip
wget http://www.roboti.us/file/mjkey.txt

Then add the following line to the bottom of your ~/.bashrc:

export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin/

Finally, install mujoco_py itself:

pip install mujoco-py==2.0.2.2

Note: installation often fails with the error: command 'gcc' failed with exit status 1. To fix this, install the compiler toolchain:

sudo apt-get install gcc
sudo apt-get install build-essential

Then try installing mujoco-py==2.0.2.2 again.
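
If mujoco-py still cannot find the MuJoCo binaries, it may help to check that the library path from the steps above is actually visible to Python. The following is a small sanity-check sketch using only the standard library. The function name mujoco_lib_on_path is hypothetical, and it assumes the default ~/.mujoco/mujoco200/bin location used in this guide.

```python
import os


def mujoco_lib_on_path(env=None, home=None):
    """Return True if <home>/.mujoco/mujoco200/bin is listed in LD_LIBRARY_PATH.

    Hypothetical helper for troubleshooting; not part of ReLAx or mujoco-py.
    """
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    expected = os.path.normpath(
        os.path.join(home, ".mujoco", "mujoco200", "bin")
    )
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # normpath drops trailing slashes so ".../bin/" and ".../bin" compare equal
    normalized = {os.path.normpath(entry) for entry in entries if entry}
    return expected in normalized


print("MuJoCo bin on LD_LIBRARY_PATH:", mujoco_lib_on_path())
```

If this prints False in a new terminal, the export line was probably not picked up; reopening the shell or running source ~/.bashrc usually resolves that.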

Atari Environments

ReLAx was developed and tested with gym[atari]==0.17.2. Newer versions should also work, but their compatibility with the provided Atari wrappers is uncertain.

To install Gym with Atari support:

pip install gym[atari]==0.17.2

If you get a "ROMs not found" error, follow these steps:

Download ROMs archive

wget http://www.atarimania.com/roms/Roms.rar

Unpack it

unrar x Roms.rar

Install atari_py

pip install atari_py

Provide atari_py with ROMS

python -m atari_py.import_roms ROMS

Further Developments

The following functionality is planned for future releases:

- Curiosity (RND)
- Offline RL (CQL, BEAR, BCQ, SAC-N, EDAC)
- Decision Transformers
- PPG
- QR-DQN
- IQN
- FQF
- Discrete SAC
- NAF
- Stochastic environment models
- Improving documentation

Known Issues

- Lack of documentation (for now compensated by usage examples).
- On some systems relax.zoo.layers.NoisyLinear seems to leak memory. This issue is very unpredictable and not yet fully understood. Sometimes installing different versions of PyTorch and CUDA fixes it. If the problem persists, consider not using noisy linear layers as a workaround.
- The Filtering & Reward Weighted Refinement implementation does not yet reach the performance reported in the paper.
- DYNA-Q is not compatible with PER, as it is not clear which priority to assign to synthetic branched transitions (one possible option: the same priority as the parent transition).
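
The option mentioned in the last item (a synthetic transition inheriting its parent's priority) could look roughly like the toy sketch below. This is purely illustrative: the class TinyPrioritizedBuffer and its methods are hypothetical, it is not part of ReLAx, and it omits importance-sampling weights and priority updates that a full PER implementation would need.

```python
import random


class TinyPrioritizedBuffer:
    """Toy proportional-priority buffer (illustrative only, not ReLAx code)."""

    def __init__(self):
        self.items = []
        self.priorities = []

    def add(self, transition, priority):
        """Store a real transition and return its index."""
        self.items.append(transition)
        self.priorities.append(float(priority))
        return len(self.items) - 1

    def add_synthetic(self, transition, parent_index):
        """Assumed rule: a model-generated transition copies its parent's priority."""
        return self.add(transition, self.priorities[parent_index])

    def sample(self, k, rng=random):
        """Draw k transitions with probability proportional to priority."""
        return rng.choices(self.items, weights=self.priorities, k=k)


buf = TinyPrioritizedBuffer()
real = buf.add("real transition", priority=2.0)
buf.add_synthetic("synthetic transition", parent_index=real)
print(buf.priorities)  # [2.0, 2.0]
```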
