ReLAx

ReLAx - Reinforcement Learning Applications

ReLAx is an object-oriented library for deep reinforcement learning built on top of PyTorch.

Contents

Implemented Algorithms

ReLAx library contains implementations of the following algorithms:

Special Features

ReLAx offers a set of special features:

And other options for building non-standard RL architectures:

Usage With Custom Environments

Some examples of how to write custom user-defined environments and use them with ReLAx:

Minimal Examples

On Policy

import torch
import gym

from relax.rl.actors import VPG
from relax.zoo.policies import CategoricalMLP
from relax.data.sampling import Sampler

# Create training and eval envs
env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")

# Wrap them into Sampler
sampler = Sampler(env)
eval_sampler = Sampler(eval_env)

# Define Vanilla Policy Gradient actor
actor = VPG(
    device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'), # falls back to CPU if no GPU is available
    policy_net=CategoricalMLP(acs_dim=2, obs_dim=4,
                              nlayers=2, nunits=64),
    learning_rate=0.01
)

# Run training loop:
for i in range(100):
    
    # Sample training data
    train_batch = sampler.sample(n_transitions=1000,
                                 actor=actor,
                                 train_sampling=True)
    
    # Update VPG actor
    actor.update(train_batch)
    
    # Collect evaluation episodes
    eval_batch = eval_sampler.sample_n_episodes(n_episodes=5,
                                                actor=actor,
                                                train_sampling=False)
    
    # Print average return per iteration
    print(f"Iter: {i}, eval score: {eval_batch.create_logs()['avg_return']}")
    

Off Policy

import torch
import gym

from relax.rl.actors import ArgmaxQValue
from relax.rl.critics import DQN

from relax.exploration import EpsilonGreedy
from relax.schedules import PiecewiseSchedule
from relax.zoo.critics import DiscQMLP

from relax.data.sampling import Sampler
from relax.data.replay_buffer import ReplayBuffer

# Create training and eval envs
env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")

# Wrap them into Sampler
sampler = Sampler(env)
eval_sampler = Sampler(eval_env)

# Define schedules
# For the first 5k steps there is no learning (lr = 0), only random sampling (eps = 1)
lr_schedule = PiecewiseSchedule({0: 5000}, 5e-5)
eps_schedule = PiecewiseSchedule({1: 5000}, 1e-3)

# Define actor
actor = ArgmaxQValue(
    exploration=EpsilonGreedy(eps=eps_schedule)
)

# Define critic
critic = DQN(
    device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'), # falls back to CPU if no GPU is available
    critic_net=DiscQMLP(obs_dim=4, acs_dim=2, 
                        nlayers=2, nunits=64),
    learning_rate=lr_schedule,
    batch_size=100,
    target_updates_freq=3000
)

# Provide actor with critic
actor.set_critic(critic)

# Run q-iteration training loop:
print_every = 1000
replay_buffer = ReplayBuffer(100000)

for i in range(100000):
    
    # Sample training data (one transition)
    train_batch = sampler.sample(n_transitions=1,
                                 actor=actor,
                                 train_sampling=True)
                                 
    # Add it to buffer                             
    replay_buffer.add_paths(train_batch)
    
    # Update DQN critic
    critic.update(replay_buffer)
    
    # Update ArgmaxQValue actor (only to step schedules)
    actor.update()
    
    if i > 0 and i % print_every == 0:
        # Collect evaluation episodes
        eval_batch = eval_sampler.sample_n_episodes(n_episodes=5,
                                                    actor=actor,
                                                    train_sampling=False)

        # Print average returns of the evaluation episodes and of the replay buffer
        print(f"Iter: {i}, eval score: "
              f"{eval_batch.create_logs()['avg_return']}, "
              "buffer score: "
              f"{replay_buffer.create_logs()['avg_return']}")
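
The two schedules in the example above appear to be piecewise constant. Judging by the comment next to them, PiecewiseSchedule({0: 5000}, 5e-5) presumably yields 0 for the first 5000 steps and 5e-5 afterwards. The sketch below is not ReLAx code: piecewise_value is a hypothetical stand-alone function that only illustrates this assumed behavior, including the assumption that step 5000 is the first step to use the final value.

```python
def piecewise_value(pieces, final_value, step):
    """Hypothetical illustration, not the ReLAx implementation.

    `pieces` maps a value to the number of steps it is held for
    (in insertion order). Once all pieces are used up, `final_value` applies.
    """
    boundary = 0
    for value, n_steps in pieces.items():
        boundary += n_steps
        if step < boundary:
            return value
    return final_value


# Assumed reading of the schedules used in the DQN example
for step in (0, 4999, 5000, 100000):
    lr = piecewise_value({0: 5000}, 5e-5, step)
    eps = piecewise_value({1: 5000}, 1e-3, step)
    print(f"step={step}: lr={lr}, eps={eps}")
```

Under this reading, the critic does not learn and the actor acts fully at random while the replay buffer collects its first 5000 transitions.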

Installation

Building from GitHub Source

Installing into a separate virtual environment:

git clone https://github.com/nslyubaykin/relax
cd relax
conda create -n relax python=3.6
conda activate relax
pip install -r requirements.txt
pip install -e .
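
After installation, one quick way to confirm that the packages are visible to the interpreter is to look them up without importing them. This is a minimal sketch: it assumes the library is importable as relax (as in the minimal examples), and missing_packages is a helper name introduced here for illustration only.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level packages imported in the minimal examples
missing = missing_packages(["relax", "torch", "gym"])
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All packages found")
```

Run it inside the activated relax environment. An empty result only means the packages were located, not that every dependency version is compatible.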

MuJoCo

To install MuJoCo, follow these steps:

mkdir ~/.mujoco
cd ~/.mujoco
wget http://www.roboti.us/download/mujoco200_linux.zip
unzip mujoco200_linux.zip
mv mujoco200_linux mujoco200
rm mujoco200_linux.zip
wget http://www.roboti.us/file/mjkey.txt

Then add the following line to the end of your ~/.bashrc:

export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin/

Finally, install mujoco_py itself:

pip install mujoco-py==2.0.2.2

Note: the installation often crashes with the following error: error: command 'gcc' failed with exit status 1. To fix it, run:

sudo apt-get install gcc
sudo apt-get install build-essential

Then try installing mujoco-py==2.0.2.2 again.
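
If mujoco-py still fails to build or import, it can help to verify that the layout produced by the steps above is in place. The sketch below only inspects the file system and the environment. check_mujoco_setup is a hypothetical helper, and the paths it checks are the ones used in the commands above.

```python
import os
from pathlib import Path


def check_mujoco_setup(home=None, environ=None):
    """Return a list of human-readable problems (empty if none were found)."""
    home = Path(home) if home is not None else Path.home()
    environ = os.environ if environ is None else environ
    bin_dir = home / ".mujoco" / "mujoco200" / "bin"
    key_file = home / ".mujoco" / "mjkey.txt"
    problems = []

    if not bin_dir.is_dir():
        problems.append(f"missing directory: {bin_dir}")
    if not key_file.is_file():
        problems.append(f"missing license key: {key_file}")

    # LD_LIBRARY_PATH is a colon-separated list; look for the mujoco200/bin entry
    entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    if not any(e.rstrip("/").endswith("mujoco200/bin") for e in entries):
        problems.append("LD_LIBRARY_PATH does not contain mujoco200/bin")

    return problems


for problem in check_mujoco_setup():
    print(problem)
```

Remember that changes to ~/.bashrc only take effect in a new shell, so an LD_LIBRARY_PATH warning right after editing the file may simply mean the current shell has not reloaded it.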

Atari Environments

The ReLAx package was developed and tested with gym[atari]==0.17.2. Newer versions should also work; however, their compatibility with the provided Atari wrappers is uncertain.

To install Gym Atari, run:

pip install gym[atari]==0.17.2

If a "ROMs not found" error occurs, follow these steps:

  1. Download the ROMs archive
wget http://www.atarimania.com/roms/Roms.rar
  2. Unpack it
unrar x Roms.rar
  3. Install atari_py
pip install atari_py
  4. Provide atari_py with the ROMs
python -m atari_py.import_roms ROMS

Further Developments

The following functionality is planned for future releases:

  • Curiosity (RND)
  • Offline RL (CQL, BEAR, BCQ, SAC-N, EDAC)
  • Decision Transformers
  • PPG
  • QR-DQN
  • IQN
  • FQF
  • Discrete SAC
  • NAF
  • Stochastic environment models
  • Improving documentation

Known Issues

  • Lack of documentation (for now compensated by usage examples)
  • On some systems relax.zoo.layers.NoisyLinear seems to leak memory. This issue is unpredictable and not yet fully understood. Installing a different version of PyTorch or CUDA sometimes fixes it. If the problem persists, consider not using noisy linear layers as a workaround.
  • Filtering & Reward Weighted Refinement does not yet reach the performance reported in the paper
  • DYNA-Q is not compatible with PER, as it is not clear which priority to assign to synthetic branched transitions (one possible option: the same priority as the parent transition)
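
The option mentioned in the last item, giving a synthetic transition the priority of its parent, could look roughly like the following. This is a toy sketch, not ReLAx code: ToyPrioritizedBuffer, add and add_synthetic are hypothetical names, and sampling uses plain proportional prioritization without importance weights.

```python
import random


class ToyPrioritizedBuffer:
    """Toy proportional prioritized buffer (hypothetical, not the ReLAx API)."""

    def __init__(self):
        self.transitions = []
        self.priorities = []

    def add(self, transition, priority):
        """Store a real transition and return its index."""
        self.transitions.append(transition)
        self.priorities.append(float(priority))
        return len(self.transitions) - 1

    def add_synthetic(self, transition, parent_index):
        """Store a model-generated transition with its parent's priority."""
        return self.add(transition, self.priorities[parent_index])

    def sample(self, k, rng=random):
        """Draw k transitions with probability proportional to priority."""
        return rng.choices(self.transitions, weights=self.priorities, k=k)


buffer = ToyPrioritizedBuffer()
parent = buffer.add({"obs": 0, "act": 1, "rew": 1.0}, priority=2.5)
child = buffer.add_synthetic({"obs": 0, "act": 0, "rew": 0.3}, parent_index=parent)
print(buffer.priorities[child])
```

Whether inherited priorities are a good choice remains open: a synthetic transition may have a very different TD error from its parent, so its priority would likely need to be refreshed after the first update that samples it.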