Implementation of "Safe Exploration in Continuous Action Spaces"

Introduction

This repository contains a PyTorch implementation of the paper "Safe Exploration in Continuous Action Spaces" (Dalal et al., 2018).
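
In the paper's approach, a safety layer learns, for each constraint, a model that is linear in the action, c_i(s') ≈ c_i(s) + g_i(s; w_i)^T a, and projects the policy's action back into the safe set with a closed-form correction. The snippet below is only a minimal sketch of that correction step; the function name and tensor shapes are mine and do not reflect this repository's actual code.

# Minimal sketch of the closed-form action correction from Dalal et al. (2018).
# Names and shapes are illustrative, not this repository's API.
import torch

def correct_action(action, c, g):
    """action: (da,) policy action mu(s); c: (k,) constraint values c_i(s);
    g: (k, da) action-gradients g_i(s; w_i) of the learned constraint models.
    Constraints are assumed normalized so that c_i(s') <= 0 must hold."""
    # Per-constraint Lagrange multiplier of the analytic QP solution, clipped at 0.
    lam = torch.clamp((g @ action + c) / (g * g).sum(dim=1).clamp(min=1e-8), min=0.0)
    i = torch.argmax(lam)          # only the most-violated constraint is corrected
    return action - lam[i] * g[i]  # a* = mu(s) - lambda_i* * g_i*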

Setup

The code requires Python 3.6+ and is tested with torch 1.1.0. To install the dependencies, run the following command.

pip install -r requirements.txt

Training

A list of parameters and their default values is printed with the following command.

python -m safe_explorer.main --help

The following command trains the agent on the Ball-ND domain.

python -m safe_explorer.main --main_trainer_task ballnd

... and on the spaceship domain.

python -m safe_explorer.main --main_trainer_task spaceship

Training can be monitored via TensorBoard with the following command.

tensorboard --logdir=runs

Domains

The Ball-ND and spaceship domains from (Dalal et al., 2018) are implemented as custom OpenAI Gym environments (see safe-explorer/env).

Ball-ND Domain

For the 1D and 3D cases, env.render() is implemented to give visual output. The green circle depicts the agent's ball and the red circle depicts the target.

Ball-1D render

Ball-3D render
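
Since the environments follow the Gym interface, a render loop might look like the sketch below; the import path, class name, and constructor are assumptions, see safe-explorer/env for the actual definitions.

# Hypothetical usage sketch: module path, class name, and constructor are
# assumptions; see safe-explorer/env for the actual environment code.
from safe_explorer.env.ballnd import BallND

env = BallND()                                   # Ball-ND environment, default config
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()           # random action via the Gym API
    observation, reward, done, info = env.step(action)
    env.render()                                 # green circle: agent, red circle: target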

Spaceship Domain

TODO

Results

I managed to obtain results similar to those reported in (Dalal et al., 2018).

Results plot

Ball-1D parameters

Domain parameters

- n=1
- target_margin=0.2
- agent_slack=0.1
- episode_length=30
- time_step=0.1
- respawn_interval=2
- target_noise_std=0.05
- enable_reward_shaping=false
- reward_shaping_margin=0.14  
- control_acceleration=false     

DDPG parameters

- epochs=500
- training_episodes_per_epoch=1
- evaluation_episodes_per_epoch=1
- batch_size=256
- memory_buffer_size=1000000
- gamma=0.99
- tau=0.001 
- reward_scale=1.0
- actor_layers=[64,64]
- critic_layers=[256,256]
- actor_lr=0.0001
- critic_lr=0.001
- actor_weight_decay=0.0
- critic_weight_decay=0.01
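
For reference, gamma and tau above play the standard DDPG roles from (Lillicrap et al., 2016): gamma discounts the bootstrapped critic target and tau controls the soft update of the target networks. The following is a generic sketch, not tied to this repository's code.

# Generic DDPG pieces, shown only to clarify the roles of gamma and tau above.
import torch

def critic_target(reward, next_q, done, gamma=0.99):
    # y = r + gamma * Q'(s', mu'(s')); done is 1.0 for terminal transitions, else 0.0
    return reward + gamma * (1.0 - done) * next_q

def soft_update(target_net, net, tau=0.001):
    # theta_target <- tau * theta + (1 - tau) * theta_target
    for p_t, p in zip(target_net.parameters(), net.parameters()):
        p_t.data.copy_(tau * p.data + (1.0 - tau) * p_t.data)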

Safety Layer parameters

- layers=[10]
- epochs=10
- training_steps_per_epoch=1000
- evaluation_steps_per_epoch=20
- sample_data_episodes=1000
- batch_size=256
- memory_buffer_size=1000000
- lr=0.001
- correction_scale=5.0
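
The Safety Layer parameters above configure the pretraining of the per-constraint models g_i(s; w_i). As a rough sketch of the objective described in (Dalal et al., 2018), and not the repository's exact code, each model is fit by regressing the observed change in the constraint value onto the taken action.

# Sketch of the safety-layer pretraining objective from Dalal et al. (2018).
# Shapes and names are illustrative, not the repository's actual code:
#   c, c_next: (batch, k) constraint values at s and s'
#   g: (batch, k, da) output of the constraint model g_i(s; w_i)
#   action: (batch, da)
import torch

def safety_layer_loss(c, c_next, g, action):
    # predicted next constraint value: c_i(s) + g_i(s; w_i)^T a
    c_pred = c + torch.einsum('bkd,bd->bk', g, action)
    return torch.nn.functional.mse_loss(c_pred, c_next)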

Ball-3D parameters

Domain parameters

- n=3
- target_margin=0.2
- agent_slack=0.1
- episode_length=30
- time_step=0.1
- respawn_interval=2
- target_noise_std=0.05
- enable_reward_shaping=false
- reward_shaping_margin=0.14  
- control_acceleration=false     

DDPG parameters

- epochs=500
- training_episodes_per_epoch=1
- evaluation_episodes_per_epoch=1
- batch_size=256
- memory_buffer_size=1000000
- gamma=0.99
- tau=0.001 
- reward_scale=10.0
- actor_layers=[64,64]
- critic_layers=[256,256]
- actor_lr=0.0001
- critic_lr=0.001
- actor_weight_decay=0.0
- critic_weight_decay=0.01

Safety Layer parameters

- layers=[10]
- epochs=10
- training_steps_per_epoch=1000
- evaluation_steps_per_epoch=20
- sample_data_episodes=1000
- batch_size=256
- memory_buffer_size=1000000
- lr=0.001
- correction_scale=5.0

References

  • Dalal, G., K. Dvijotham, M. Vecerik, T. Hester, C. Paduraru, and Y. Tassa (2018). “Safe Exploration in Continuous Action Spaces.” In: CoRR abs/1801.08757. arXiv: 1801.08757.

  • Lillicrap, T. P., J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra (May 2016). “Continuous control with deep reinforcement learning.” In: 4th International Conference on Learning Representations, (ICLR 2016), Conference Track Proceedings. Ed. by Y. Bengio and Y. LeCun. ICLR’16. San Juan, Puerto Rico.

Acknowledgements

This repository was originally a fork of https://github.com/AgrawalAmey/safe-explorer. I have re-implemented most of the DDPG, Safety Layer, and domain code, so I detached the fork. Some parts of the repository's structure are carried over from the original fork.

The Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al., 2016) implementation is based on the tutorial Deep Deterministic Policy Gradients Explained.
