diff --git a/README.md b/README.md
index 1f0314b471..7e68959a2c 100644
--- a/README.md
+++ b/README.md
@@ -22,8 +22,6 @@
data:image/s3,"s3://crabby-images/b70e4/b70e4ce79270638b40edaf8e15885fd209ec38f6" alt="deploy"
[data:image/s3,"s3://crabby-images/fb629/fb6299f5569e1069dd9338e58fea86fada2ba80e" alt="codecov"](https://codecov.io/gh/opendilab/DI-engine)
-
-
data:image/s3,"s3://crabby-images/ee796/ee7966e5be88718c51d3d8a162d2e01151c77fd9" alt="GitHub Org's stars"
[data:image/s3,"s3://crabby-images/fcb5c/fcb5cbbb73a70776bead991b39fcadb9816a0d61" alt="GitHub stars"](https://github.com/opendilab/DI-engine/stargazers)
[data:image/s3,"s3://crabby-images/84ef5/84ef559d79e81bba16e8e2a9db56392d72ab980b" alt="GitHub forks"](https://github.com/opendilab/DI-engine/network)
@@ -37,11 +35,11 @@
Updated on 2024.02.04 DI-engine-v0.5.1
-
## Introduction to DI-engine
+
[Documentation](https://di-engine-docs.readthedocs.io/en/latest/) | [中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/) | [Tutorials](https://di-engine-docs.readthedocs.io/en/latest/01_quickstart/index.html) | [Feature](#feature) | [Task & Middleware](https://di-engine-docs.readthedocs.io/en/latest/03_system/index.html) | [TreeTensor](#general-data-container-treetensor) | [Roadmap](https://github.com/opendilab/DI-engine/issues/548)
-**DI-engine** is a generalized decision intelligence engine for PyTorch and JAX.
+**DI-engine** is a generalized decision intelligence engine for PyTorch and JAX.
It provides **python-first** and **asynchronous-native** task and middleware abstractions, and modularly integrates several of the most important decision-making concepts: Env, Policy and Model. Based on the above mechanisms, DI-engine supports **various [deep reinforcement learning](https://di-engine-docs.readthedocs.io/en/latest/10_concepts/index.html) algorithms** with superior performance, high efficiency, well-organized [documentation](https://di-engine-docs.readthedocs.io/en/latest/) and [unittest](https://github.com/opendilab/DI-engine/actions):
@@ -89,6 +87,7 @@ It provides **python-first** and **asynchronous-native** task and middleware abs
- [awesome-diffusion-model-in-rl](https://github.com/opendilab/awesome-diffusion-model-in-rl): A curated list of Diffusion Model in RL resources
- [awesome-end-to-end-autonomous-driving](https://github.com/opendilab/awesome-end-to-end-autonomous-driving): A curated list of awesome End-to-End Autonomous Driving resources
- [awesome-driving-behavior-prediction](https://github.com/opendilab/awesome-driving-behavior-prediction): A collection of research papers for Driving Behavior Prediction
+
On the low-level end, DI-engine comes with a set of highly re-usable modules, including [RL optimization functions](https://github.com/opendilab/DI-engine/tree/main/ding/rl_utils), [PyTorch utilities](https://github.com/opendilab/DI-engine/tree/main/ding/torch_utils) and [auxiliary tools](https://github.com/opendilab/DI-engine/tree/main/ding/utils).
@@ -104,6 +103,7 @@ BTW, **DI-engine** also has some special **system optimization and design** for
- [DI-orchestrator](https://github.com/opendilab/DI-orchestrator): RL Kubernetes Custom Resource and Operator Lib
- [DI-hpc](https://github.com/opendilab/DI-hpc): RL HPC OP Lib
- [DI-store](https://github.com/opendilab/DI-store): RL Object Store
+
Have fun with exploration and exploitation.
@@ -128,11 +128,13 @@ Have fun with exploration and exploitation.
## Installation
You can simply install DI-engine from PyPI with the following command:
+
```bash
pip install DI-engine
```
If you use Anaconda or Miniconda, you can install DI-engine from the `opendilab` conda channel with the following command:
+
```bash
conda install -c opendilab di-engine
```
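To confirm that either command worked, one option is to query the installed distribution metadata from Python. The sketch below is an assumption rather than part of DI-engine: `installed_version` is a hypothetical helper, and it assumes the distribution is registered under the name `DI-engine` used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # "DI-engine" is the name from the pip command above (assumed to match the metadata).
    found = installed_version("DI-engine")
    if found is None:
        print("DI-engine is not installed in this environment")
    else:
        print(f"DI-engine {found} is installed")
```

Run it with the same interpreter that `pip` or `conda` installed into; a different environment will report the package as missing.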
@@ -155,6 +157,7 @@ And our dockerhub repo can be found [here](https://hub.docker.com/repository/doc
- cityflow: opendilab/ding:nightly-cityflow
- evogym: opendilab/ding:nightly-evogym
- d4rl: opendilab/ding:nightly-d4rl
+
The detailed documentation is hosted on [doc](https://di-engine-docs.readthedocs.io/en/latest/) | [中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/).
@@ -175,8 +178,8 @@ The detailed documentation are hosted on [doc](https://di-engine-docs.readthedoc
[新老 pipeline 的异同对比](https://di-engine-docs.readthedocs.io/zh_CN/latest/04_best_practice/diff_in_new_pipeline_zh.html)
-
## Feature
+
### Algorithm Versatility
@@ -198,7 +201,6 @@ The detailed documentation are hosted on [doc](https://di-engine-docs.readthedoc
data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" [Offline Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/offline_rl.html)|[离线强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/offline_rl_zh.html)
-
data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" [Model-Based Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/model_based_rl.html)|[基于模型的强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/model_based_rl_zh.html)
data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" means other sub-direction algorithms, usually used as plug-ins in the whole pipeline
@@ -206,111 +208,114 @@ The detailed documentation are hosted on [doc](https://di-engine-docs.readthedoc
P.S.: The `.py` files in the `Runnable Demo` column can be found in `dizoo`
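Most entries in the `Runnable Demo` column follow one of two shapes: running a script directly with `python3 -u`, or passing a config file to the `ding` CLI with `-m`, `-c` and `-s`. A minimal sketch of assembling those command lines is given here. `ding_command` and `script_command` are hypothetical helpers, not part of DI-engine, and reading `-m` as the pipeline mode and `-s` as the seed is an assumption based on the table entries.

```python
import shlex
from typing import List


def ding_command(config: str, mode: str = "serial", seed: int = 0) -> List[str]:
    """Build argv for a `ding -m <mode> -c <config> -s <seed>` demo entry."""
    return ["ding", "-m", mode, "-c", config, "-s", str(seed)]


def script_command(script: str) -> List[str]:
    """Build argv for a `python3 -u <script>` demo entry."""
    return ["python3", "-u", script]


if __name__ == "__main__":
    # Print the two forms listed for DQN in the table.
    print(shlex.join(script_command("cartpole_dqn_main.py")))
    print(shlex.join(ding_command("cartpole_dqn_config.py")))
```

The returned lists can be passed to `subprocess.run` from the `dizoo` directory that contains the named file, which avoids quoting issues with shell strings.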
+| No. | Algorithm | Label | Doc and Implementation | Runnable Demo |
+| :-: | :---------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------: |
+| 1 | [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [DQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqn.html)<br>[DQN中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/dqn_zh.html)<br>[policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0 |
+| 2 | [C51](https://arxiv.org/pdf/1707.06887.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [C51 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/c51.html)<br>[policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py) | ding -m serial -c cartpole_c51_config.py -s 0 |
+| 3 | [QRDQN](https://arxiv.org/pdf/1710.10044.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [QRDQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qrdqn.html)<br>[policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py) | ding -m serial -c cartpole_qrdqn_config.py -s 0 |
+| 4 | [IQN](https://arxiv.org/pdf/1806.06923.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [IQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/iqn.html)<br>[policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py) | ding -m serial -c cartpole_iqn_config.py -s 0 |
+| 5 | [FQF](https://arxiv.org/pdf/1911.02140.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [FQF doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/fqf.html)<br>[policy/fqf](https://github.com/opendilab/DI-engine/blob/main/ding/policy/fqf.py) | ding -m serial -c cartpole_fqf_config.py -s 0 |
+| 6 | [Rainbow](https://arxiv.org/pdf/1710.02298.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [Rainbow doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rainbow.html)<br>[policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py) | ding -m serial -c cartpole_rainbow_config.py -s 0 |
+| 7 | [SQL](https://arxiv.org/pdf/1702.08165.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | [SQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sql.html)<br>[policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py) | ding -m serial -c cartpole_sql_config.py -s 0 |
+| 8 | [R2D2](https://openreview.net/forum?id=r1lyTjAqYX) | data:image/s3,"s3://crabby-images/56a6a/56a6ac66131235a9d76cdd88dd2efeef5ab07a28" alt="dist"data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [R2D2 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d2.html)<br>[policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py) | ding -m serial -c cartpole_r2d2_config.py -s 0 |
+| 9 | [PG](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)<br>[policy/pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pg.py) | ding -m serial -c cartpole_pg_config.py -s 0 |
+| 10 | [PromptPG](https://arxiv.org/abs/2209.14610) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [policy/prompt_pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/prompt_pg.py) | ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0 |
+| 11 | [A2C](https://arxiv.org/pdf/1602.01783.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [A2C doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)<br>[policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py) | ding -m serial -c cartpole_a2c_config.py -s 0 |
+| 12 | [PPO](https://arxiv.org/abs/1707.06347)/[MAPPO](https://arxiv.org/pdf/2103.01955.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [PPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppo.html)<br>[policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |
+| 13 | [PPG](https://arxiv.org/pdf/2009.04416.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [PPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppg.html)<br>[policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py) | python3 -u cartpole_ppg_main.py |
+| 14 | [ACER](https://arxiv.org/pdf/1611.01224.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | [ACER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/acer.html)<br>[policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py) | ding -m serial -c cartpole_acer_config.py -s 0 |
+| 15 | [IMPALA](https://arxiv.org/abs/1802.01561) | data:image/s3,"s3://crabby-images/56a6a/56a6ac66131235a9d76cdd88dd2efeef5ab07a28" alt="dist"data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [IMPALA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/impala.html)<br>[policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py) | ding -m serial -c cartpole_impala_config.py -s 0 |
+| 16 | [DDPG](https://arxiv.org/pdf/1509.02971.pdf)/[PADDPG](https://arxiv.org/pdf/1511.04143.pdf) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [DDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)<br>[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c pendulum_ddpg_config.py -s 0 |
+| 17 | [TD3](https://arxiv.org/pdf/1802.09477.pdf) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [TD3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3.html)<br>[policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py) | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0 |
+| 18 | [D4PG](https://arxiv.org/pdf/1804.08617.pdf) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | [D4PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/d4pg.html)<br>[policy/d4pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/d4pg.py) | python3 -u pendulum_d4pg_config.py |
+| 19 | [SAC](https://arxiv.org/abs/1801.01290)/[MASAC] | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [SAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sac.html)<br>[policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py) | ding -m serial -c pendulum_sac_config.py -s 0 |
+| 20 | [PDQN](https://arxiv.org/pdf/1810.06394.pdf) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_pdqn_config.py -s 0 |
+| 21 | [MPDQN](https://arxiv.org/pdf/1905.04388.pdf) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_mpdqn_config.py -s 0 |
+| 22 | [HPPO](https://arxiv.org/pdf/1903.01344.pdf) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0 |
+| 23 | [BDQ](https://arxiv.org/pdf/1711.08946.pdf) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [policy/bdq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u hopper_bdq_config.py |
+| 24 | [MDQN](https://arxiv.org/abs/2007.14430) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [policy/mdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mdqn.py) | python3 -u asterix_mdqn_config.py |
+| 25 | [QMIX](https://arxiv.org/pdf/1803.11485.pdf) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [QMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qmix.html)<br>[policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py) | ding -m serial -c smac_3s5z_qmix_config.py -s 0 |
+| 26 | [COMA](https://arxiv.org/pdf/1705.08926.pdf) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [COMA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/coma.html)<br>[policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py) | ding -m serial -c smac_3s5z_coma_config.py -s 0 |
+| 27 | [QTran](https://arxiv.org/abs/1905.05408) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py) | ding -m serial -c smac_3s5z_qtran_config.py -s 0 |
+| 28 | [WQMIX](https://arxiv.org/abs/2006.10800) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [WQMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/wqmix.html)<br>[policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py) | ding -m serial -c smac_3s5z_wqmix_config.py -s 0 |
+| 29 | [CollaQ](https://arxiv.org/pdf/2010.08531.pdf) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [CollaQ doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/collaq.html)<br>[policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py) | ding -m serial -c smac_3s5z_collaq_config.py -s 0 |
+| 30 | [MADDPG](https://arxiv.org/pdf/1706.02275.pdf) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [MADDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)<br>[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0 |
+| 31 | [GAIL](https://arxiv.org/pdf/1606.03476.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [GAIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/gail.html)<br>[reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py) | ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0 |
+| 32 | [SQIL](https://arxiv.org/pdf/1905.11108.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [SQIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sqil.html)<br>[entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py) | ding -m serial_sqil -c cartpole_sqil_config.py -s 0 |
+| 33 | [DQFD](https://arxiv.org/pdf/1704.03732.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [DQFD doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqfd.html)<br>[policy/dqfd](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqfd.py) | ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0 |
+| 34 | [R2D3](https://arxiv.org/pdf/1909.01387.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [R2D3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d3.html)<br>[R2D3中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html)<br>[policy/r2d3](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html) | python3 -u pong_r2d3_r2d2expert_config.py |
+| 35 | [Guided Cost Learning](https://arxiv.org/pdf/1603.00448.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [Guided Cost Learning中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/guided_cost_zh.html)<br>[reward_model/guided_cost](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/guided_cost_reward_model.py) | python3 lunarlander_gcl_config.py |
+| 36 | [TREX](https://arxiv.org/abs/1904.06387) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [TREX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/trex.html)<br>[reward_model/trex](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/trex_reward_model.py) | python3 mujoco_trex_main.py |
+| 37 | [Implicit Behavioral Cloning](https://implicitbc.github.io/) (DFO+MCMC) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [policy/ibc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ibc.py)<br>[model/template/ebm](https://github.com/opendilab/DI-engine/blob/main/ding/model/template/ebm.py) | python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py |
+| 38 | [BCO](https://arxiv.org/pdf/1805.01954.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [entry/bco](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_bco.py) | python3 -u cartpole_bco_config.py |
+| 39 | [HER](https://arxiv.org/pdf/1707.01495.pdf) | data:image/s3,"s3://crabby-images/4d57b/4d57b9b3bc6b94d05b2333e91494cc60c9ccf6db" alt="exp" | [HER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/her.html)<br>[reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py) | python3 -u bitflip_her_dqn.py |
+| 40 | [RND](https://arxiv.org/abs/1810.12894) | data:image/s3,"s3://crabby-images/4d57b/4d57b9b3bc6b94d05b2333e91494cc60c9ccf6db" alt="exp" | [RND doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rnd.html)<br>[reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py) | python3 -u cartpole_rnd_onppo_config.py |
+| 41 | [ICM](https://arxiv.org/pdf/1705.05363.pdf) | data:image/s3,"s3://crabby-images/4d57b/4d57b9b3bc6b94d05b2333e91494cc60c9ccf6db" alt="exp" | [ICM doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/icm.html)<br>[ICM中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/icm_zh.html)<br>[reward_model/icm](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/icm_reward_model.py) | python3 -u cartpole_ppo_icm_config.py |
+| 42 | [CQL](https://arxiv.org/pdf/2006.04779.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [CQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/cql.html)<br>[policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py) | python3 -u d4rl_cql_main.py |
+| 43 | [TD3BC](https://arxiv.org/pdf/2106.06860.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [TD3BC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3_bc.html)<br>[policy/td3_bc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3_bc.py) | python3 -u d4rl_td3_bc_main.py |
+| 44 | [Decision Transformer](https://arxiv.org/pdf/2106.01345.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [policy/dt](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dt.py) | python3 -u d4rl_dt_mujoco.py |
+| 45 | [EDAC](https://arxiv.org/pdf/2110.01548.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [EDAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/edac.html)<br>[policy/edac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/edac.py) | python3 -u d4rl_edac_main.py |
+| 46 | [QGPO](https://arxiv.org/pdf/2304.12824.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [QGPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qgpo.html)<br>[policy/qgpo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qgpo.py) | python3 -u ding/example/qgpo.py |
+| 47 | MBSAC([SAC](https://arxiv.org/abs/1801.01290)+[MVE](https://arxiv.org/abs/1803.00101)+[SVG](https://arxiv.org/abs/1510.09142)) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) | python3 -u pendulum_mbsac_mbpo_config.py \ python3 -u pendulum_mbsac_ddppo_config.py |
+| 48 | STEVESAC([SAC](https://arxiv.org/abs/1801.01290)+[STEVE](https://arxiv.org/abs/1807.01675)+[SVG](https://arxiv.org/abs/1510.09142)) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) | python3 -u pendulum_stevesac_mbpo_config.py |
+| 49 | [MBPO](https://arxiv.org/pdf/1906.08253.pdf) | data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [MBPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/mbpo.html)<br>[world_model/mbpo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/mbpo.py) | python3 -u pendulum_sac_mbpo_config.py |
+| 50 | [DDPPO](https://openreview.net/forum?id=rzvOQrnclO0) | data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [world_model/ddppo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/ddppo.py) | python3 -u pendulum_mbsac_ddppo_config.py |
+| 51 | [DreamerV3](https://arxiv.org/pdf/2301.04104.pdf) | data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [world_model/dreamerv3](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/dreamerv3.py) | python3 -u cartpole_balance_dreamer_config.py |
+| 52 | [PER](https://arxiv.org/pdf/1511.05952.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py) | `rainbow demo` |
+| 53 | [GAE](https://arxiv.org/pdf/1506.02438.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py) | `ppo demo` |
+| 54 | [ST-DIM](https://arxiv.org/pdf/1906.08226.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [torch_utils/loss/contrastive_loss](https://github.com/opendilab/DI-engine/blob/main/ding/torch_utils/loss/contrastive_loss.py) | ding -m serial -c cartpole_dqn_stdim_config.py -s 0 |
+| 55 | [PLR](https://arxiv.org/pdf/2010.03934.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [PLR doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/plr.html)<br>[data/level_replay/level_sampler](https://github.com/opendilab/DI-engine/blob/main/ding/data/level_replay/level_sampler.py) | python3 -u bigfish_plr_config.py -s 0 |
+| 56 | [PCGrad](https://arxiv.org/pdf/2001.06782.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [torch_utils/optimizer_helper/PCGrad](https://github.com/opendilab/DI-engine/blob/main/ding/data/torch_utils/optimizer_helper.py) | python3 -u multi_mnist_pcgrad_main.py -s 0 |
-| No. | Algorithm | Label | Doc and Implementation | Runnable Demo |
-| :--: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
-| 1 | [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [DQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqn.html)<br>[DQN中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/dqn_zh.html)<br>[policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0 |
-| 2 | [C51](https://arxiv.org/pdf/1707.06887.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [C51 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/c51.html)<br>[policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py) | ding -m serial -c cartpole_c51_config.py -s 0 |
-| 3 | [QRDQN](https://arxiv.org/pdf/1710.10044.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [QRDQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qrdqn.html)<br>[policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py) | ding -m serial -c cartpole_qrdqn_config.py -s 0 |
-| 4 | [IQN](https://arxiv.org/pdf/1806.06923.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [IQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/iqn.html)<br>[policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py) | ding -m serial -c cartpole_iqn_config.py -s 0 |
-| 5 | [FQF](https://arxiv.org/pdf/1911.02140.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [FQF doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/fqf.html)<br>[policy/fqf](https://github.com/opendilab/DI-engine/blob/main/ding/policy/fqf.py) | ding -m serial -c cartpole_fqf_config.py -s 0 |
-| 6 | [Rainbow](https://arxiv.org/pdf/1710.02298.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [Rainbow doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rainbow.html)<br>[policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py) | ding -m serial -c cartpole_rainbow_config.py -s 0 |
-| 7 | [SQL](https://arxiv.org/pdf/1702.08165.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | [SQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sql.html)<br>[policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py) | ding -m serial -c cartpole_sql_config.py -s 0 |
-| 8 | [R2D2](https://openreview.net/forum?id=r1lyTjAqYX) | data:image/s3,"s3://crabby-images/56a6a/56a6ac66131235a9d76cdd88dd2efeef5ab07a28" alt="dist"data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [R2D2 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d2.html)<br>[policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py) | ding -m serial -c cartpole_r2d2_config.py -s 0 |
-| 9 | [PG](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)<br>[policy/pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pg.py) | ding -m serial -c cartpole_pg_config.py -s 0 |
-| 10 | [PromptPG](https://arxiv.org/abs/2209.14610) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [policy/prompt_pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/prompt_pg.py) | ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0 |
-| 11 | [A2C](https://arxiv.org/pdf/1602.01783.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [A2C doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)<br>[policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py) | ding -m serial -c cartpole_a2c_config.py -s 0 |
-| 12 | [PPO](https://arxiv.org/abs/1707.06347)/[MAPPO](https://arxiv.org/pdf/2103.01955.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [PPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppo.html)<br>[policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |
-| 13 | [PPG](https://arxiv.org/pdf/2009.04416.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [PPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppg.html)<br>[policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py) | python3 -u cartpole_ppg_main.py |
-| 14 | [ACER](https://arxiv.org/pdf/1611.01224.pdf) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | [ACER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/acer.html)<br>[policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py) | ding -m serial -c cartpole_acer_config.py -s 0 |
-| 15 | [IMPALA](https://arxiv.org/abs/1802.01561) | data:image/s3,"s3://crabby-images/56a6a/56a6ac66131235a9d76cdd88dd2efeef5ab07a28" alt="dist"data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [IMPALA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/impala.html)<br>[policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py) | ding -m serial -c cartpole_impala_config.py -s 0 |
-| 16 | [DDPG](https://arxiv.org/pdf/1509.02971.pdf)/[PADDPG](https://arxiv.org/pdf/1511.04143.pdf) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [DDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)
[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c pendulum_ddpg_config.py -s 0 |
-| 17 | [TD3](https://arxiv.org/pdf/1802.09477.pdf) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [TD3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3.html)
[policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py) | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0 |
-| 18 | [D4PG](https://arxiv.org/pdf/1804.08617.pdf) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | [D4PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/d4pg.html)
[policy/d4pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/d4pg.py) | python3 -u pendulum_d4pg_config.py |
-| 19 | [SAC](https://arxiv.org/abs/1801.01290)/[MASAC] | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [SAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sac.html)
[policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py) | ding -m serial -c pendulum_sac_config.py -s 0 |
-| 20 | [PDQN](https://arxiv.org/pdf/1810.06394.pdf) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_pdqn_config.py -s 0 |
-| 21 | [MPDQN](https://arxiv.org/pdf/1905.04388.pdf) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_mpdqn_config.py -s 0 |
-| 22 | [HPPO](https://arxiv.org/pdf/1903.01344.pdf) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0 |
-| 23 | [BDQ](https://arxiv.org/pdf/1711.08946.pdf) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | [policy/bdq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u hopper_bdq_config.py |
-| 24 | [MDQN](https://arxiv.org/abs/2007.14430) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | [policy/mdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mdqn.py) | python3 -u asterix_mdqn_config.py |
-| 25 | [QMIX](https://arxiv.org/pdf/1803.11485.pdf) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [QMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qmix.html)
[policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py) | ding -m serial -c smac_3s5z_qmix_config.py -s 0 |
-| 26 | [COMA](https://arxiv.org/pdf/1705.08926.pdf) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [COMA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/coma.html)
[policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py) | ding -m serial -c smac_3s5z_coma_config.py -s 0 |
-| 27 | [QTran](https://arxiv.org/abs/1905.05408) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py) | ding -m serial -c smac_3s5z_qtran_config.py -s 0 |
-| 28 | [WQMIX](https://arxiv.org/abs/2006.10800) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [WQMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/wqmix.html)
[policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py) | ding -m serial -c smac_3s5z_wqmix_config.py -s 0 |
-| 29 | [CollaQ](https://arxiv.org/pdf/2010.08531.pdf) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [CollaQ doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/collaq.html)
[policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py) | ding -m serial -c smac_3s5z_collaq_config.py -s 0 |
-| 30 | [MADDPG](https://arxiv.org/pdf/1706.02275.pdf) | data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="MARL" | [MADDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)
[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0 |
-| 31 | [GAIL](https://arxiv.org/pdf/1606.03476.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [GAIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/gail.html)
[reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py) | ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0 |
-| 32 | [SQIL](https://arxiv.org/pdf/1905.11108.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [SQIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sqil.html)
[entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py) | ding -m serial_sqil -c cartpole_sqil_config.py -s 0 |
-| 33 | [DQFD](https://arxiv.org/pdf/1704.03732.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [DQFD doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqfd.html)
[policy/dqfd](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqfd.py) | ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0 |
-| 34 | [R2D3](https://arxiv.org/pdf/1909.01387.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [R2D3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d3.html)
[R2D3中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html)
[policy/r2d3](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html) | python3 -u pong_r2d3_r2d2expert_config.py |
-| 35 | [Guided Cost Learning](https://arxiv.org/pdf/1603.00448.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [Guided Cost Learning中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/guided_cost_zh.html)
[reward_model/guided_cost](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/guided_cost_reward_model.py) | python3 lunarlander_gcl_config.py |
-| 36 | [TREX](https://arxiv.org/abs/1904.06387) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [TREX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/trex.html)
[reward_model/trex](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/trex_reward_model.py) | python3 mujoco_trex_main.py |
-| 37 | [Implicit Behavioral Cloning](https://implicitbc.github.io/) (DFO+MCMC) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [policy/ibc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ibc.py)
[model/template/ebm](https://github.com/opendilab/DI-engine/blob/main/ding/model/template/ebm.py) | python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py |
-| 38 | [BCO](https://arxiv.org/pdf/1805.01954.pdf) | data:image/s3,"s3://crabby-images/692d5/692d51a794dbce8d152e95cefed0bc2ee491cf36" alt="IL" | [entry/bco](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_bco.py) | python3 -u cartpole_bco_config.py |
-| 39 | [HER](https://arxiv.org/pdf/1707.01495.pdf) | data:image/s3,"s3://crabby-images/4d57b/4d57b9b3bc6b94d05b2333e91494cc60c9ccf6db" alt="exp" | [HER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/her.html)
[reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py) | python3 -u bitflip_her_dqn.py |
-| 40 | [RND](https://arxiv.org/abs/1810.12894) | data:image/s3,"s3://crabby-images/4d57b/4d57b9b3bc6b94d05b2333e91494cc60c9ccf6db" alt="exp" | [RND doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rnd.html)
[reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py) | python3 -u cartpole_rnd_onppo_config.py |
-| 41 | [ICM](https://arxiv.org/pdf/1705.05363.pdf) | data:image/s3,"s3://crabby-images/4d57b/4d57b9b3bc6b94d05b2333e91494cc60c9ccf6db" alt="exp" | [ICM doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/icm.html)
[ICM中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/icm_zh.html)
[reward_model/icm](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/icm_reward_model.py) | python3 -u cartpole_ppo_icm_config.py |
-| 42 | [CQL](https://arxiv.org/pdf/2006.04779.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [CQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/cql.html)
[policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py) | python3 -u d4rl_cql_main.py |
-| 43 | [TD3BC](https://arxiv.org/pdf/2106.06860.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [TD3BC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3_bc.html)
[policy/td3_bc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3_bc.py) | python3 -u d4rl_td3_bc_main.py |
-| 44 | [Decision Transformer](https://arxiv.org/pdf/2106.01345.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [policy/dt](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dt.py) | python3 -u d4rl_dt_mujoco.py |
-| 45 | [EDAC](https://arxiv.org/pdf/2110.01548.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [EDAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/edac.html)
[policy/edac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/edac.py) | python3 -u d4rl_edac_main.py |
-| 46 | [QGPO](https://arxiv.org/pdf/2304.12824.pdf) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | [QGPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qgpo.html)
[policy/qgpo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qgpo.py) | python3 -u ding/example/qgpo.py |
-| 47 | MBSAC([SAC](https://arxiv.org/abs/1801.01290)+[MVE](https://arxiv.org/abs/1803.00101)+[SVG](https://arxiv.org/abs/1510.09142)) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) | python3 -u pendulum_mbsac_mbpo_config.py / python3 -u pendulum_mbsac_ddppo_config.py |
-| 48 | STEVESAC([SAC](https://arxiv.org/abs/1801.01290)+[STEVE](https://arxiv.org/abs/1807.01675)+[SVG](https://arxiv.org/abs/1510.09142)) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous"data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) | python3 -u pendulum_stevesac_mbpo_config.py |
-| 49 | [MBPO](https://arxiv.org/pdf/1906.08253.pdf) | data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [MBPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/mbpo.html)
[world_model/mbpo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/mbpo.py) | python3 -u pendulum_sac_mbpo_config.py |
-| 50 | [DDPPO](https://openreview.net/forum?id=rzvOQrnclO0) | data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [world_model/ddppo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/ddppo.py) | python3 -u pendulum_mbsac_ddppo_config.py |
-| 51 | [DreamerV3](https://arxiv.org/pdf/2301.04104.pdf) | data:image/s3,"s3://crabby-images/af7cc/af7cc36ebdf073538e32d96b8dc841939a1441f1" alt="mbrl" | [world_model/dreamerv3](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/dreamerv3.py) | python3 -u cartpole_balance_dreamer_config.py |
-| 52 | [PER](https://arxiv.org/pdf/1511.05952.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py) | `rainbow demo` |
-| 53 | [GAE](https://arxiv.org/pdf/1506.02438.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py) | `ppo demo` |
-| 54 | [ST-DIM](https://arxiv.org/pdf/1906.08226.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [torch_utils/loss/contrastive_loss](https://github.com/opendilab/DI-engine/blob/main/ding/torch_utils/loss/contrastive_loss.py) | ding -m serial -c cartpole_dqn_stdim_config.py -s 0 |
-| 55 | [PLR](https://arxiv.org/pdf/2010.03934.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [PLR doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/plr.html)
[data/level_replay/level_sampler](https://github.com/opendilab/DI-engine/blob/main/ding/data/level_replay/level_sampler.py) | python3 -u bigfish_plr_config.py -s 0 |
-| 56 | [PCGrad](https://arxiv.org/pdf/2001.06782.pdf) | data:image/s3,"s3://crabby-images/885c8/885c891c00587437a844f9733605bc1604127a34" alt="other" | [torch_utils/optimizer_helper/PCGrad](https://github.com/opendilab/DI-engine/blob/main/ding/data/torch_utils/optimizer_helper.py) | python3 -u multi_mnist_pcgrad_main.py -s 0 |
-
### Environment Versatility
+
-| No | Environment | Label | Visualization | Code and Doc Links |
-| :--: | :--------------------------------------: | :---------------------------------: | :--------------------------------:|:---------------------------------------------------------: |
-| 1 | [Atari](https://github.com/openai/gym/tree/master/gym/envs/atari) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/d3269/d3269bdd5bce8df2bd154932d042767376bb4b19" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/atari/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/atari.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/atari_zh.html) |
-| 2 | [box2d/bipedalwalker](https://github.com/openai/gym/tree/master/gym/envs/box2d) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/f21a1/f21a1994a109bc5bf6f5e3ef8c83a2a674de06d3" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/bipedalwalker/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/bipedalwalker.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bipedalwalker_zh.html) |
-| 3 | [box2d/lunarlander](https://github.com/openai/gym/tree/master/gym/envs/box2d) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/3652c/3652c6d082009c39adaa15592f6eaa26e3eb6121" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/lunarlander/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/lunarlander.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/lunarlander_zh.html) |
-| 4 | [classic_control/cartpole](https://github.com/openai/gym/tree/master/gym/envs/classic_control) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/3fd97/3fd9784c8bb0144d76bc06111a24381f744217f4" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/cartpole/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/cartpole.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/cartpole_zh.html) |
-| 5 | [classic_control/pendulum](https://github.com/openai/gym/tree/master/gym/envs/classic_control) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/aa356/aa356b0ab61c283d165c68fb4e131e40f2ad9bdd" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/pendulum/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pendulum.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pendulum_zh.html) |
-| 6 | [competitive_rl](https://github.com/cuhkrlcourse/competitive-rl) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/1ea73/1ea73a865b2c4feff62ee9c91c7e22f99b811449" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.classic_control)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/competitive_rl_zh.html) |
-| 7 | [gfootball](https://github.com/google-research/football) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/52a12/52a12806e027d22cd9fdc9e72ff9242f881920d8" alt="sparse"data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/88a2f/88a2fbd974e44eb81745b499a0cc64b5a9c837e8" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.gfootball/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball_zh.html) |
-| 8 | [minigrid](https://github.com/maximecb/gym-minigrid) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/52a12/52a12806e027d22cd9fdc9e72ff9242f881920d8" alt="sparse" | data:image/s3,"s3://crabby-images/7aa65/7aa6521d7dec07fbba5b71ee52aded7070565085" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/minigrid/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid_zh.html) |
-| 9 | [MuJoCo](https://github.com/openai/gym/tree/master/gym/envs/mujoco) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/392e7/392e7c609e18e030706c188c30c971330a0fa572" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/majoco/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco_zh.html) |
-| 10 | [PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl" | data:image/s3,"s3://crabby-images/c4573/c4573800a94561df130847a517368c6b0ba7f654" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/petting_zoo/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pettingzoo.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pettingzoo_zh.html) |
-| 11 | [overcooked](https://github.com/HumanCompatibleAI/overcooked-demo) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl" | data:image/s3,"s3://crabby-images/f8a68/f8a6846eeddb296a48943e4511f3b5d30d52846f" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/overcooded/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/overcooked.html) |
-| 12 | [procgen](https://github.com/openai/procgen) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/ec10d/ec10dd46b3d3d99306fd36f6ef89a58acea878d5" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/procgen)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/procgen.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/procgen_zh.html) |
-| 13 | [pybullet](https://github.com/benelot/pybullet-gym) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/f760c/f760c7ecc992ab773a3e1fe37a0b73ea97377790" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pybullet/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pybullet_zh.html) |
-| 14 | [smac](https://github.com/oxwhirl/smac) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl"data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay"data:image/s3,"s3://crabby-images/52a12/52a12806e027d22cd9fdc9e72ff9242f881920d8" alt="sparse" | data:image/s3,"s3://crabby-images/cdffd/cdffd511e07a1c9d90a128a03e7126a47ddbdc64" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/smac/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/smac.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/smac_zh.html) |
-| 15 | [d4rl](https://github.com/rail-berkeley/d4rl) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | data:image/s3,"s3://crabby-images/a6b3a/a6b3a9eb9ead1c4e4d5cd32c020bf96f2f935176" alt="ori" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/d4rl)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/d4rl_zh.html) |
-| 16 | league_demo | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/75b19/75b19942b93e969cdd7288dd817bf2ce35a06438" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/league_demo/envs) |
-| 17 | pomdp atari | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pomdp/envs) |
-| 18 | [bsuite](https://github.com/deepmind/bsuite) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/cc40c/cc40c7eeb0167cb7498115f537f683d687eff044" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bsuite/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs//bsuite.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bsuite_zh.html) |
-| 19 | [ImageNet](https://www.image-net.org/) | data:image/s3,"s3://crabby-images/c4a66/c4a664c07dccde61d87ef31a5e557a776115c1ca" alt="IL" | data:image/s3,"s3://crabby-images/18521/18521ce11f96c062da5b3581a43a455f39e83df8" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/image_classification)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/image_cls_zh.html) |
-| 20 | [slime_volleyball](https://github.com/hardmaru/slimevolleygym) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/9a728/9a728b9f991ec279fe0e47467a3b9b6603bfc508" alt="ori" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/slime_volley)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/slime_volleyball.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/slime_volleyball_zh.html) |
-| 21 | [gym_hybrid](https://github.com/thomashirtz/gym-hybrid) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | data:image/s3,"s3://crabby-images/5c726/5c726dde8e534b9306dd59091cc456b17ea192e8" alt="ori" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_hybrid)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_hybrid.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_hybrid_zh.html) |
-| 22 | [GoBigger](https://github.com/opendilab/GoBigger) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid"data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl"data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/b9e12/b9e12eec2c69ebdd8e1306652d4a5fb8065b7340" alt="ori" | [dizoo link](https://github.com/opendilab/GoBigger-Challenge-2021/tree/main/di_baseline)
[env tutorial](https://gobigger.readthedocs.io/en/latest/index.html)
[环境指南](https://gobigger.readthedocs.io/zh_CN/latest/) |
-| 23 | [gym_soccer](https://github.com/openai/gym-soccer) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | data:image/s3,"s3://crabby-images/15c3c/15c3c3c42927250818b5e8a4c369e000965cd14a" alt="ori" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_soccer)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_soccer_zh.html) |
-| 24 |[multiagent_mujoco](https://github.com/schroederdewitt/multiagent_mujoco) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl" | data:image/s3,"s3://crabby-images/392e7/392e7c609e18e030706c188c30c971330a0fa572" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/multiagent_mujoco/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/mujoco_zh.html) |
-| 25 |bitflip | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/52a12/52a12806e027d22cd9fdc9e72ff9242f881920d8" alt="sparse" | data:image/s3,"s3://crabby-images/9b39b/9b39bce2bdcb2031c842494d2b98217dfbd567b5" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bitflip/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bitflip_zh.html) |
-| 26 |[sokoban](https://github.com/mpSchrader/gym-sokoban) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/9ecae/9ecaea7bd90448c3f7b8b4a2d448bdeab44972ec" alt="Game 2" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/sokoban/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/sokoban.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/sokoban_zh.html) |
-| 27 |[gym_anytrading](https://github.com/AminHP/gym-anytrading) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/647e2/647e234b27b06f960836f228c388ceb3eae0d9a8" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_anytrading)
[env tutorial](https://github.com/opendilab/DI-engine/blob/main/dizoo/gym_anytrading/envs/README.md) |
-| 28 |[mario](https://github.com/Kautenja/gym-super-mario-bros) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/cc5de/cc5de831fc5e0a2799d3e4c3bea1d54c2615e114" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mario)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_super_mario_bros.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_super_mario_bros_zh.html) |
-| 29 |[dmc2gym](https://github.com/denisyarats/dmc2gym) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/53af1/53af16c35a80a217372d2cdeb53522dc8bae67a9" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/dmc2gym)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/dmc2gym.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/dmc2gym_zh.html) |
-| 30 |[evogym](https://github.com/EvolutionGym/evogym) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/8ca24/8ca24dadc56ed7b329d7fae3416fb58c0b158dc8" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/evogym/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/evogym.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/Evogym_zh.html) |
-| 31 |[gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/68f5f/68f5fecb86912969d191e207709569b7ef752aae" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_pybullet_drones/envs)
环境指南 |
-| 32 |[beergame](https://github.com/OptMLGroup/DeepBeerInventory-RL) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/b955e/b955e00766fea50b924be6cecc3fbb9f64ec389e" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/beergame/envs)
环境指南 |
-| 33 |[classic_control/acrobot](https://github.com/openai/gym/tree/master/gym/envs/classic_control) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/5ddc2/5ddc225328939cc3b565055ade64f19ecb8a1b2f" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/acrobot/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/acrobot_zh.html) |
-| 34 |[box2d/car_racing](https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"
data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/c4815/c4815f4f67e95f23687a3eee0e4d4e9fc8b4553d" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/carracing/envs)
环境指南 |
-| 35 |[metadrive](https://github.com/metadriverse/metadrive) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/47d65/47d6556c3eaeaa3f9195c81bcac7e80b839f183c" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/metadrive/env)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/metadrive_zh.html) |
-| 36 |[cliffwalking](https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/50a50/50a503523bea9dc255a215da0988b1ba49a7d951" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/cliffwalking/envs)
env tutorial
环境指南 |
-| 37 | [tabmwp](https://promptpg.github.io/explore.html) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/4ab03/4ab031e91104611a76e50d625498e4677192e68f" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/tabmwp)
env tutorial
环境指南|
+
+| No | Environment | Label | Visualization | Code and Doc Links |
+| :-: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| 1 | [Atari](https://github.com/openai/gym/tree/master/gym/envs/atari) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/d3269/d3269bdd5bce8df2bd154932d042767376bb4b19" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/atari/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/atari.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/atari_zh.html) |
+| 2 | [box2d/bipedalwalker](https://github.com/openai/gym/tree/master/gym/envs/box2d) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/f21a1/f21a1994a109bc5bf6f5e3ef8c83a2a674de06d3" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/bipedalwalker/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/bipedalwalker.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bipedalwalker_zh.html) |
+| 3 | [box2d/lunarlander](https://github.com/openai/gym/tree/master/gym/envs/box2d) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/3652c/3652c6d082009c39adaa15592f6eaa26e3eb6121" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/lunarlander/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/lunarlander.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/lunarlander_zh.html) |
+| 4 | [classic_control/cartpole](https://github.com/openai/gym/tree/master/gym/envs/classic_control) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/3fd97/3fd9784c8bb0144d76bc06111a24381f744217f4" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/cartpole/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/cartpole.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/cartpole_zh.html) |
+| 5 | [classic_control/pendulum](https://github.com/openai/gym/tree/master/gym/envs/classic_control) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/aa356/aa356b0ab61c283d165c68fb4e131e40f2ad9bdd" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/pendulum/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pendulum.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pendulum_zh.html) |
+| 6 | [competitive_rl](https://github.com/cuhkrlcourse/competitive-rl) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/1ea73/1ea73a865b2c4feff62ee9c91c7e22f99b811449" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.classic_control)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/competitive_rl_zh.html) |
+| 7 | [gfootball](https://github.com/google-research/football) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/52a12/52a12806e027d22cd9fdc9e72ff9242f881920d8" alt="sparse"data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/88a2f/88a2fbd974e44eb81745b499a0cc64b5a9c837e8" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.gfootball/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball_zh.html) |
+| 8 | [minigrid](https://github.com/maximecb/gym-minigrid) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/52a12/52a12806e027d22cd9fdc9e72ff9242f881920d8" alt="sparse" | data:image/s3,"s3://crabby-images/7aa65/7aa6521d7dec07fbba5b71ee52aded7070565085" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/minigrid/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid_zh.html) |
+| 9 | [MuJoCo](https://github.com/openai/gym/tree/master/gym/envs/mujoco) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/392e7/392e7c609e18e030706c188c30c971330a0fa572" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/majoco/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco_zh.html) |
+| 10 | [PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl" | data:image/s3,"s3://crabby-images/c4573/c4573800a94561df130847a517368c6b0ba7f654" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/petting_zoo/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pettingzoo.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pettingzoo_zh.html) |
+| 11 | [overcooked](https://github.com/HumanCompatibleAI/overcooked-demo) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl" | data:image/s3,"s3://crabby-images/f8a68/f8a6846eeddb296a48943e4511f3b5d30d52846f" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/overcooded/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/overcooked.html) |
+| 12 | [procgen](https://github.com/openai/procgen) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/ec10d/ec10dd46b3d3d99306fd36f6ef89a58acea878d5" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/procgen)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/procgen.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/procgen_zh.html) |
+| 13 | [pybullet](https://github.com/benelot/pybullet-gym) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/f760c/f760c7ecc992ab773a3e1fe37a0b73ea97377790" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pybullet/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pybullet_zh.html) |
+| 14 | [smac](https://github.com/oxwhirl/smac) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl"data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay"data:image/s3,"s3://crabby-images/52a12/52a12806e027d22cd9fdc9e72ff9242f881920d8" alt="sparse" | data:image/s3,"s3://crabby-images/cdffd/cdffd511e07a1c9d90a128a03e7126a47ddbdc64" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/smac/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/smac.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/smac_zh.html) |
+| 15 | [d4rl](https://github.com/rail-berkeley/d4rl) | data:image/s3,"s3://crabby-images/83bf9/83bf94f7fdd42ad1a24cc57a5f3430c761d2d2c7" alt="offline" | data:image/s3,"s3://crabby-images/a6b3a/a6b3a9eb9ead1c4e4d5cd32c020bf96f2f935176" alt="ori" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/d4rl)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/d4rl_zh.html) |
+| 16 | league_demo | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/75b19/75b19942b93e969cdd7288dd817bf2ce35a06438" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/league_demo/envs) |
+| 17 | pomdp atari | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pomdp/envs) |
+| 18 | [bsuite](https://github.com/deepmind/bsuite) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/cc40c/cc40c7eeb0167cb7498115f537f683d687eff044" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bsuite/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs//bsuite.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bsuite_zh.html) |
+| 19 | [ImageNet](https://www.image-net.org/) | data:image/s3,"s3://crabby-images/c4a66/c4a664c07dccde61d87ef31a5e557a776115c1ca" alt="IL" | data:image/s3,"s3://crabby-images/18521/18521ce11f96c062da5b3581a43a455f39e83df8" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/image_classification)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/image_cls_zh.html) |
+| 20 | [slime_volleyball](https://github.com/hardmaru/slimevolleygym) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/9a728/9a728b9f991ec279fe0e47467a3b9b6603bfc508" alt="ori" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/slime_volley)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/slime_volleyball.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/slime_volleyball_zh.html) |
+| 21 | [gym_hybrid](https://github.com/thomashirtz/gym-hybrid) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | data:image/s3,"s3://crabby-images/5c726/5c726dde8e534b9306dd59091cc456b17ea192e8" alt="ori" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_hybrid)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_hybrid.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_hybrid_zh.html) |
+| 22 | [GoBigger](https://github.com/opendilab/GoBigger) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid"data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl"data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" | data:image/s3,"s3://crabby-images/b9e12/b9e12eec2c69ebdd8e1306652d4a5fb8065b7340" alt="ori" | [dizoo link](https://github.com/opendilab/GoBigger-Challenge-2021/tree/main/di_baseline)
[env tutorial](https://gobigger.readthedocs.io/en/latest/index.html)
[环境指南](https://gobigger.readthedocs.io/zh_CN/latest/) |
+| 23 | [gym_soccer](https://github.com/openai/gym-soccer) | data:image/s3,"s3://crabby-images/61fc0/61fc06f476d278b41d6356354247e7cbc8f7c753" alt="hybrid" | data:image/s3,"s3://crabby-images/15c3c/15c3c3c42927250818b5e8a4c369e000965cd14a" alt="ori" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_soccer)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_soccer_zh.html) |
+| 24 | [multiagent_mujoco](https://github.com/schroederdewitt/multiagent_mujoco) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" data:image/s3,"s3://crabby-images/50ff6/50ff69c51cf9d5b8a0b9cfe5f808673f3fc7922d" alt="marl" | data:image/s3,"s3://crabby-images/392e7/392e7c609e18e030706c188c30c971330a0fa572" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/multiagent_mujoco/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/mujoco_zh.html) |
+| 25 | bitflip | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" data:image/s3,"s3://crabby-images/52a12/52a12806e027d22cd9fdc9e72ff9242f881920d8" alt="sparse" | data:image/s3,"s3://crabby-images/9b39b/9b39bce2bdcb2031c842494d2b98217dfbd567b5" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bitflip/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bitflip_zh.html) |
+| 26 | [sokoban](https://github.com/mpSchrader/gym-sokoban) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/9ecae/9ecaea7bd90448c3f7b8b4a2d448bdeab44972ec" alt="Game 2" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/sokoban/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/sokoban.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/sokoban_zh.html) |
+| 27 | [gym_anytrading](https://github.com/AminHP/gym-anytrading) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/647e2/647e234b27b06f960836f228c388ceb3eae0d9a8" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_anytrading)
[env tutorial](https://github.com/opendilab/DI-engine/blob/main/dizoo/gym_anytrading/envs/README.md) |
+| 28 | [mario](https://github.com/Kautenja/gym-super-mario-bros) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/cc5de/cc5de831fc5e0a2799d3e4c3bea1d54c2615e114" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mario)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_super_mario_bros.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_super_mario_bros_zh.html) |
+| 29 | [dmc2gym](https://github.com/denisyarats/dmc2gym) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/53af1/53af16c35a80a217372d2cdeb53522dc8bae67a9" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/dmc2gym)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/dmc2gym.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/dmc2gym_zh.html) |
+| 30 | [evogym](https://github.com/EvolutionGym/evogym) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/8ca24/8ca24dadc56ed7b329d7fae3416fb58c0b158dc8" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/evogym/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/evogym.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/Evogym_zh.html) |
+| 31 | [gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/68f5f/68f5fecb86912969d191e207709569b7ef752aae" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_pybullet_drones/envs)
环境指南 |
+| 32 | [beergame](https://github.com/OptMLGroup/DeepBeerInventory-RL) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/b955e/b955e00766fea50b924be6cecc3fbb9f64ec389e" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/beergame/envs)
环境指南 |
+| 33 | [classic_control/acrobot](https://github.com/openai/gym/tree/master/gym/envs/classic_control) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/5ddc2/5ddc225328939cc3b565055ade64f19ecb8a1b2f" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/acrobot/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/acrobot_zh.html) |
+| 34 | [box2d/car_racing](https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete"
data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/c4815/c4815f4f67e95f23687a3eee0e4d4e9fc8b4553d" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/carracing/envs)
环境指南 |
+| 35 | [metadrive](https://github.com/metadriverse/metadrive) | data:image/s3,"s3://crabby-images/5d8b2/5d8b2b2f74d57377bce61d00f403b16cc4f5a30a" alt="continuous" | data:image/s3,"s3://crabby-images/47d65/47d6556c3eaeaa3f9195c81bcac7e80b839f183c" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/metadrive/env)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/metadrive_zh.html) |
+| 36 | [cliffwalking](https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/50a50/50a503523bea9dc255a215da0988b1ba49a7d951" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/cliffwalking/envs)
env tutorial
环境指南 |
+| 37 | [tabmwp](https://promptpg.github.io/explore.html) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/4ab03/4ab031e91104611a76e50d625498e4677192e68f" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/tabmwp)
env tutorial
环境指南 |
+| 38 | [frozen_lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake) | data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" | data:image/s3,"s3://crabby-images/f492b/f492b4a1fefd0e2e8899ee47f13df88f2086faaa" alt="original" | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/frozen_lake)
env tutorial
环境指南 |
+
data:image/s3,"s3://crabby-images/ba9e4/ba9e4a3cb1145fabe53d917a32c0e6bfab82588f" alt="discrete" means discrete action space
@@ -329,111 +334,112 @@ P.S: The `.py` file in `Runnable Demo` can be found in `dizoo`
data:image/s3,"s3://crabby-images/f30c4/f30c4cf341f552a234d46c507235e4af650cf5e3" alt="selfplay" means environment that allows agent VS agent battle
P.S. Some environments in Atari, such as **MontezumaRevenge**, are also of the sparse reward type.
-
+
### General Data Container: TreeTensor
DI-engine uses [TreeTensor](https://github.com/opendilab/DI-treetensor) as the basic data container in various components. It is easy to use and consistent across different code modules such as environment definition, data processing and DRL optimization. Here are some concrete code examples:
- TreeTensor can easily extend all the operations of `torch.Tensor` to nested data:
+
(Click for Details)
- ```python
- import treetensor.torch as ttorch
-
-
- # create random tensor
- data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})
- # clone+detach tensor
- data_clone = data.clone().detach()
- # access tree structure like attribute
- a = data.a
- c = data.b.c
- # stack/cat/split
- stacked_data = ttorch.stack([data, data_clone], 0)
- cat_data = ttorch.cat([data, data_clone], 0)
- data, data_clone = ttorch.split(stacked_data, 1)
- # reshape
- data = data.unsqueeze(-1)
- data = data.squeeze(-1)
- flatten_data = data.view(-1)
- # indexing
- data_0 = data[0]
- data_1to2 = data[1:2]
- # execute math calculations
- data = data.sin()
- data.b.c.cos_().clamp_(-1, 1)
- data += data ** 2
- # backward
- data.requires_grad_(True)
- loss = data.arctan().mean()
- loss.backward()
- # print shape
- print(data.shape)
- # result
- #
- # ├── 'a' --> torch.Size([1, 3, 2])
- # └── 'b' -->
- # └── 'c' --> torch.Size([1, 3])
- ```
+ ```python
+ import treetensor.torch as ttorch
+
+
+ # create random tensor
+ data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})
+ # clone+detach tensor
+ data_clone = data.clone().detach()
+ # access tree structure like attribute
+ a = data.a
+ c = data.b.c
+ # stack/cat/split
+ stacked_data = ttorch.stack([data, data_clone], 0)
+ cat_data = ttorch.cat([data, data_clone], 0)
+ data, data_clone = ttorch.split(stacked_data, 1)
+ # reshape
+ data = data.unsqueeze(-1)
+ data = data.squeeze(-1)
+ flatten_data = data.view(-1)
+ # indexing
+ data_0 = data[0]
+ data_1to2 = data[1:2]
+ # execute math calculations
+ data = data.sin()
+ data.b.c.cos_().clamp_(-1, 1)
+ data += data ** 2
+ # backward
+ data.requires_grad_(True)
+ loss = data.arctan().mean()
+ loss.backward()
+ # print shape
+ print(data.shape)
+ # result
+ #
+ # ├── 'a' --> torch.Size([1, 3, 2])
+ # └── 'b' -->
+ # └── 'c' --> torch.Size([1, 3])
+ ```
-
- TreeTensor makes it simple yet effective to implement a classic deep reinforcement learning pipeline:
+
(Click for Details)
- ```diff
- import torch
- import treetensor.torch as ttorch
-
- B = 4
-
-
- def get_item():
- return {
- 'obs': {
- 'scalar': torch.randn(12),
- 'image': torch.randn(3, 32, 32),
- },
- 'action': torch.randint(0, 10, size=(1,)),
- 'reward': torch.rand(1),
- 'done': False,
- }
-
-
- data = [get_item() for _ in range(B)]
-
-
- # execute `stack` op
- - def stack(data, dim):
- - elem = data[0]
- - if isinstance(elem, torch.Tensor):
- - return torch.stack(data, dim)
- - elif isinstance(elem, dict):
- - return {k: stack([item[k] for item in data], dim) for k in elem.keys()}
- - elif isinstance(elem, bool):
- - return torch.BoolTensor(data)
- - else:
- - raise TypeError("not support elem type: {}".format(type(elem)))
- - stacked_data = stack(data, dim=0)
- + data = [ttorch.tensor(d) for d in data]
- + stacked_data = ttorch.stack(data, dim=0)
-
- # validate
- - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)
- - assert stacked_data['action'].shape == (B, 1)
- - assert stacked_data['reward'].shape == (B, 1)
- - assert stacked_data['done'].shape == (B,)
- - assert stacked_data['done'].dtype == torch.bool
- + assert stacked_data.obs.image.shape == (B, 3, 32, 32)
- + assert stacked_data.action.shape == (B, 1)
- + assert stacked_data.reward.shape == (B, 1)
- + assert stacked_data.done.shape == (B,)
- + assert stacked_data.done.dtype == torch.bool
- ```
+ ```diff
+ import torch
+ import treetensor.torch as ttorch
+
+ B = 4
+
+
+ def get_item():
+ return {
+ 'obs': {
+ 'scalar': torch.randn(12),
+ 'image': torch.randn(3, 32, 32),
+ },
+ 'action': torch.randint(0, 10, size=(1,)),
+ 'reward': torch.rand(1),
+ 'done': False,
+ }
+
+
+ data = [get_item() for _ in range(B)]
+
+
+ # execute `stack` op
+ - def stack(data, dim):
+ - elem = data[0]
+ - if isinstance(elem, torch.Tensor):
+ - return torch.stack(data, dim)
+ - elif isinstance(elem, dict):
+ - return {k: stack([item[k] for item in data], dim) for k in elem.keys()}
+ - elif isinstance(elem, bool):
+ - return torch.BoolTensor(data)
+ - else:
+ - raise TypeError("not support elem type: {}".format(type(elem)))
+ - stacked_data = stack(data, dim=0)
+ + data = [ttorch.tensor(d) for d in data]
+ + stacked_data = ttorch.stack(data, dim=0)
+
+ # validate
+ - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)
+ - assert stacked_data['action'].shape == (B, 1)
+ - assert stacked_data['reward'].shape == (B, 1)
+ - assert stacked_data['done'].shape == (B,)
+ - assert stacked_data['done'].dtype == torch.bool
+ + assert stacked_data.obs.image.shape == (B, 3, 32, 32)
+ + assert stacked_data.action.shape == (B, 1)
+ + assert stacked_data.reward.shape == (B, 1)
+ + assert stacked_data.done.shape == (B,)
+ + assert stacked_data.done.dtype == torch.bool
+ ```
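The idea behind both examples above, applying one operation to every leaf of a nested structure, can be sketched in plain Python. This is only an illustration under stated assumptions: `tree_map` and `tree_stack` are hypothetical helper names, the leaves are plain floats rather than tensors, and none of this is TreeTensor's actual implementation.

```python
from typing import Any, Callable, List


def tree_map(fn: Callable[[Any], Any], tree: Any) -> Any:
    """Apply `fn` to every leaf of a nested dict, keeping the structure (hypothetical helper)."""
    if isinstance(tree, dict):
        return {key: tree_map(fn, value) for key, value in tree.items()}
    return fn(tree)


def tree_stack(items: List[Any]) -> Any:
    """Stack identically structured nested dicts leaf by leaf into lists (hypothetical helper)."""
    first = items[0]
    if isinstance(first, dict):
        return {key: tree_stack([item[key] for item in items]) for key in first}
    return list(items)


# Plain floats stand in for tensors so the sketch runs without torch.
sample = {'obs': {'scalar': 1.0, 'image': 2.0}, 'reward': 0.5}
doubled = tree_map(lambda x: x * 2, sample)
batch = tree_stack([sample, doubled])
print(doubled)
print(batch)
```

With TreeTensor the same pattern is a single call such as `ttorch.stack(data, dim=0)`, as the diff example above shows, so the recursion does not have to be written by hand.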
@@ -442,8 +448,8 @@ DI-engine utilizes [TreeTensor](https://github.com/opendilab/DI-treetensor) as t
- [File an issue](https://github.com/opendilab/DI-engine/issues/new/choose) on GitHub
- Open or participate in our [forum](https://github.com/opendilab/DI-engine/discussions)
- Discuss on DI-engine [slack communication channel](https://join.slack.com/t/opendilab/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)
-- Discuss on DI-engine's WeChat group (i.e. add us on WeChat: ding314assist)
-
+- Discuss on DI-engine's WeChat group (i.e. add us on WeChat: ding314assist)
+
- Contact us by email (opendilab@pjlab.org.cn)
- Contribute to our future plan: [Roadmap](https://github.com/opendilab/DI-engine/issues/548)
@@ -460,8 +466,8 @@ We appreciate all the feedbacks and contributions to improve DI-engine, both alg
[data:image/s3,"s3://crabby-images/b59b1/b59b130e6febd4af261bf2c2bb6dc60b77ab38ce" alt="Forkers repo roster for @opendilab/DI-engine"](https://github.com/opendilab/DI-engine/network/members)
-
## Citation
+
```latex
@misc{ding,
title={DI-engine: OpenDILab Decision Intelligence Engine},
@@ -473,4 +479,5 @@ We appreciate all the feedbacks and contributions to improve DI-engine, both alg
```
## License
+
DI-engine is released under the Apache 2.0 license.
diff --git a/ding/example/dqn_frozen_lake.py b/ding/example/dqn_frozen_lake.py
new file mode 100644
index 0000000000..ec4b856339
--- /dev/null
+++ b/ding/example/dqn_frozen_lake.py
@@ -0,0 +1,45 @@
+from ditk import logging
+from ding.model import DQN
+from ding.policy import DQNPolicy
+from ding.envs import DingEnvWrapper, BaseEnvManagerV2
+from ding.data import DequeBuffer
+from ding.config import compile_config
+from ding.framework import task
+from ding.framework.context import OnlineRLContext
+from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, \
+ eps_greedy_handler, CkptSaver, nstep_reward_enhancer, final_ctx_saver
+from ding.utils import set_pkg_seed
+from dizoo.frozen_lake.config.frozen_lake_dqn_config import main_config, create_config
+from dizoo.frozen_lake.envs import FrozenLakeEnv
+
+
+def main():
+    logging.getLogger().setLevel(logging.INFO)
+    main_config.policy.nstep = 5
+    cfg = compile_config(main_config, create_cfg=create_config, auto=True)
+    with task.start(async_mode=False, ctx=OnlineRLContext()):
+        collector_env = BaseEnvManagerV2(
+            env_fn=[lambda: FrozenLakeEnv(cfg=cfg.env) for _ in range(cfg.env.collector_env_num)], cfg=cfg.env.manager
+        )
+        evaluator_env = BaseEnvManagerV2(
+            env_fn=[lambda: FrozenLakeEnv(cfg=cfg.env) for _ in range(cfg.env.evaluator_env_num)], cfg=cfg.env.manager
+        )
+        set_pkg_seed(cfg.seed, use_cuda=cfg.policy.cuda)
+
+        model = DQN(**cfg.policy.model)
+        buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
+        policy = DQNPolicy(cfg.policy, model=model)
+
+        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))
+        task.use(eps_greedy_handler(cfg))
+        task.use(StepCollector(cfg, policy.collect_mode, collector_env))
+        task.use(nstep_reward_enhancer(cfg))
+        task.use(data_pusher(cfg, buffer_))
+        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))
+        task.use(CkptSaver(policy, cfg.exp_name, train_freq=100))
+        task.use(final_ctx_saver(cfg.exp_name))
+        task.run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/dizoo/frozen_lake/FrozenLake.gif b/dizoo/frozen_lake/FrozenLake.gif
new file mode 100644
index 0000000000..db46a98e39
Binary files /dev/null and b/dizoo/frozen_lake/FrozenLake.gif differ
diff --git a/dizoo/frozen_lake/__init__.py b/dizoo/frozen_lake/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/dizoo/frozen_lake/config/__init__.py b/dizoo/frozen_lake/config/__init__.py
new file mode 100644
index 0000000000..9bec16a088
--- /dev/null
+++ b/dizoo/frozen_lake/config/__init__.py
@@ -0,0 +1 @@
+from .frozen_lake_dqn_config import main_config, create_config
diff --git a/dizoo/frozen_lake/config/frozen_lake_dqn_config.py b/dizoo/frozen_lake/config/frozen_lake_dqn_config.py
new file mode 100644
index 0000000000..84fe0de199
--- /dev/null
+++ b/dizoo/frozen_lake/config/frozen_lake_dqn_config.py
@@ -0,0 +1,64 @@
+from easydict import EasyDict
+
+frozen_lake_dqn_config = dict(
+    exp_name='frozen_lake_seed0',
+    env=dict(
+        collector_env_num=8,
+        evaluator_env_num=5,
+        n_evaluator_episode=10,
+        env_id='FrozenLake-v1',
+        desc=None,
+        map_name="4x4",
+        is_slippery=False,
+        save_replay_gif=False,
+    ),
+    policy=dict(
+        cuda=True,
+        load_path='frozen_lake_seed0/ckpt/ckpt_best.pth.tar',
+        model=dict(
+            obs_shape=16,
+            action_shape=4,
+            encoder_hidden_size_list=[128, 128, 64],
+            dueling=True,
+        ),
+        nstep=3,
+        discount_factor=0.97,
+        learn=dict(
+            update_per_collect=5,
+            batch_size=256,
+            learning_rate=0.001,
+        ),
+        collect=dict(n_sample=10),
+        eval=dict(evaluator=dict(eval_freq=40, )),
+        other=dict(
+            eps=dict(
+                type='exp',
+                start=0.8,
+                end=0.1,
+                decay=10000,
+            ),
+            replay_buffer=dict(replay_buffer_size=20000, ),
+        ),
+    ),
+)
+
+frozen_lake_dqn_config = EasyDict(frozen_lake_dqn_config)
+main_config = frozen_lake_dqn_config
+
+frozen_lake_dqn_create_config = dict(
+    env=dict(
+        type='frozen_lake',
+        import_names=['dizoo.frozen_lake.envs.frozen_lake_env'],
+    ),
+    env_manager=dict(type='base'),
+    policy=dict(type='dqn'),
+    replay_buffer=dict(type='deque', import_names=['ding.data.buffer.deque_buffer_wrapper']),
+)
+
+frozen_lake_dqn_create_config = EasyDict(frozen_lake_dqn_create_config)
+create_config = frozen_lake_dqn_create_config
+
+if __name__ == "__main__":
+    # Alternatively, run `ding -m serial -c frozen_lake_dqn_config.py -s 0` from the command line.
+    from ding.entry import serial_pipeline
+    serial_pipeline((main_config, create_config), max_env_step=5000, seed=0)
diff --git a/dizoo/frozen_lake/envs/__init__.py b/dizoo/frozen_lake/envs/__init__.py
new file mode 100644
index 0000000000..dfec345139
--- /dev/null
+++ b/dizoo/frozen_lake/envs/__init__.py
@@ -0,0 +1 @@
+from .frozen_lake_env import FrozenLakeEnv
diff --git a/dizoo/frozen_lake/envs/frozen_lake_env.py b/dizoo/frozen_lake/envs/frozen_lake_env.py
new file mode 100644
index 0000000000..72f179077a
--- /dev/null
+++ b/dizoo/frozen_lake/envs/frozen_lake_env.py
@@ -0,0 +1,144 @@
+from typing import Any, Dict, List, Optional
+import imageio
+import os
+import gymnasium as gymn
+import numpy as np
+from ding.envs import BaseEnv, BaseEnvTimestep
+from ding.torch_utils import to_ndarray
+from ding.utils import ENV_REGISTRY
+
+
+@ENV_REGISTRY.register('frozen_lake')
+class FrozenLakeEnv(BaseEnv):
+
+    def __init__(self, cfg) -> None:
+        self._cfg = cfg
+        assert self._cfg.env_id == "FrozenLake-v1", "env_id must be 'FrozenLake-v1'"
+        self._init_flag = False
+        self._save_replay_bool = False
+        self._save_replay_count = 0
+        self._init_flag = False
+        self._frames = []
+        self._replay_path = None
+
+    def reset(self) -> np.ndarray:
+        if not self._init_flag:
+            # A custom map passed via ``desc`` takes precedence over the preloaded ``map_name`` map.
+            self._env = gymn.make(
+                self._cfg.env_id,
+                desc=self._cfg.desc,
+                map_name=self._cfg.map_name,
+                is_slippery=self._cfg.is_slippery,
+                render_mode="rgb_array"
+            )
+            self._observation_space = self._env.observation_space
+            self._action_space = self._env.action_space
+            self._reward_space = gymn.spaces.Box(
+                low=self._env.reward_range[0], high=self._env.reward_range[1], shape=(1, ), dtype=np.float32
+            )
+            self._init_flag = True
+        self._eval_episode_return = 0
+        if hasattr(self, '_seed') and hasattr(self, '_dynamic_seed') and self._dynamic_seed:
+            np_seed = 100 * np.random.randint(1, 1000)
+            self._env_seed = self._seed + np_seed
+        elif hasattr(self, '_seed'):
+            self._env_seed = self._seed
+        if hasattr(self, '_seed'):
+            obs, info = self._env.reset(seed=self._env_seed)
+        else:
+            obs, info = self._env.reset()
+        obs = np.eye(16, dtype=np.float32)[obs - 1]  # one-hot encode the state; state 0 wraps to index 15
+        return obs
+
+    def close(self) -> None:
+        if self._init_flag:
+            self._env.close()
+        self._init_flag = False
+
+    def seed(self, seed: int, dynamic_seed: bool = True) -> None:
+        self._seed = seed
+        self._dynamic_seed = dynamic_seed
+        np.random.seed(self._seed)
+
+    def step(self, action: np.ndarray) -> BaseEnvTimestep:
+        obs, rew, terminated, truncated, info = self._env.step(action[0])
+        self._eval_episode_return += rew
+        obs = np.eye(16, dtype=np.float32)[obs - 1]
+        rew = to_ndarray([rew])
+        if self._save_replay_bool:
+            picture = self._env.render()
+            self._frames.append(picture)
+        if terminated or truncated:
+            done = True
+        else:
+            done = False
+        if done:
+            info['eval_episode_return'] = self._eval_episode_return
+            if self._save_replay_bool:
+                assert self._replay_path is not None, "a replay path must be set before saving replays"
+                path = os.path.join(
+                    self._replay_path, '{}_episode_{}.gif'.format(self._cfg.env_id, self._save_replay_count)
+                )
+                self.frames_to_gif(self._frames, path)
+                self._frames = []
+                self._save_replay_count += 1
+        rew = rew.astype(np.float32)
+        return BaseEnvTimestep(obs, rew, done, info)
+
+    def random_action(self) -> List:
+        raw_action = self._env.action_space.sample()
+        # Wrap the sampled action in a list so that ``step`` can read it via ``action[0]``.
+        return [raw_action]
+
+    def __repr__(self) -> str:
+        return "DI-engine Frozen Lake Env"
+
+    @property
+    def observation_space(self) -> gymn.spaces.Space:
+        return self._observation_space
+
+    @property
+    def action_space(self) -> gymn.spaces.Space:
+        return self._action_space
+
+    @property
+    def reward_space(self) -> gymn.spaces.Space:
+        return self._reward_space
+
+    def enable_save_replay(self, replay_path: Optional[str] = None) -> None:
+        if replay_path is None:
+            replay_path = './video'
+        self._replay_path = replay_path
+        self._save_replay_bool = True
+        self._save_replay_count = 0
+        self._frames = []
+
+    @staticmethod
+    def frames_to_gif(frames: List[imageio.core.util.Array], gif_path: str, duration: float = 0.1) -> None:
+        """
+        Convert a list of frames into a GIF.
+        Args:
+            - frames (List[imageio.core.util.Array]): A list of frames, each frame is an image.
+            - gif_path (str): The path to save the GIF file.
+            - duration (float): Duration between each frame in the GIF (seconds).
+
+        Returns:
+            None, the GIF file is saved directly to the specified path.
+        """
+        # Save all frames as temporary image files
+        temp_image_files = []
+        for i, frame in enumerate(frames):
+            temp_image_file = f"frame_{i}.png"  # Temporary file name
+            imageio.imwrite(temp_image_file, frame)  # Save the frame as a PNG file
+            temp_image_files.append(temp_image_file)
+
+        # Use imageio to convert temporary image files to GIF
+        with imageio.get_writer(gif_path, mode='I', duration=duration) as writer:
+            for temp_image_file in temp_image_files:
+                image = imageio.imread(temp_image_file)
+                writer.append_data(image)
+
+        # Clean up temporary image files
+        for temp_image_file in temp_image_files:
+            os.remove(temp_image_file)
+        print(f"GIF saved as {gif_path}")
diff --git a/dizoo/frozen_lake/envs/test_frozen_lake_env.py b/dizoo/frozen_lake/envs/test_frozen_lake_env.py
new file mode 100644
index 0000000000..c313a264e0
--- /dev/null
+++ b/dizoo/frozen_lake/envs/test_frozen_lake_env.py
@@ -0,0 +1,44 @@
+import numpy as np
+import pytest
+from dizoo.frozen_lake.envs import FrozenLakeEnv
+from easydict import EasyDict
+
+
+@pytest.mark.envtest
+class TestFrozenLakeEnv:
+
+    def test_my_lake(self):
+        env = FrozenLakeEnv(
+            EasyDict({
+                'env_id': 'FrozenLake-v1',
+                'desc': None,
+                'map_name': "4x4",
+                'is_slippery': False,
+            })
+        )
+        for _ in range(5):
+            env.seed(314, dynamic_seed=False)
+            assert env._seed == 314
+            obs = env.reset()
+            assert obs.shape == (
+                16,
+            ), "With one-hot encoding, the observation should have a dimensionality of 16."
+            for i in range(10):
+                env.enable_save_replay("./video")
+                # Both ``env.random_action()`` and sampling from ``env.action_space`` directly
+                # generate legal random actions.
+                if i < 5:
+                    random_action = np.array([env.action_space.sample()])
+                else:
+                    random_action = env.random_action()
+                timestep = env.step(random_action)
+                print(timestep)
+                assert isinstance(timestep.obs, np.ndarray)
+                assert isinstance(timestep.done, bool)
+                assert timestep.obs.shape == (16, )
+                assert timestep.reward.shape == (1, )
+                assert timestep.reward >= env.reward_space.low
+                assert timestep.reward <= env.reward_space.high
+
+        print(env.observation_space, env.action_space, env.reward_space)
+        env.close()
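
Review note on the observation encoding: `FrozenLakeEnv.reset` and `FrozenLakeEnv.step` turn the discrete state into a one-hot vector with `np.eye(16, dtype=np.float32)[obs - 1]`. The sketch below is not part of the PR; the helper name `one_hot_state` and its `offset` parameter are hypothetical and exist only to illustrate what that indexing does. It assumes the 4x4 map (16 states) that the environment hard-codes. Because of NumPy's negative indexing, state 0 lands on index 15, so the encoding is a cyclic shift of the conventional one rather than an error: every state still gets a distinct vector.

```python
import numpy as np

N_STATES = 16  # 4x4 map, matching the size hard-coded in FrozenLakeEnv


def one_hot_state(obs: int, offset: int = -1) -> np.ndarray:
    """Hypothetical helper: one-hot encode a discrete state index.

    ``offset=-1`` reproduces the ``np.eye(16)[obs - 1]`` indexing used in the PR;
    ``offset=0`` gives the conventional encoding.
    """
    return np.eye(N_STATES, dtype=np.float32)[obs + offset]


if __name__ == "__main__":
    # With the PR's indexing, state 0 wraps around to the last position.
    print(int(one_hot_state(0).argmax()))  # 15
    print(int(one_hot_state(15).argmax()))  # 14
    # The conventional encoding keeps state i at position i.
    print(int(one_hot_state(0, offset=0).argmax()))  # 0
```

Either choice works for DQN as long as it is applied consistently in `reset` and `step`, and checkpoints trained with one encoding are only evaluated with the same encoding.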