diff --git a/README.md b/README.md
index 1f0314b471..7e68959a2c 100644
--- a/README.md
+++ b/README.md
@@ -22,8 +22,6 @@
 ![deploy](https://github.com/opendilab/DI-engine/actions/workflows/deploy.yml/badge.svg)
 [![codecov](https://codecov.io/gh/opendilab/DI-engine/branch/main/graph/badge.svg?token=B0Q15JI301)](https://codecov.io/gh/opendilab/DI-engine)
 
-
-
 ![GitHub Org's stars](https://img.shields.io/github/stars/opendilab)
 [![GitHub stars](https://img.shields.io/github/stars/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/stargazers)
 [![GitHub forks](https://img.shields.io/github/forks/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/network)
@@ -37,11 +35,11 @@
 
 Updated on 2024.02.04 DI-engine-v0.5.1
 
-
 ## Introduction to DI-engine
+
 [Documentation](https://di-engine-docs.readthedocs.io/en/latest/) | [中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/) | [Tutorials](https://di-engine-docs.readthedocs.io/en/latest/01_quickstart/index.html) | [Feature](#feature) | [Task & Middleware](https://di-engine-docs.readthedocs.io/en/latest/03_system/index.html) | [TreeTensor](#general-data-container-treetensor) | [Roadmap](https://github.com/opendilab/DI-engine/issues/548)
 
-**DI-engine** is a generalized decision intelligence engine for PyTorch and JAX. 
+**DI-engine** is a generalized decision intelligence engine for PyTorch and JAX.
 
 It provides **python-first** and **asynchronous-native** task and middleware abstractions, and modularly integrates several of the most important decision-making concepts: Env, Policy and Model. Based on the above mechanisms, DI-engine supports **various [deep reinforcement learning](https://di-engine-docs.readthedocs.io/en/latest/10_concepts/index.html) algorithms** with superior performance, high efficiency, well-organized [documentation](https://di-engine-docs.readthedocs.io/en/latest/) and [unittest](https://github.com/opendilab/DI-engine/actions):
 
@@ -89,6 +87,7 @@ It provides **python-first** and **asynchronous-native** task and middleware abs
   - [awesome-diffusion-model-in-rl](https://github.com/opendilab/awesome-diffusion-model-in-rl): A curated list of Diffusion Model in RL resources
   - [awesome-end-to-end-autonomous-driving](https://github.com/opendilab/awesome-end-to-end-autonomous-driving): A curated list of awesome End-to-End Autonomous Driving resources
   - [awesome-driving-behavior-prediction](https://github.com/opendilab/awesome-driving-behavior-prediction): A collection of research papers for Driving Behavior Prediction
+
   </details>
 
 On the low-level end, DI-engine comes with a set of highly re-usable modules, including [RL optimization functions](https://github.com/opendilab/DI-engine/tree/main/ding/rl_utils), [PyTorch utilities](https://github.com/opendilab/DI-engine/tree/main/ding/torch_utils) and [auxiliary tools](https://github.com/opendilab/DI-engine/tree/main/ding/utils).
@@ -104,6 +103,7 @@ BTW, **DI-engine** also has some special **system optimization and design** for
 - [DI-orchestrator](https://github.com/opendilab/DI-orchestrator): RL Kubernetes Custom Resource and Operator Lib
 - [DI-hpc](https://github.com/opendilab/DI-hpc): RL HPC OP Lib
 - [DI-store](https://github.com/opendilab/DI-store): RL Object Store
+
 </details>
 
 Have fun with exploration and exploitation.
@@ -128,11 +128,13 @@ Have fun with exploration and exploitation.
 ## Installation
 
 You can simply install DI-engine from PyPI with the following command:
+
 ```bash
 pip install DI-engine
 ```
 
 If you use Anaconda or Miniconda, you can install DI-engine from conda-forge through the following command:
+
 ```bash
 conda install -c opendilab di-engine
 ```
@@ -155,6 +157,7 @@ And our dockerhub repo can be found [here](https://hub.docker.com/repository/doc
 - cityflow: opendilab/ding:nightly-cityflow
 - evogym: opendilab/ding:nightly-evogym
 - d4rl: opendilab/ding:nightly-d4rl
+
 </details>
 
 The detailed documentation are hosted on [doc](https://di-engine-docs.readthedocs.io/en/latest/) | [中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/).
@@ -175,8 +178,8 @@ The detailed documentation are hosted on [doc](https://di-engine-docs.readthedoc
 
 [新老 pipeline 的异同对比](https://di-engine-docs.readthedocs.io/zh_CN/latest/04_best_practice/diff_in_new_pipeline_zh.html)
 
-
 ## Feature
+
 ### Algorithm Versatility
 
 <details open>
@@ -198,7 +201,6 @@ The detailed documentation are hosted on [doc](https://di-engine-docs.readthedoc
 
 ![offline](https://img.shields.io/badge/-offlineRL-darkblue) &nbsp;[Offiline Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/offline_rl.html)｜[离线强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/offline_rl_zh.html)
 
-
 ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue) &nbsp;[Model-Based Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/model_based_rl.html)｜[基于模型的强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/model_based_rl_zh.html)
 
 ![other](https://img.shields.io/badge/-other-lightgrey) &nbsp;means other sub-direction algorithms, usually as plugin-in in the whole pipeline
@@ -206,111 +208,114 @@ The detailed documentation are hosted on [doc](https://di-engine-docs.readthedoc
 P.S: The `.py` file in `Runnable Demo` can be found in `dizoo`
 
 
+| No. |                                                              Algorithm                                                              |                                                                                     Label                                                                                     |                                                                                                                                   Doc and Implementation                                                                                                                                   |                                      Runnable Demo                                      |
+| :-: | :---------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------: |
+|  1  |                             [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |             [DQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqn.html)<br>[DQN中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/dqn_zh.html)<br>[policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py)             |     python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0     |
+|  2  |                                             [C51](https://arxiv.org/pdf/1707.06887.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [C51 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/c51.html)<br>[policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py)                                                            |                      ding -m serial -c cartpole_c51_config.py -s 0                      |
+|  3  |                                            [QRDQN](https://arxiv.org/pdf/1710.10044.pdf)                                            |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                        [QRDQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qrdqn.html)<br>[policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py)                                                        |                     ding -m serial -c cartpole_qrdqn_config.py -s 0                     |
+|  4  |                                             [IQN](https://arxiv.org/pdf/1806.06923.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [IQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/iqn.html)<br>[policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py)                                                            |                      ding -m serial -c cartpole_iqn_config.py -s 0                      |
+|  5  |                                             [FQF](https://arxiv.org/pdf/1911.02140.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [FQF doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/fqf.html)<br>[policy/fqf](https://github.com/opendilab/DI-engine/blob/main/ding/policy/fqf.py)                                                            |                      ding -m serial -c cartpole_fqf_config.py -s 0                      |
+|  6  |                                           [Rainbow](https://arxiv.org/pdf/1710.02298.pdf)                                           |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                    [Rainbow doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rainbow.html)<br>[policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py)                                                    |                    ding -m serial -c cartpole_rainbow_config.py -s 0                    |
+|  7  |                                             [SQL](https://arxiv.org/pdf/1702.08165.pdf)                                             |                          ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)                          |                                                            [SQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sql.html)<br>[policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py)                                                            |                      ding -m serial -c cartpole_sql_config.py -s 0                      |
+|  8  |                                         [R2D2](https://openreview.net/forum?id=r1lyTjAqYX)                                         |                            ![dist](https://img.shields.io/badge/-distributed-blue)![discrete](https://img.shields.io/badge/-discrete-brightgreen)                            |                                                          [R2D2 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d2.html)<br>[policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py)                                                          |                      ding -m serial -c cartpole_r2d2_config.py -s 0                      |
+|  9  |                   [PG](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf)                   |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                             [PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)<br>[policy/pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pg.py)                                                             |                       ding -m serial -c cartpole_pg_config.py -s 0                       |
+| 10 |                                            [PromptPG](https://arxiv.org/abs/2209.14610)                                            |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                                                               [policy/prompt_pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/prompt_pg.py)                                                                                               |                   ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0                   |
+| 11 |                                             [A2C](https://arxiv.org/pdf/1602.01783.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [A2C doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)<br>[policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py)                                                            |                      ding -m serial -c cartpole_a2c_config.py -s 0                      |
+| 12 |                        [PPO](https://arxiv.org/abs/1707.06347)/[MAPPO](https://arxiv.org/pdf/2103.01955.pdf)                        | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)![MARL](https://img.shields.io/badge/-MARL-yellow) |                                                            [PPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppo.html)<br>[policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py)                                                            | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |
+| 13 |                                             [PPG](https://arxiv.org/pdf/2009.04416.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [PPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppg.html)<br>[policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py)                                                            |                             python3 -u cartpole_ppg_main.py                             |
+| 14 |                                            [ACER](https://arxiv.org/pdf/1611.01224.pdf)                                            |                          ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)                          |                                                          [ACER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/acer.html)<br>[policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py)                                                          |                      ding -m serial -c cartpole_acer_config.py -s 0                      |
+| 15 |                                             [IMPALA](https://arxiv.org/abs/1802.01561)                                             |                            ![dist](https://img.shields.io/badge/-distributed-blue)![discrete](https://img.shields.io/badge/-discrete-brightgreen)                            |                                                      [IMPALA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/impala.html)<br>[policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py)                                                      |                     ding -m serial -c cartpole_impala_config.py -s 0                     |
+| 16 |                     [DDPG](https://arxiv.org/pdf/1509.02971.pdf)/[PADDPG](https://arxiv.org/pdf/1511.04143.pdf)                     |                             ![continuous](https://img.shields.io/badge/-continous-green)![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                             |                                                          [DDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)<br>[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py)                                                          |                      ding -m serial -c pendulum_ddpg_config.py -s 0                      |
+| 17 |                                             [TD3](https://arxiv.org/pdf/1802.09477.pdf)                                             |                             ![continuous](https://img.shields.io/badge/-continous-green)![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                             |                                                            [TD3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3.html)<br>[policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py)                                                            |     python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0     |
+| 18 |                                            [D4PG](https://arxiv.org/pdf/1804.08617.pdf)                                            |                                                         ![continuous](https://img.shields.io/badge/-continous-green)                                                         |                                                          [D4PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/d4pg.html)<br>[policy/d4pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/d4pg.py)                                                          |                            python3 -u pendulum_d4pg_config.py                            |
+| 19 |                                           [SAC](https://arxiv.org/abs/1801.01290)/[MASAC]                                           | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)![MARL](https://img.shields.io/badge/-MARL-yellow) |                                                            [SAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sac.html)<br>[policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py)                                                            |                      ding -m serial -c pendulum_sac_config.py -s 0                      |
+| 20 |                                            [PDQN](https://arxiv.org/pdf/1810.06394.pdf)                                            |                                                           ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                           |                                                                                                    [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py)                                                                                                    |                     ding -m serial -c gym_hybrid_pdqn_config.py -s 0                     |
+| 21 |                                            [MPDQN](https://arxiv.org/pdf/1905.04388.pdf)                                            |                                                           ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                           |                                                                                                    [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py)                                                                                                    |                    ding -m serial -c gym_hybrid_mpdqn_config.py -s 0                    |
+| 22 |                                            [HPPO](https://arxiv.org/pdf/1903.01344.pdf)                                            |                                                           ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                           |                                                                                                     [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py)                                                                                                     |                ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0                |
+| 23 |                                             [BDQ](https://arxiv.org/pdf/1711.08946.pdf)                                             |                                                           ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                           |                                                                                                     [policy/bdq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/bdq.py)                                                                                                     |                             python3 -u hopper_bdq_config.py                             |
+| 24 |                                              [MDQN](https://arxiv.org/abs/2007.14430)                                              |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                                                                    [policy/mdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mdqn.py)                                                                                                    |                            python3 -u asterix_mdqn_config.py                            |
+| 25 |                                            [QMIX](https://arxiv.org/pdf/1803.11485.pdf)                                            |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                          [QMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qmix.html)<br>[policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py)                                                          |                     ding -m serial -c smac_3s5z_qmix_config.py -s 0                     |
+| 26 |                                            [COMA](https://arxiv.org/pdf/1705.08926.pdf)                                            |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                          [COMA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/coma.html)<br>[policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py)                                                          |                     ding -m serial -c smac_3s5z_coma_config.py -s 0                     |
+| 27 |                                              [QTran](https://arxiv.org/abs/1905.05408)                                              |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                                                                   [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py)                                                                                                   |                     ding -m serial -c smac_3s5z_qtran_config.py -s 0                     |
+| 28 |                                              [WQMIX](https://arxiv.org/abs/2006.10800)                                              |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                        [WQMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/wqmix.html)<br>[policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py)                                                        |                     ding -m serial -c smac_3s5z_wqmix_config.py -s 0                     |
+| 29 |                                           [CollaQ](https://arxiv.org/pdf/2010.08531.pdf)                                           |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                      [CollaQ doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/collaq.html)<br>[policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py)                                                      |                    ding -m serial -c smac_3s5z_collaq_config.py -s 0                    |
+| 30 |                                           [MADDPG](https://arxiv.org/pdf/1706.02275.pdf)                                           |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                         [MADDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)<br>[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py)                                                         |                ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0                |
+| 31 |                                            [GAIL](https://arxiv.org/pdf/1606.03476.pdf)                                            |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                               [GAIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/gail.html)<br>[reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py)                                               |                 ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0                 |
+| 32 |                                            [SQIL](https://arxiv.org/pdf/1905.11108.pdf)                                            |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                                    [SQIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sqil.html)<br>[entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py)                                                    |                   ding -m serial_sqil -c cartpole_sqil_config.py -s 0                   |
+| 33 |                                            [DQFD](https://arxiv.org/pdf/1704.03732.pdf)                                            |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                                          [DQFD doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqfd.html)<br>[policy/dqfd](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqfd.py)                                                          |                   ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0                   |
+| 34 |                                            [R2D3](https://arxiv.org/pdf/1909.01387.pdf)                                            |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |       [R2D3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d3.html)<br>[R2D3中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html)<br>[policy/r2d3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d3.py)           |                        python3 -u pong_r2d3_r2d2expert_config.py                        |
+| 35 |                                    [Guided Cost Learning](https://arxiv.org/pdf/1603.00448.pdf)                                    |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                      [Guided Cost Learning中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/guided_cost_zh.html)<br>[reward_model/guided_cost](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/guided_cost_reward_model.py)                      |                            python3 lunarlander_gcl_config.py                            |
+| 36 |                                              [TREX](https://arxiv.org/abs/1904.06387)                                              |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                             [TREX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/trex.html)<br>[reward_model/trex](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/trex_reward_model.py)                                             |                               python3 mujoco_trex_main.py                               |
+| 37 |                               [Implicit Behavioral Cloning](https://implicitbc.github.io/) (DFO+MCMC)                               |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                                  [policy/ibc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ibc.py) <br> [model/template/ebm](https://github.com/opendilab/DI-engine/blob/main/ding/model/template/ebm.py)                                                  |              python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py              |
+| 38 |                                             [BCO](https://arxiv.org/pdf/1805.01954.pdf)                                             |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                                                                                [entry/bco](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_bco.py)                                                                                                |                            python3 -u cartpole_bco_config.py                            |
+| 39 |                                             [HER](https://arxiv.org/pdf/1707.01495.pdf)                                             |                                                           ![exp](https://img.shields.io/badge/-exploration-orange)                                                           |                                               [HER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/her.html)<br>[reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py)                                               |                              python3 -u bitflip_her_dqn.py                              |
+| 40 |                                               [RND](https://arxiv.org/abs/1810.12894)                                               |                                                           ![exp](https://img.shields.io/badge/-exploration-orange)                                                           |                                               [RND doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rnd.html)<br>[reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py)                                               |                         python3 -u cartpole_rnd_onppo_config.py                         |
+| 41 |                                             [ICM](https://arxiv.org/pdf/1705.05363.pdf)                                             |                                                           ![exp](https://img.shields.io/badge/-exploration-orange)                                                           | [ICM doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/icm.html)<br>[ICM中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/icm_zh.html)<br>[reward_model/icm](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/icm_reward_model.py) |                          python3 -u cartpole_ppo_icm_config.py                          |
+| 42 |                                             [CQL](https://arxiv.org/pdf/2006.04779.pdf)                                             |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                            [CQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/cql.html)<br>[policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py)                                                            |                               python3 -u d4rl_cql_main.py                               |
+| 43 |                                            [TD3BC](https://arxiv.org/pdf/2106.06860.pdf)                                            |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                      [TD3BC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3_bc.html)<br>[policy/td3_bc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3_bc.py)                                                      |                              python3 -u d4rl_td3_bc_main.py                              |
+| 44 |                                    [Decision Transformer](https://arxiv.org/pdf/2106.01345.pdf)                                    |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                                                                      [policy/dt](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dt.py)                                                                                                      |                               python3 -u d4rl_dt_mujoco.py                               |
+| 45 |                                            [EDAC](https://arxiv.org/pdf/2110.01548.pdf)                                            |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                          [EDAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/edac.html)<br>[policy/edac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/edac.py)                                                          |                               python3 -u d4rl_edac_main.py                               |
+| 46 |                                            [QGPO](https://arxiv.org/pdf/2304.12824.pdf)                                            |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                          [QGPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qgpo.html)<br>[policy/qgpo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qgpo.py)                                                          |                             python3 -u ding/example/qgpo.py                             |
+| 47 |   MBSAC([SAC](https://arxiv.org/abs/1801.01290)+[MVE](https://arxiv.org/abs/1803.00101)+[SVG](https://arxiv.org/abs/1510.09142))   |                           ![continuous](https://img.shields.io/badge/-continous-green)![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                           |                                                                                          [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py)                                                                                          |   python3 -u pendulum_mbsac_mbpo_config.py / python3 -u pendulum_mbsac_ddppo_config.py   |
+| 48 | STEVESAC([SAC](https://arxiv.org/abs/1801.01290)+[STEVE](https://arxiv.org/abs/1807.01675)+[SVG](https://arxiv.org/abs/1510.09142)) |                           ![continuous](https://img.shields.io/badge/-continous-green)![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                           |                                                                                          [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py)                                                                                          |                       python3 -u pendulum_stevesac_mbpo_config.py                       |
+| 49 |                                            [MBPO](https://arxiv.org/pdf/1906.08253.pdf)                                            |                                                         ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                                                         |                                                     [MBPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/mbpo.html)<br>[world_model/mbpo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/mbpo.py)                                                     |                          python3 -u pendulum_sac_mbpo_config.py                          |
+| 50 |                                        [DDPPO](https://openreview.net/forum?id=rzvOQrnclO0)                                        |                                                         ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                                                         |                                                                                              [world_model/ddppo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/ddppo.py)                                                                                              |                        python3 -u pendulum_mbsac_ddppo_config.py                        |
+| 51 |                                          [DreamerV3](https://arxiv.org/pdf/2301.04104.pdf)                                          |                                                         ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                                                         |                                                                                          [world_model/dreamerv3](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/dreamerv3.py)                                                                                          |                      python3 -u cartpole_balance_dreamer_config.py                      |
+| 52 |                                             [PER](https://arxiv.org/pdf/1511.05952.pdf)                                             |                                                            ![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                                                                   [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py)                                                                                   |                                      `rainbow demo`                                      |
+| 53 |                                             [GAE](https://arxiv.org/pdf/1506.02438.pdf)                                             |                                                            ![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                                                                                   [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py)                                                                                                   |                                        `ppo demo`                                        |
+| 54 |                                           [ST-DIM](https://arxiv.org/pdf/1906.08226.pdf)                                           |                                                            ![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                                                              [torch_utils/loss/contrastive_loss](https://github.com/opendilab/DI-engine/blob/main/ding/torch_utils/loss/contrastive_loss.py)                                                                              |                   ding -m serial -c cartpole_dqn_stdim_config.py -s 0                   |
+| 55 |                                             [PLR](https://arxiv.org/pdf/2010.03934.pdf)                                             |                                                            ![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                       [PLR doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/plr.html)<br>[data/level_replay/level_sampler](https://github.com/opendilab/DI-engine/blob/main/ding/data/level_replay/level_sampler.py)                                       |                          python3 -u bigfish_plr_config.py -s 0                          |
+| 56 |                                           [PCGrad](https://arxiv.org/pdf/2001.06782.pdf)                                           |                                                            ![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                                                               [torch_utils/optimizer_helper/PCGrad](https://github.com/opendilab/DI-engine/blob/main/ding/torch_utils/optimizer_helper.py)                                                                                |                        python3 -u multi_mnist_pcgrad_main.py -s 0                        |
 
-|  No.  |                          Algorithm                           |                            Label                             |                        Doc and Implementation                        |                        Runnable Demo                         |
-| :--: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
-|  1   |         [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [DQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqn.html)<br>[DQN中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/dqn_zh.html)<br>[policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0 |
-|  2   |         [C51](https://arxiv.org/pdf/1707.06887.pdf)          | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [C51 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/c51.html)<br>[policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py) |        ding -m serial -c cartpole_c51_config.py -s 0         |
-|  3   |         [QRDQN](https://arxiv.org/pdf/1710.10044.pdf)        | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [QRDQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qrdqn.html)<br>[policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py) |       ding -m serial -c cartpole_qrdqn_config.py -s 0        |
-|  4   |         [IQN](https://arxiv.org/pdf/1806.06923.pdf)          | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [IQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/iqn.html)<br>[policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py) |        ding -m serial -c cartpole_iqn_config.py -s 0         |
-|  5   |         [FQF](https://arxiv.org/pdf/1911.02140.pdf)          | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [FQF doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/fqf.html)<br>[policy/fqf](https://github.com/opendilab/DI-engine/blob/main/ding/policy/fqf.py) |        ding -m serial -c cartpole_fqf_config.py -s 0         |
-|  6   |         [Rainbow](https://arxiv.org/pdf/1710.02298.pdf)          | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [Rainbow doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rainbow.html)<br>[policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py) |      ding -m serial -c cartpole_rainbow_config.py -s 0       |
-|  7   |         [SQL](https://arxiv.org/pdf/1702.08165.pdf)          | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green) | [SQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sql.html)<br>[policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py) |        ding -m serial -c cartpole_sql_config.py -s 0         |
-|  8   |         [R2D2](https://openreview.net/forum?id=r1lyTjAqYX)      | ![dist](https://img.shields.io/badge/-distributed-blue)![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [R2D2 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d2.html)<br>[policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py) |        ding -m serial -c cartpole_r2d2_config.py -s 0        |
-|  9   |         [PG](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf)            | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)<br>[policy/pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pg.py) |        ding -m serial -c cartpole_pg_config.py -s 0         |
-| 10 | [PromptPG](https://arxiv.org/abs/2209.14610) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/prompt_pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/prompt_pg.py) | ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0 |
-|  11  |         [A2C](https://arxiv.org/pdf/1602.01783.pdf)            | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [A2C doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)<br>[policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py) |        ding -m serial -c cartpole_a2c_config.py -s 0         |
-|  12  |         [PPO](https://arxiv.org/abs/1707.06347)/[MAPPO](https://arxiv.org/pdf/2103.01955.pdf)         | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)![MARL](https://img.shields.io/badge/-MARL-yellow) | [PPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppo.html)<br>[policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |
-|  13  |         [PPG](https://arxiv.org/pdf/2009.04416.pdf)          | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [PPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppg.html)<br>[policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py) |               python3 -u cartpole_ppg_main.py                |
-|  14  |         [ACER](https://arxiv.org/pdf/1611.01224.pdf)         | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green) | [ACER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/acer.html)<br>[policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py) |        ding -m serial -c cartpole_acer_config.py -s 0        |
-|  15  |          [IMPALA](https://arxiv.org/abs/1802.01561)          | ![dist](https://img.shields.io/badge/-distributed-blue)![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [IMPALA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/impala.html)<br>[policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py) |       ding -m serial -c cartpole_impala_config.py -s 0       |
-|  16  |         [DDPG](https://arxiv.org/pdf/1509.02971.pdf)/[PADDPG](https://arxiv.org/pdf/1511.04143.pdf)         | ![continuous](https://img.shields.io/badge/-continous-green)![hybrid](https://img.shields.io/badge/-hybrid-darkgreen) | [DDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)<br>[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) |        ding -m serial -c pendulum_ddpg_config.py -s 0        |
-|  17  |         [TD3](https://arxiv.org/pdf/1802.09477.pdf)          | ![continuous](https://img.shields.io/badge/-continous-green)![hybrid](https://img.shields.io/badge/-hybrid-darkgreen) | [TD3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3.html)<br>[policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py) | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0 |
-|  18  | [D4PG](https://arxiv.org/pdf/1804.08617.pdf) | ![continuous](https://img.shields.io/badge/-continous-green) | [D4PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/d4pg.html)<br>[policy/d4pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/d4pg.py) | python3 -u pendulum_d4pg_config.py |
-|  19  |           [SAC](https://arxiv.org/abs/1801.01290)/[MASAC]            | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)![MARL](https://img.shields.io/badge/-MARL-yellow) | [SAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sac.html)<br>[policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py) |        ding -m serial -c pendulum_sac_config.py -s 0         |
-|  20  | [PDQN](https://arxiv.org/pdf/1810.06394.pdf) | ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen) | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_pdqn_config.py -s 0 |
-|  21  | [MPDQN](https://arxiv.org/pdf/1905.04388.pdf) | ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen) | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_mpdqn_config.py -s 0 |
-|  22  | [HPPO](https://arxiv.org/pdf/1903.01344.pdf) | ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen) | [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0 |
-|  23  |         [BDQ](https://arxiv.org/pdf/1711.08946.pdf)          |   ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)    | [policy/bdq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) |        python3 -u hopper_bdq_config.py       |
-|  24  |         [MDQN](https://arxiv.org/abs/2007.14430)          |   ![discrete](https://img.shields.io/badge/-discrete-brightgreen)    | [policy/mdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mdqn.py) |        python3 -u asterix_mdqn_config.py       |
-|  25  |           [QMIX](https://arxiv.org/pdf/1803.11485.pdf)           |      ![MARL](https://img.shields.io/badge/-MARL-yellow)      | [QMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qmix.html)<br>[policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py) |       ding -m serial -c smac_3s5z_qmix_config.py -s 0        |
-|  26  |         [COMA](https://arxiv.org/pdf/1705.08926.pdf)         |      ![MARL](https://img.shields.io/badge/-MARL-yellow)      | [COMA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/coma.html)<br>[policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py) |       ding -m serial -c smac_3s5z_coma_config.py -s 0        |
-|  27  |          [QTran](https://arxiv.org/abs/1905.05408)           |      ![MARL](https://img.shields.io/badge/-MARL-yellow)      | [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py) |       ding -m serial -c smac_3s5z_qtran_config.py -s 0       |
-|  28  |          [WQMIX](https://arxiv.org/abs/2006.10800)           |      ![MARL](https://img.shields.io/badge/-MARL-yellow)      | [WQMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/wqmix.html)<br>[policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py) |       ding -m serial -c smac_3s5z_wqmix_config.py -s 0       |
-|  29  |        [CollaQ](https://arxiv.org/pdf/2010.08531.pdf)        |      ![MARL](https://img.shields.io/badge/-MARL-yellow)      | [CollaQ doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/collaq.html)<br>[policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py) |      ding -m serial -c smac_3s5z_collaq_config.py -s 0       |
-|  30  |        [MADDPG](https://arxiv.org/pdf/1706.02275.pdf)        |      ![MARL](https://img.shields.io/badge/-MARL-yellow)      | [MADDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)<br>[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) |      ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0       |
-|  31  |           [GAIL](https://arxiv.org/pdf/1606.03476.pdf)           |        ![IL](https://img.shields.io/badge/-IL-purple)        | [GAIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/gail.html)<br>[reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py) |  ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0  |
-|  32  |         [SQIL](https://arxiv.org/pdf/1905.11108.pdf)         |        ![IL](https://img.shields.io/badge/-IL-purple)        | [SQIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sqil.html)<br>[entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py) |     ding -m serial_sqil -c cartpole_sqil_config.py -s 0      |
-|  33  | [DQFD](https://arxiv.org/pdf/1704.03732.pdf) | ![IL](https://img.shields.io/badge/-IL-purple) | [DQFD doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqfd.html)<br>[policy/dqfd](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqfd.py) | ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0 |
-|  34  | [R2D3](https://arxiv.org/pdf/1909.01387.pdf) | ![IL](https://img.shields.io/badge/-IL-purple) | [R2D3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d3.html)<br>[R2D3中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html)<br>[policy/r2d3](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html) | python3 -u pong_r2d3_r2d2expert_config.py |
-|  35  |     [Guided Cost Learning](https://arxiv.org/pdf/1603.00448.pdf)     |   ![IL](https://img.shields.io/badge/-IL-purple)             | [Guided Cost Learning中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/guided_cost_zh.html)<br>[reward_model/guided_cost](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/guided_cost_reward_model.py) |                          python3 lunarlander_gcl_config.py   |
-|  36  |         [TREX](https://arxiv.org/abs/1904.06387)          |   ![IL](https://img.shields.io/badge/-IL-purple)             | [TREX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/trex.html)<br>[reward_model/trex](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/trex_reward_model.py) |                          python3 mujoco_trex_main.py   |
-|  37  |         [Implicit Behavorial Cloning](https://implicitbc.github.io/) (DFO+MCMC)          |   ![IL](https://img.shields.io/badge/-IL-purple)    | [policy/ibc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ibc.py) <br> [model/template/ebm](https://github.com/opendilab/DI-engine/blob/main/ding/model/template/ebm.py) |        python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py  |
-|  38  |         [BCO](https://arxiv.org/pdf/1805.01954.pdf)          | ![IL](https://img.shields.io/badge/-IL-purple) | [entry/bco](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_bco.py) |                python3 -u cartpole_bco_config.py                 |
-|  39  |           [HER](https://arxiv.org/pdf/1707.01495.pdf)            |   ![exp](https://img.shields.io/badge/-exploration-orange)   | [HER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/her.html)<br>[reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py) |                python3 -u bitflip_her_dqn.py                 |
-|  40  |           [RND](https://arxiv.org/abs/1810.12894)            |   ![exp](https://img.shields.io/badge/-exploration-orange)   | [RND doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rnd.html)<br>[reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py) |             python3 -u cartpole_rnd_onppo_config.py           |
-|  41  |           [ICM](https://arxiv.org/pdf/1705.05363.pdf)            |   ![exp](https://img.shields.io/badge/-exploration-orange)   | [ICM doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/icm.html)<br>[ICM中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/icm_zh.html)<br>[reward_model/icm](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/icm_reward_model.py) |             python3 -u cartpole_ppo_icm_config.py              |
-|  42  |         [CQL](https://arxiv.org/pdf/2006.04779.pdf)          | ![offline](https://img.shields.io/badge/-offlineRL-darkblue) | [CQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/cql.html)<br>[policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py) |                 python3 -u d4rl_cql_main.py                  |
-|  43  |         [TD3BC](https://arxiv.org/pdf/2106.06860.pdf)          | ![offline](https://img.shields.io/badge/-offlineRL-darkblue) | [TD3BC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3_bc.html)<br>[policy/td3_bc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3_bc.py) |                 python3 -u d4rl_td3_bc_main.py                  |
-|  44  |         [Decision Transformer](https://arxiv.org/pdf/2106.01345.pdf)          | ![offline](https://img.shields.io/badge/-offlineRL-darkblue) | [policy/dt](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dt.py) |                 python3 -u d4rl_dt_mujoco.py                |
-|  45  |         [EDAC](https://arxiv.org/pdf/2110.01548.pdf)          | ![offline](https://img.shields.io/badge/-offlineRL-darkblue) | [EDAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/edac.html)<br>[policy/edac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/edac.py) |                 python3 -u d4rl_edac_main.py                  |
-|  46  |         [QGPO](https://arxiv.org/pdf/2304.12824.pdf)          | ![offline](https://img.shields.io/badge/-offlineRL-darkblue) | [QGPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qgpo.html)<br>[policy/qgpo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qgpo.py) |                 python3 -u ding/example/qgpo.py                  |
-|  47  |         MBSAC([SAC](https://arxiv.org/abs/1801.01290)+[MVE](https://arxiv.org/abs/1803.00101)+[SVG](https://arxiv.org/abs/1510.09142))         | ![continuous](https://img.shields.io/badge/-continous-green)![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue) | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) |        python3 -u pendulum_mbsac_mbpo_config.py \ python3 -u pendulum_mbsac_ddppo_config.py    |
-|  48  |         STEVESAC([SAC](https://arxiv.org/abs/1801.01290)+[STEVE](https://arxiv.org/abs/1807.01675)+[SVG](https://arxiv.org/abs/1510.09142))         | ![continuous](https://img.shields.io/badge/-continous-green)![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue) | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) |        python3 -u pendulum_stevesac_mbpo_config.py    |
-|  49  |         [MBPO](https://arxiv.org/pdf/1906.08253.pdf)         | ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue) | [MBPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/mbpo.html)<br>[world_model/mbpo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/mbpo.py) |        python3 -u pendulum_sac_mbpo_config.py    |
-|  50  |         [DDPPO](https://openreview.net/forum?id=rzvOQrnclO0)         | ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue) | [world_model/ddppo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/ddppo.py) |        python3 -u pendulum_mbsac_ddppo_config.py    |
-|  51  |         [DreamerV3](https://arxiv.org/pdf/2301.04104.pdf)         | ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue) | [world_model/dreamerv3](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/dreamerv3.py) |        python3 -u cartpole_balance_dreamer_config.py    |
-|  52  |         [PER](https://arxiv.org/pdf/1511.05952.pdf)          |   ![other](https://img.shields.io/badge/-other-lightgrey)    | [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py) |                        `rainbow demo`                        |
-|  53  |         [GAE](https://arxiv.org/pdf/1506.02438.pdf)          |   ![other](https://img.shields.io/badge/-other-lightgrey)    | [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py) |                          `ppo demo`                          |
-|  54  |         [ST-DIM](https://arxiv.org/pdf/1906.08226.pdf)          |   ![other](https://img.shields.io/badge/-other-lightgrey)    | [torch_utils/loss/contrastive_loss](https://github.com/opendilab/DI-engine/blob/main/ding/torch_utils/loss/contrastive_loss.py) |        ding -m serial -c cartpole_dqn_stdim_config.py -s 0       |
-|  55  |         [PLR](https://arxiv.org/pdf/2010.03934.pdf)          |   ![other](https://img.shields.io/badge/-other-lightgrey)    | [PLR doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/plr.html)<br>[data/level_replay/level_sampler](https://github.com/opendilab/DI-engine/blob/main/ding/data/level_replay/level_sampler.py) |        python3 -u bigfish_plr_config.py -s 0       |
-|  56  |         [PCGrad](https://arxiv.org/pdf/2001.06782.pdf)          |   ![other](https://img.shields.io/badge/-other-lightgrey)    | [torch_utils/optimizer_helper/PCGrad](https://github.com/opendilab/DI-engine/blob/main/ding/data/torch_utils/optimizer_helper.py) |        python3 -u multi_mnist_pcgrad_main.py -s 0       |
 </details>
 
-
 ### Environment Versatility
+
 <details open>
 <summary>(Click to Collapse)</summary>
 
-|  No  |                Environment               |                 Label               |         Visualization            |                   Code and Doc Links                   |
-| :--: | :--------------------------------------: | :---------------------------------: | :--------------------------------:|:---------------------------------------------------------: |
-|  1   |       [Atari](https://github.com/openai/gym/tree/master/gym/envs/atari)    | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)   | ![original](./dizoo/atari/atari.gif)     |        [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/atari/envs) <br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/atari.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/atari_zh.html)        |
-|  2   |       [box2d/bipedalwalker](https://github.com/openai/gym/tree/master/gym/envs/box2d)    | ![continuous](https://img.shields.io/badge/-continous-green) | ![original](./dizoo/box2d/bipedalwalker/original.gif)        | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/bipedalwalker/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/bipedalwalker.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bipedalwalker_zh.html) |
-|  3   |       [box2d/lunarlander](https://github.com/openai/gym/tree/master/gym/envs/box2d)      | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)   | ![original](./dizoo/box2d/lunarlander/lunarlander.gif)   |  [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/lunarlander/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/lunarlander.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/lunarlander_zh.html)  |
-|  4   |       [classic_control/cartpole](https://github.com/openai/gym/tree/master/gym/envs/classic_control)       | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)   | ![original](./dizoo/classic_control/cartpole/cartpole.gif)    | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/cartpole/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/cartpole.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/cartpole_zh.html) |
-|  5   |       [classic_control/pendulum](https://github.com/openai/gym/tree/master/gym/envs/classic_control)       | ![continuous](https://img.shields.io/badge/-continous-green) | ![original](./dizoo/classic_control/pendulum/pendulum.gif)    | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/pendulum/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pendulum.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pendulum_zh.html) |
-|  6   |       [competitive_rl](https://github.com/cuhkrlcourse/competitive-rl)       | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![selfplay](https://img.shields.io/badge/-selfplay-blue) | ![original](./dizoo/competitive_rl/competitive_rl.gif)   |  [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.classic_control)<br>[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/competitive_rl_zh.html)  |
-|  7   |       [gfootball](https://github.com/google-research/football)                        | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![sparse](https://img.shields.io/badge/-sparse%20reward-orange)![selfplay](https://img.shields.io/badge/-selfplay-blue) | ![original](./dizoo/gfootball/gfootball.gif)      | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.gfootball/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball_zh.html) |
-|  8   |       [minigrid](https://github.com/maximecb/gym-minigrid)                         | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![sparse](https://img.shields.io/badge/-sparse%20reward-orange) | ![original](./dizoo/minigrid/minigrid.gif)         | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/minigrid/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid_zh.html) |
-|  9   |       [MuJoCo](https://github.com/openai/gym/tree/master/gym/envs/mujoco)       |  ![continuous](https://img.shields.io/badge/-continous-green)  | ![original](./dizoo/mujoco/mujoco.gif)                    | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/majoco/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco_zh.html) |
-|  10  |       [PettingZoo](https://github.com/Farama-Foundation/PettingZoo)         | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![continuous](https://img.shields.io/badge/-continous-green) ![marl](https://img.shields.io/badge/-MARL-yellow)  | ![original](./dizoo/petting_zoo/petting_zoo_mpe_simple_spread.gif)     |  [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/petting_zoo/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pettingzoo.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pettingzoo_zh.html)  |
-|  11  |       [overcooked](https://github.com/HumanCompatibleAI/overcooked-demo)     | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![marl](https://img.shields.io/badge/-MARL-yellow)  | ![original](./dizoo/overcooked/overcooked.gif)       |   [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/overcooded/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/overcooked.html)   |
-|  12  |       [procgen](https://github.com/openai/procgen)                          | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)   | ![original](./dizoo/procgen/coinrun.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/procgen)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/procgen.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/procgen_zh.html) |
-|  13  |       [pybullet](https://github.com/benelot/pybullet-gym)    | ![continuous](https://img.shields.io/badge/-continous-green)  | ![original](./dizoo/pybullet/pybullet.gif)       | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pybullet/envs)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pybullet_zh.html) |
-|  14  |       [smac](https://github.com/oxwhirl/smac)     | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![marl](https://img.shields.io/badge/-MARL-yellow)![selfplay](https://img.shields.io/badge/-selfplay-blue)![sparse](https://img.shields.io/badge/-sparse%20reward-orange) | ![original](./dizoo/smac/smac.gif)       | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/smac/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/smac.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/smac_zh.html) |
-| 15 | [d4rl](https://github.com/rail-berkeley/d4rl) | ![offline](https://img.shields.io/badge/-offlineRL-darkblue) | ![ori](dizoo/d4rl/d4rl.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/d4rl)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/d4rl_zh.html) |
-|  16  |       league_demo                      | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![selfplay](https://img.shields.io/badge/-selfplay-blue) | ![original](./dizoo/league_demo/league_demo.png) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/league_demo/envs)                |
-|  17  |       pomdp atari                    | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pomdp/envs) |
-|  18  |       [bsuite](https://github.com/deepmind/bsuite)                         | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/bsuite/bsuite.png) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bsuite/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs//bsuite.html) <br> [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bsuite_zh.html) |
-|  19  | [ImageNet](https://www.image-net.org/) | ![IL](https://img.shields.io/badge/-IL/SL-purple) | ![original](./dizoo/image_classification/imagenet.png) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/image_classification)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/image_cls_zh.html) |
-|  20  | [slime_volleyball](https://github.com/hardmaru/slimevolleygym) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![selfplay](https://img.shields.io/badge/-selfplay-blue) | ![ori](dizoo/slime_volley/slime_volley.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/slime_volley)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/slime_volleyball.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/slime_volleyball_zh.html) |
-|  21  | [gym_hybrid](https://github.com/thomashirtz/gym-hybrid) | ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen) | ![ori](dizoo/gym_hybrid/moving_v0.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_hybrid)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_hybrid.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_hybrid_zh.html) |
-|  22  | [GoBigger](https://github.com/opendilab/GoBigger) | ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)![marl](https://img.shields.io/badge/-MARL-yellow)![selfplay](https://img.shields.io/badge/-selfplay-blue) | ![ori](./dizoo/gobigger_overview.gif) | [dizoo link](https://github.com/opendilab/GoBigger-Challenge-2021/tree/main/di_baseline)<br>[env tutorial](https://gobigger.readthedocs.io/en/latest/index.html)<br>[环境指南](https://gobigger.readthedocs.io/zh_CN/latest/) |
-|  23  | [gym_soccer](https://github.com/openai/gym-soccer) | ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen) | ![ori](dizoo/gym_soccer/half_offensive.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_soccer)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_soccer_zh.html) |
-|  24  |[multiagent_mujoco](https://github.com/schroederdewitt/multiagent_mujoco)       |  ![continuous](https://img.shields.io/badge/-continous-green) ![marl](https://img.shields.io/badge/-MARL-yellow) | ![original](./dizoo/mujoco/mujoco.gif)                    | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/multiagent_mujoco/envs)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/mujoco_zh.html) |
-|  25  |bitflip                                | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![sparse](https://img.shields.io/badge/-sparse%20reward-orange)  | ![original](./dizoo/bitflip/bitflip.gif)    | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bitflip/envs)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bitflip_zh.html) |
-|  26  |[sokoban](https://github.com/mpSchrader/gym-sokoban) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![Game 2](https://github.com/mpSchrader/gym-sokoban/raw/default/docs/Animations/solved_4.gif?raw=true) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/sokoban/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/sokoban.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/sokoban_zh.html) |
-|  27  |[gym_anytrading](https://github.com/AminHP/gym-anytrading) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/gym_anytrading/envs/position.png) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_anytrading) <br> [env tutorial](https://github.com/opendilab/DI-engine/blob/main/dizoo/gym_anytrading/envs/README.md) |
-|  28  |[mario](https://github.com/Kautenja/gym-super-mario-bros) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/mario/mario.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mario) <br> [env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_super_mario_bros.html) <br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_super_mario_bros_zh.html) |
-|  29  |[dmc2gym](https://github.com/denisyarats/dmc2gym) | ![continuous](https://img.shields.io/badge/-continous-green) | ![original](./dizoo/dmc2gym/dmc2gym_cheetah.png) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/dmc2gym)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/dmc2gym.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/dmc2gym_zh.html) |
-|  30  |[evogym](https://github.com/EvolutionGym/evogym) | ![continuous](https://img.shields.io/badge/-continous-green) | ![original](./dizoo/evogym/evogym.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/evogym/envs) <br> [env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/evogym.html) <br> [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/Evogym_zh.html) |
-|  31  |[gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones) | ![continuous](https://img.shields.io/badge/-continous-green) | ![original](./dizoo/gym_pybullet_drones/gym_pybullet_drones.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_pybullet_drones/envs)<br>环境指南 |
-|  32  |[beergame](https://github.com/OptMLGroup/DeepBeerInventory-RL) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/beergame/beergame.png) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/beergame/envs)<br>环境指南 |
-|  33  |[classic_control/acrobot](https://github.com/openai/gym/tree/master/gym/envs/classic_control) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/classic_control/acrobot/acrobot.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/acrobot/envs)<br> [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/acrobot_zh.html) |
-|  34  |[box2d/car_racing](https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) <br> ![continuous](https://img.shields.io/badge/-continous-green) | ![original](./dizoo/box2d/carracing/car_racing.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/carracing/envs)<br>环境指南 |
-|  35  |[metadrive](https://github.com/metadriverse/metadrive) | ![continuous](https://img.shields.io/badge/-continous-green) | ![original](./dizoo/metadrive/metadrive_env.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/metadrive/env)<br> [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/metadrive_zh.html) |
-|  36  |[cliffwalking](https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/cliffwalking/cliff_walking.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/cliffwalking/envs)<br> env tutorial <br> 环境指南 |
-|  37  | [tabmwp](https://promptpg.github.io/explore.html) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/tabmwp/tabmwp.jpeg) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/tabmwp) <br> env tutorial <br> 环境指南|
+
+| No |                                          Environment                                          |                                                                                                                   Label                                                                                                                   |                                             Visualization                                             |                                                                                                                                     Code and Doc Links                                                                                                                                     |
+| :-: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| 1 |               [Atari](https://github.com/openai/gym/tree/master/gym/envs/atari)               |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                  ![original](./dizoo/atari/atari.gif)                                  |               [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/atari/envs) <br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/atari.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/atari_zh.html)               |
+| 2 |        [box2d/bipedalwalker](https://github.com/openai/gym/tree/master/gym/envs/box2d)        |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                         ![original](./dizoo/box2d/bipedalwalker/original.gif)                         | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/bipedalwalker/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/bipedalwalker.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bipedalwalker_zh.html) |
+| 3 |         [box2d/lunarlander](https://github.com/openai/gym/tree/master/gym/envs/box2d)         |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                         ![original](./dizoo/box2d/lunarlander/lunarlander.gif)                         |    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/lunarlander/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/lunarlander.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/lunarlander_zh.html)    |
+| 4 | [classic_control/cartpole](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                       ![original](./dizoo/classic_control/cartpole/cartpole.gif)                       |   [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/cartpole/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/cartpole.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/cartpole_zh.html)   |
+| 5 | [classic_control/pendulum](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                       ![original](./dizoo/classic_control/pendulum/pendulum.gif)                       |   [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/pendulum/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pendulum.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pendulum_zh.html)   |
+| 6 |                [competitive_rl](https://github.com/cuhkrlcourse/competitive-rl)                |                                                         ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![selfplay](https://img.shields.io/badge/-selfplay-blue)                                                         |                         ![original](./dizoo/competitive_rl/competitive_rl.gif)                         |                                                     [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/competitive_rl)<br>[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/competitive_rl_zh.html)                                                      |
+| 7 |                    [gfootball](https://github.com/google-research/football)                    |                          ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![sparse](https://img.shields.io/badge/-sparse%20reward-orange)![selfplay](https://img.shields.io/badge/-selfplay-blue)                          |                              ![original](./dizoo/gfootball/gfootball.gif)                              |           [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gfootball/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball_zh.html)           |
+| 8 |                      [minigrid](https://github.com/maximecb/gym-minigrid)                      |                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![sparse](https://img.shields.io/badge/-sparse%20reward-orange)                                                      |                               ![original](./dizoo/minigrid/minigrid.gif)                               |             [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/minigrid/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid_zh.html)             |
+| 9 |              [MuJoCo](https://github.com/openai/gym/tree/master/gym/envs/mujoco)              |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                                 ![original](./dizoo/mujoco/mujoco.gif)                                 |                [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mujoco/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco_zh.html)                |
+| 10 |                 [PettingZoo](https://github.com/Farama-Foundation/PettingZoo)                 |                              ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![continuous](https://img.shields.io/badge/-continous-green) ![marl](https://img.shields.io/badge/-MARL-yellow)                              |                   ![original](./dizoo/petting_zoo/petting_zoo_mpe_simple_spread.gif)                   |        [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/petting_zoo/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pettingzoo.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pettingzoo_zh.html)        |
+| 11 |               [overcooked](https://github.com/HumanCompatibleAI/overcooked-demo)               |                                                            ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![marl](https://img.shields.io/badge/-MARL-yellow)                                                            |                             ![original](./dizoo/overcooked/overcooked.gif)                             |                                                       [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/overcooked/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/overcooked.html)                                                       |
+| 12 |                          [procgen](https://github.com/openai/procgen)                          |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                ![original](./dizoo/procgen/coinrun.gif)                                |               [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/procgen)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/procgen.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/procgen_zh.html)               |
+| 13 |                      [pybullet](https://github.com/benelot/pybullet-gym)                      |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                               ![original](./dizoo/pybullet/pybullet.gif)                               |                                                        [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pybullet/envs)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pybullet_zh.html)                                                        |
+| 14 |                            [smac](https://github.com/oxwhirl/smac)                            | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![marl](https://img.shields.io/badge/-MARL-yellow)![selfplay](https://img.shields.io/badge/-selfplay-blue)![sparse](https://img.shields.io/badge/-sparse%20reward-orange) |                                   ![original](./dizoo/smac/smac.gif)                                   |                 [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/smac/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/smac.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/smac_zh.html)                 |
+| 15 |                         [d4rl](https://github.com/rail-berkeley/d4rl)                         |                                                                                       ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                                                       |                                      ![ori](dizoo/d4rl/d4rl.gif)                                      |                                                              [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/d4rl)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/d4rl_zh.html)                                                              |
+| 16 |                                          league_demo                                          |                                                         ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![selfplay](https://img.shields.io/badge/-selfplay-blue)                                                         |                            ![original](./dizoo/league_demo/league_demo.png)                            |                                                                                                    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/league_demo/envs)                                                                                                    |
+| 17 |                                          pomdp atari                                          |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                                                                                        |                                                                                                       [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pomdp/envs)                                                                                                       |
+| 18 |                          [bsuite](https://github.com/deepmind/bsuite)                          |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                 ![original](./dizoo/bsuite/bsuite.png)                                 |             [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bsuite/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/bsuite.html) <br> [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bsuite_zh.html)              |
+| 19 |                             [ImageNet](https://www.image-net.org/)                             |                                                                                             ![IL](https://img.shields.io/badge/-IL/SL-purple)                                                                                             |                         ![original](./dizoo/image_classification/imagenet.png)                         |                                                    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/image_classification)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/image_cls_zh.html)                                                    |
+| 20 |                 [slime_volleyball](https://github.com/hardmaru/slimevolleygym)                 |                                                          ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![selfplay](https://img.shields.io/badge/-selfplay-blue)                                                          |                              ![ori](dizoo/slime_volley/slime_volley.gif)                              |    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/slime_volley)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/slime_volleyball.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/slime_volleyball_zh.html)    |
+| 21 |                    [gym_hybrid](https://github.com/thomashirtz/gym-hybrid)                    |                                                                                         ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                                                         |                                 ![ori](dizoo/gym_hybrid/moving_v0.gif)                                 |           [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_hybrid)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_hybrid.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_hybrid_zh.html)           |
+| 22 |                       [GoBigger](https://github.com/opendilab/GoBigger)                       |                                    ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)![marl](https://img.shields.io/badge/-MARL-yellow)![selfplay](https://img.shields.io/badge/-selfplay-blue)                                    |                                 ![ori](./dizoo/gobigger_overview.gif)                                 |                                [dizoo link](https://github.com/opendilab/GoBigger-Challenge-2021/tree/main/di_baseline)<br>[env tutorial](https://gobigger.readthedocs.io/en/latest/index.html)<br>[环境指南](https://gobigger.readthedocs.io/zh_CN/latest/)                                |
+| 23 |                       [gym_soccer](https://github.com/openai/gym-soccer)                       |                                                                                         ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                                                         |                              ![ori](dizoo/gym_soccer/half_offensive.gif)                              |                                                        [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_soccer)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_soccer_zh.html)                                                        |
+| 24 |           [multiagent_mujoco](https://github.com/schroederdewitt/multiagent_mujoco)           |                                                              ![continuous](https://img.shields.io/badge/-continous-green) ![marl](https://img.shields.io/badge/-MARL-yellow)                                                              |                                 ![original](./dizoo/mujoco/mujoco.gif)                                 |                                                    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/multiagent_mujoco/envs)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/mujoco_zh.html)                                                    |
+| 25 |                                            bitflip                                            |                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![sparse](https://img.shields.io/badge/-sparse%20reward-orange)                                                      |                                ![original](./dizoo/bitflip/bitflip.gif)                                |                                                         [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bitflip/envs)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bitflip_zh.html)                                                         |
+| 26 |                      [sokoban](https://github.com/mpSchrader/gym-sokoban)                      |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      | ![Game 2](https://github.com/mpSchrader/gym-sokoban/raw/default/docs/Animations/solved_4.gif?raw=true) |             [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/sokoban/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/sokoban.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/sokoban_zh.html)             |
+| 27 |                   [gym_anytrading](https://github.com/AminHP/gym-anytrading)                   |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                         ![original](./dizoo/gym_anytrading/envs/position.png)                         |                                                [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_anytrading) <br> [env tutorial](https://github.com/opendilab/DI-engine/blob/main/dizoo/gym_anytrading/envs/README.md)                                                |
+| 28 |                   [mario](https://github.com/Kautenja/gym-super-mario-bros)                   |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                  ![original](./dizoo/mario/mario.gif)                                  |  [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mario) <br> [env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_super_mario_bros.html) <br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_super_mario_bros_zh.html)  |
+| 29 |                       [dmc2gym](https://github.com/denisyarats/dmc2gym)                       |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                            ![original](./dizoo/dmc2gym/dmc2gym_cheetah.png)                            |               [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/dmc2gym)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/dmc2gym.html)<br>[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/dmc2gym_zh.html)               |
+| 30 |                        [evogym](https://github.com/EvolutionGym/evogym)                        |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                                 ![original](./dizoo/evogym/evogym.gif)                                 |            [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/evogym/envs) <br> [env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/evogym.html) <br> [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/Evogym_zh.html)            |
+| 31 |             [gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones)             |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                    ![original](./dizoo/gym_pybullet_drones/gym_pybullet_drones.gif)                    |                                                                                          [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_pybullet_drones/envs)<br>环境指南                                                                                          |
+| 32 |                 [beergame](https://github.com/OptMLGroup/DeepBeerInventory-RL)                 |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                               ![original](./dizoo/beergame/beergame.png)                               |                                                                                               [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/beergame/envs)<br>环境指南                                                                                               |
+| 33 | [classic_control/acrobot](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                        ![original](./dizoo/classic_control/acrobot/acrobot.gif)                        |                                                [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/acrobot/envs)<br> [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/acrobot_zh.html)                                                |
+| 34 |   [box2d/car_racing](https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py)   |                                                     ![discrete](https://img.shields.io/badge/-discrete-brightgreen) <br> ![continuous](https://img.shields.io/badge/-continous-green)                                                     |                          ![original](./dizoo/box2d/carracing/car_racing.gif)                          |                                                                                            [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/carracing/envs)<br>环境指南                                                                                            |
+| 35 |                     [metadrive](https://github.com/metadriverse/metadrive)                     |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                            ![original](./dizoo/metadrive/metadrive_env.gif)                            |                                                       [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/metadrive/env)<br> [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/metadrive_zh.html)                                                       |
+| 36 |  [cliffwalking](https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py)  |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                          ![original](./dizoo/cliffwalking/cliff_walking.gif)                          |                                                                                    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/cliffwalking/envs)<br> env tutorial <br> 环境指南                                                                                    |
+| 37 |                       [tabmwp](https://promptpg.github.io/explore.html)                       |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                ![original](./dizoo/tabmwp/tabmwp.jpeg)                                |                                                                                         [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/tabmwp) <br> env tutorial <br> 环境指南                                                                                         |
+| 38 |                  [frozen_lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake)                       |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                ![original](./dizoo/frozen_lake/FrozenLake.gif)                                |                                                                                         [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/frozen_lake) <br> env tutorial <br> 环境指南                                                                                         |
+
 
 ![discrete](https://img.shields.io/badge/-discrete-brightgreen) means discrete action space
 
@@ -329,111 +334,112 @@ P.S: The `.py` file in `Runnable Demo` can be found in `dizoo`
 ![selfplay](https://img.shields.io/badge/-selfplay-blue) means environment that allows agent VS agent battle
 
 P.S. some enviroments in Atari, such as **MontezumaRevenge**, are also the sparse reward type.
-</details>
 
+</details>
 
 ### General Data Container: TreeTensor
 
 DI-engine utilizes [TreeTensor](https://github.com/opendilab/DI-treetensor) as the basic data container in various components, which is ease of use and consistent across different code modules such as environment definition, data processing and DRL optimization. Here are some concrete code examples:
 
 - TreeTensor can easily extend all the operations of `torch.Tensor` to nested data:
+
   <details close>
   <summary>(Click for Details)</summary>
 
-    ```python
-    import treetensor.torch as ttorch
-
-
-    # create random tensor
-    data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})
-    # clone+detach tensor
-    data_clone = data.clone().detach()
-    # access tree structure like attribute
-    a = data.a
-    c = data.b.c
-    # stack/cat/split
-    stacked_data = ttorch.stack([data, data_clone], 0)
-    cat_data = ttorch.cat([data, data_clone], 0)
-    data, data_clone = ttorch.split(stacked_data, 1)
-    # reshape
-    data = data.unsqueeze(-1)
-    data = data.squeeze(-1)
-    flatten_data = data.view(-1)
-    # indexing
-    data_0 = data[0]
-    data_1to2 = data[1:2]
-    # execute math calculations
-    data = data.sin()
-    data.b.c.cos_().clamp_(-1, 1)
-    data += data ** 2
-    # backward
-    data.requires_grad_(True)
-    loss = data.arctan().mean()
-    loss.backward()
-    # print shape
-    print(data.shape)
-    # result
-    # <Size 0x7fbd3346ddc0>
-    # ├── 'a' --> torch.Size([1, 3, 2])
-    # └── 'b' --> <Size 0x7fbd3346dd00>
-    #     └── 'c' --> torch.Size([1, 3])
-    ```
+  ```python
+  import treetensor.torch as ttorch
+
+
+  # create random tensor
+  data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})
+  # clone+detach tensor
+  data_clone = data.clone().detach()
+  # access tree structure like attribute
+  a = data.a
+  c = data.b.c
+  # stack/cat/split
+  stacked_data = ttorch.stack([data, data_clone], 0)
+  cat_data = ttorch.cat([data, data_clone], 0)
+  data, data_clone = ttorch.split(stacked_data, 1)
+  # reshape
+  data = data.unsqueeze(-1)
+  data = data.squeeze(-1)
+  flatten_data = data.view(-1)
+  # indexing
+  data_0 = data[0]
+  data_1to2 = data[1:2]
+  # execute math calculations
+  data = data.sin()
+  data.b.c.cos_().clamp_(-1, 1)
+  data += data ** 2
+  # backward
+  data.requires_grad_(True)
+  loss = data.arctan().mean()
+  loss.backward()
+  # print shape
+  print(data.shape)
+  # result
+  # <Size 0x7fbd3346ddc0>
+  # ├── 'a' --> torch.Size([1, 3, 2])
+  # └── 'b' --> <Size 0x7fbd3346dd00>
+  #     └── 'c' --> torch.Size([1, 3])
+  ```
 
   </details>
-
 - TreeTensor can make it simple yet effective to implement classic deep reinforcement learning pipeline
+
   <details close>
   <summary>(Click for Details)</summary>
 
-    ```diff
-    import torch
-    import treetensor.torch as ttorch
-  
-    B = 4
-
-
-    def get_item():
-        return {
-            'obs': {
-                'scalar': torch.randn(12),
-                'image': torch.randn(3, 32, 32),
-            },
-            'action': torch.randint(0, 10, size=(1,)),
-            'reward': torch.rand(1),
-            'done': False,
-        }
-
-
-    data = [get_item() for _ in range(B)]
-
-
-    # execute `stack` op
-    - def stack(data, dim):
-    -     elem = data[0]
-    -     if isinstance(elem, torch.Tensor):
-    -         return torch.stack(data, dim)
-    -     elif isinstance(elem, dict):
-    -         return {k: stack([item[k] for item in data], dim) for k in elem.keys()}
-    -     elif isinstance(elem, bool):
-    -         return torch.BoolTensor(data)
-    -     else:
-    -         raise TypeError("not support elem type: {}".format(type(elem)))
-    - stacked_data = stack(data, dim=0)
-    + data = [ttorch.tensor(d) for d in data]
-    + stacked_data = ttorch.stack(data, dim=0)
-    
-    # validate
-    - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)
-    - assert stacked_data['action'].shape == (B, 1)
-    - assert stacked_data['reward'].shape == (B, 1)
-    - assert stacked_data['done'].shape == (B,)
-    - assert stacked_data['done'].dtype == torch.bool
-    + assert stacked_data.obs.image.shape == (B, 3, 32, 32)
-    + assert stacked_data.action.shape == (B, 1)
-    + assert stacked_data.reward.shape == (B, 1)
-    + assert stacked_data.done.shape == (B,)
-    + assert stacked_data.done.dtype == torch.bool
-    ```
+  ```diff
+  import torch
+  import treetensor.torch as ttorch
+
+  B = 4
+
+
+  def get_item():
+      return {
+          'obs': {
+              'scalar': torch.randn(12),
+              'image': torch.randn(3, 32, 32),
+          },
+          'action': torch.randint(0, 10, size=(1,)),
+          'reward': torch.rand(1),
+          'done': False,
+      }
+
+
+  data = [get_item() for _ in range(B)]
+
+
+  # execute `stack` op
+  - def stack(data, dim):
+  -     elem = data[0]
+  -     if isinstance(elem, torch.Tensor):
+  -         return torch.stack(data, dim)
+  -     elif isinstance(elem, dict):
+  -         return {k: stack([item[k] for item in data], dim) for k in elem.keys()}
+  -     elif isinstance(elem, bool):
+  -         return torch.BoolTensor(data)
+  -     else:
+  -         raise TypeError("not support elem type: {}".format(type(elem)))
+  - stacked_data = stack(data, dim=0)
+  + data = [ttorch.tensor(d) for d in data]
+  + stacked_data = ttorch.stack(data, dim=0)
+
+  # validate
+  - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)
+  - assert stacked_data['action'].shape == (B, 1)
+  - assert stacked_data['reward'].shape == (B, 1)
+  - assert stacked_data['done'].shape == (B,)
+  - assert stacked_data['done'].dtype == torch.bool
+  + assert stacked_data.obs.image.shape == (B, 3, 32, 32)
+  + assert stacked_data.action.shape == (B, 1)
+  + assert stacked_data.reward.shape == (B, 1)
+  + assert stacked_data.done.shape == (B,)
+  + assert stacked_data.done.dtype == torch.bool
+  ```
 
   </details>
 
@@ -442,8 +448,8 @@ DI-engine utilizes [TreeTensor](https://github.com/opendilab/DI-treetensor) as t
 - [File an issue](https://github.com/opendilab/DI-engine/issues/new/choose) on Github
 - Open or participate in our [forum](https://github.com/opendilab/DI-engine/discussions)
 - Discuss on DI-engine [slack communication channel](https://join.slack.com/t/opendilab/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)
-- Discuss on DI-engine's WeChat group (i.e. add us on WeChat: ding314assist) 
-  
+- Discuss on DI-engine's WeChat group (i.e. add us on WeChat: ding314assist)
+
   <img src=https://github.com/opendilab/DI-engine/blob/main/assets/wechat.jpeg width=35% />
 - Contact our email (opendilab@pjlab.org.cn)
 - Contributes to our future plan [Roadmap](https://github.com/opendilab/DI-engine/issues/548)
@@ -460,8 +466,8 @@ We appreciate all the feedbacks and contributions to improve DI-engine, both alg
 
 [![Forkers repo roster for @opendilab/DI-engine](https://reporoster.com/forks/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/network/members)
 
-
 ## Citation
+
 ```latex
 @misc{ding,
     title={DI-engine: OpenDILab Decision Intelligence Engine},
@@ -473,4 +479,5 @@ We appreciate all the feedbacks and contributions to improve DI-engine, both alg
 ```
 
 ## License
+
 DI-engine released under the Apache 2.0 license.
diff --git a/ding/example/dqn_frozen_lake.py b/ding/example/dqn_frozen_lake.py
new file mode 100644
index 0000000000..ec4b856339
--- /dev/null
+++ b/ding/example/dqn_frozen_lake.py
@@ -0,0 +1,45 @@
+from ditk import logging
+from ding.model import DQN
+from ding.policy import DQNPolicy
+from ding.envs import DingEnvWrapper, BaseEnvManagerV2
+from ding.data import DequeBuffer
+from ding.config import compile_config
+from ding.framework import task
+from ding.framework.context import OnlineRLContext
+from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, \
+    eps_greedy_handler, CkptSaver, nstep_reward_enhancer, final_ctx_saver
+from ding.utils import set_pkg_seed
+from dizoo.frozen_lake.config.frozen_lake_dqn_config import main_config, create_config
+from dizoo.frozen_lake.envs import FrozenLakeEnv
+
+
+def main():
+    logging.getLogger().setLevel(logging.INFO)
+    main_config.policy.nstep = 5
+    cfg = compile_config(main_config, create_cfg=create_config, auto=True)
+    with task.start(async_mode=False, ctx=OnlineRLContext()):
+        collector_env = BaseEnvManagerV2(
+            env_fn=[lambda: FrozenLakeEnv(cfg=cfg.env) for _ in range(cfg.env.collector_env_num)], cfg=cfg.env.manager
+        )
+        evaluator_env = BaseEnvManagerV2(
+            env_fn=[lambda: FrozenLakeEnv(cfg=cfg.env) for _ in range(cfg.env.evaluator_env_num)], cfg=cfg.env.manager
+        )
+        set_pkg_seed(cfg.seed, use_cuda=cfg.policy.cuda)
+
+        model = DQN(**cfg.policy.model)
+        buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
+        policy = DQNPolicy(cfg.policy, model=model)
+
+        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))
+        task.use(eps_greedy_handler(cfg))
+        task.use(StepCollector(cfg, policy.collect_mode, collector_env))
+        task.use(nstep_reward_enhancer(cfg))
+        task.use(data_pusher(cfg, buffer_))
+        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))
+        task.use(CkptSaver(policy, cfg.exp_name, train_freq=100))
+        task.use(final_ctx_saver(cfg.exp_name))
+        task.run()
+
+
+if __name__ == "__main__":
+    main()
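Review note: the example above overrides `main_config.policy.nstep` to 5 while the config keeps `discount_factor=0.97`. As a rough illustration of what that combination means for a sparse-reward task like FrozenLake, here is a minimal sketch of the textbook n-step discounted reward sum. `nstep_return` is a hypothetical helper written for this note; it is not the implementation behind `nstep_reward_enhancer`, and the bootstrapped Q-value term of the full n-step target is left out.

```python
from typing import Sequence


def nstep_return(rewards: Sequence[float], gamma: float, nstep: int) -> float:
    """Discounted sum of the first `nstep` rewards (hypothetical helper)."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards[:nstep]))


# FrozenLake only rewards the step that reaches the goal, so a window whose
# fifth step reaches the goal has a 5-step reward sum of gamma ** 4.
sparse_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
five_step = nstep_return(sparse_rewards, gamma=0.97, nstep=5)
three_step = nstep_return(sparse_rewards, gamma=0.97, nstep=3)
print(five_step, three_step)
```

With a 3-step window the same transition sees no reward at all, which is one plausible reason to raise `nstep` for this environment.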
diff --git a/dizoo/frozen_lake/FrozenLake.gif b/dizoo/frozen_lake/FrozenLake.gif
new file mode 100644
index 0000000000..db46a98e39
Binary files /dev/null and b/dizoo/frozen_lake/FrozenLake.gif differ
diff --git a/dizoo/frozen_lake/__init__.py b/dizoo/frozen_lake/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/dizoo/frozen_lake/config/__init__.py b/dizoo/frozen_lake/config/__init__.py
new file mode 100644
index 0000000000..9bec16a088
--- /dev/null
+++ b/dizoo/frozen_lake/config/__init__.py
@@ -0,0 +1 @@
+from .frozen_lake_dqn_config import main_config, create_config
diff --git a/dizoo/frozen_lake/config/frozen_lake_dqn_config.py b/dizoo/frozen_lake/config/frozen_lake_dqn_config.py
new file mode 100644
index 0000000000..84fe0de199
--- /dev/null
+++ b/dizoo/frozen_lake/config/frozen_lake_dqn_config.py
@@ -0,0 +1,64 @@
+from easydict import EasyDict
+
+frozen_lake_dqn_config = dict(
+    exp_name='frozen_lake_seed0',
+    env=dict(
+        collector_env_num=8,
+        evaluator_env_num=5,
+        n_evaluator_episode=10,
+        env_id='FrozenLake-v1',
+        desc=None,
+        map_name="4x4",
+        is_slippery=False,
+        save_replay_gif=False,
+    ),
+    policy=dict(
+        cuda=True,
+        load_path='frozen_lake_seed0/ckpt/ckpt_best.pth.tar',
+        model=dict(
+            obs_shape=16,
+            action_shape=4,
+            encoder_hidden_size_list=[128, 128, 64],
+            dueling=True,
+        ),
+        nstep=3,
+        discount_factor=0.97,
+        learn=dict(
+            update_per_collect=5,
+            batch_size=256,
+            learning_rate=0.001,
+        ),
+        collect=dict(n_sample=10),
+        eval=dict(evaluator=dict(eval_freq=40, )),
+        other=dict(
+            eps=dict(
+                type='exp',
+                start=0.8,
+                end=0.1,
+                decay=10000,
+            ),
+            replay_buffer=dict(replay_buffer_size=20000, ),
+        ),
+    ),
+)
+
+frozen_lake_dqn_config = EasyDict(frozen_lake_dqn_config)
+main_config = frozen_lake_dqn_config
+
+frozen_lake_dqn_create_config = dict(
+    env=dict(
+        type='frozen_lake',
+        import_names=['dizoo.frozen_lake.envs.frozen_lake_env'],
+    ),
+    env_manager=dict(type='base'),
+    policy=dict(type='dqn'),
+    replay_buffer=dict(type='deque', import_names=['ding.data.buffer.deque_buffer_wrapper']),
+)
+
+frozen_lake_dqn_create_config = EasyDict(frozen_lake_dqn_create_config)
+create_config = frozen_lake_dqn_create_config
+
+if __name__ == "__main__":
+    # or you can enter `ding -m serial -c frozen_lake_dqn_config.py -s 0`
+    from ding.entry import serial_pipeline
+    serial_pipeline((main_config, create_config), max_env_step=5000, seed=0)
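Review note: the `eps` block in this config (`type='exp'`, `start=0.8`, `end=0.1`, `decay=10000`) describes an exponentially decaying exploration rate. The sketch below assumes the common form `end + (start - end) * exp(-step / decay)` and assumes the schedule is indexed by environment steps; `exp_epsilon` is a hypothetical helper for this note, not a DI-engine function.

```python
import math


def exp_epsilon(step: int, start: float = 0.8, end: float = 0.1, decay: int = 10000) -> float:
    """Exploration rate after `step` steps under an exponential decay (assumed form)."""
    return end + (start - end) * math.exp(-step / decay)


# Compare the start of training, the 5000-step budget used in `__main__`,
# and a much longer run.
for step in (0, 5000, 50000):
    print(step, round(exp_epsilon(step), 4))
```

Under these assumptions the rate is still well above `end` when the 5000-step budget in `__main__` runs out, so a shorter `decay` may be worth considering for quick runs.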
diff --git a/dizoo/frozen_lake/envs/__init__.py b/dizoo/frozen_lake/envs/__init__.py
new file mode 100644
index 0000000000..dfec345139
--- /dev/null
+++ b/dizoo/frozen_lake/envs/__init__.py
@@ -0,0 +1 @@
+from .frozen_lake_env import FrozenLakeEnv
diff --git a/dizoo/frozen_lake/envs/frozen_lake_env.py b/dizoo/frozen_lake/envs/frozen_lake_env.py
new file mode 100644
index 0000000000..72f179077a
--- /dev/null
+++ b/dizoo/frozen_lake/envs/frozen_lake_env.py
@@ -0,0 +1,144 @@
+from typing import Any, Dict, List, Optional
+import imageio
+import os
+import gymnasium as gymn
+import numpy as np
+from ding.envs import BaseEnv, BaseEnvTimestep
+from ding.torch_utils import to_ndarray
+from ding.utils import ENV_REGISTRY
+
+
+@ENV_REGISTRY.register('frozen_lake')
+class FrozenLakeEnv(BaseEnv):
+
+    def __init__(self, cfg) -> None:
+        self._cfg = cfg
+        assert self._cfg.env_id == "FrozenLake-v1", "env_id must be 'FrozenLake-v1'"
+        self._init_flag = False
+        self._save_replay_bool = False
+        self._save_replay_count = 0
+        self._init_flag = False
+        self._frames = []
+        self._replay_path = None
+
+    def reset(self) -> np.ndarray:
+        if not self._init_flag:
+            # a custom `desc` overrides `map_name`; pass both so preloaded and custom maps work
+            self._env = gymn.make(
+                self._cfg.env_id,
+                desc=self._cfg.desc,
+                map_name=self._cfg.map_name,
+                is_slippery=self._cfg.is_slippery,
+                render_mode="rgb_array"
+            )
+        self._observation_space = self._env.observation_space
+        self._action_space = self._env.action_space
+        self._reward_space = gymn.spaces.Box(
+            low=self._env.reward_range[0], high=self._env.reward_range[1], shape=(1, ), dtype=np.float32
+        )
+        self._init_flag = True
+        self._eval_episode_return = 0
+        if hasattr(self, '_seed') and hasattr(self, '_dynamic_seed') and self._dynamic_seed:
+            np_seed = 100 * np.random.randint(1, 1000)
+            self._env_seed = self._seed + np_seed
+        elif hasattr(self, '_seed'):
+            self._env_seed = self._seed
+        if hasattr(self, '_seed'):
+            obs, info = self._env.reset(seed=self._env_seed)
+        else:
+            obs, info = self._env.reset()
+        obs = np.eye(16, dtype=np.float32)[obs]  # one-hot encode the state index (0..15 on the 4x4 map)
+        return obs
+
+    def close(self) -> None:
+        if self._init_flag:
+            self._env.close()
+        self._init_flag = False
+
+    def seed(self, seed: int, dynamic_seed: bool = True) -> None:
+        self._seed = seed
+        self._dynamic_seed = dynamic_seed
+        np.random.seed(self._seed)
+
+    def step(self, action: Dict) -> BaseEnvTimestep:
+        obs, rew, terminated, truncated, info = self._env.step(action[0])
+        self._eval_episode_return += rew
+        obs = np.eye(16, dtype=np.float32)[obs]  # same one-hot encoding as in `reset`
+        rew = to_ndarray([rew])
+        if self._save_replay_bool:
+            picture = self._env.render()
+            self._frames.append(picture)
+        if terminated or truncated:
+            done = True
+        else:
+            done = False
+        if done:
+            info['eval_episode_return'] = self._eval_episode_return
+            if self._save_replay_bool:
+                assert self._replay_path is not None, "replay_path must be set before saving a replay"
+                path = os.path.join(
+                    self._replay_path, '{}_episode_{}.gif'.format(self._cfg.env_id, self._save_replay_count)
+                )
+                self.frames_to_gif(self._frames, path)
+                self._frames = []
+                self._save_replay_count += 1
+        rew = rew.astype(np.float32)
+        return BaseEnvTimestep(obs, rew, done, info)
+
+    def random_action(self) -> List:
+        # Wrap the sampled discrete action in a list to match the ``action[0]`` indexing in ``step``.
+        raw_action = self._env.action_space.sample()
+        return [raw_action]
+
+    def __repr__(self) -> str:
+        return "DI-engine Frozen Lake Env"
+
+    @property
+    def observation_space(self) -> gymn.spaces.Space:
+        return self._observation_space
+
+    @property
+    def action_space(self) -> gymn.spaces.Space:
+        return self._action_space
+
+    @property
+    def reward_space(self) -> gymn.spaces.Space:
+        return self._reward_space
+
+    def enable_save_replay(self, replay_path: Optional[str] = None) -> None:
+        if replay_path is None:
+            replay_path = './video'
+        self._replay_path = replay_path
+        self._save_replay_bool = True
+        self._save_replay_count = 0
+        self._frames = []
+
+    @staticmethod
+    def frames_to_gif(frames: List[imageio.core.util.Array], gif_path: str, duration: float = 0.1) -> None:
+        """
+        Convert a list of frames into a GIF.
+        Args:
+        - frames (List[imageio.core.util.Array]): A list of frames, each frame is an image.
+        - gif_path (str): The path to save the GIF file.
+        - duration (float): Duration between each frame in the GIF (seconds).
+
+        Returns:
+        None, the GIF file is saved directly to the specified path.
+        """
+        # Create the output directory if needed so that the writer can open the target file.
+        gif_dir = os.path.dirname(gif_path)
+        if gif_dir:
+            os.makedirs(gif_dir, exist_ok=True)
+
+        # Append the in-memory frames directly to the GIF writer. No intermediate image
+        # files are written, so parallel environments cannot clash over file names.
+        with imageio.get_writer(gif_path, mode='I', duration=duration) as writer:
+            for frame in frames:
+                writer.append_data(frame)
+        print(f"GIF saved as {gif_path}")
diff --git a/dizoo/frozen_lake/envs/test_frozen_lake_env.py b/dizoo/frozen_lake/envs/test_frozen_lake_env.py
new file mode 100644
index 0000000000..c313a264e0
--- /dev/null
+++ b/dizoo/frozen_lake/envs/test_frozen_lake_env.py
@@ -0,0 +1,44 @@
+import numpy as np
+import pytest
+from dizoo.frozen_lake.envs import FrozenLakeEnv
+from easydict import EasyDict
+
+
+@pytest.mark.envtest
+class TestFrozenLakeEnv:
+
+    def test_my_lake(self):
+        env = FrozenLakeEnv(
+            EasyDict({
+                'env_id': 'FrozenLake-v1',
+                'desc': None,
+                'map_name': "4x4",
+                'is_slippery': False,
+            })
+        )
+        env.enable_save_replay("./video")
+        for _ in range(5):
+            env.seed(314, dynamic_seed=False)
+            assert env._seed == 314
+            obs = env.reset()
+            assert obs.shape == (
+                16,
+            ), "The one-hot encoded observation of the 4x4 map should have shape (16, )."
+            for i in range(10):
+                # A legal random action can be drawn either from ``env.action_space.sample()``
+                # or from ``env.random_action()``; both paths are exercised here.
+                if i < 5:
+                    random_action = np.array([env.action_space.sample()])
+                else:
+                    random_action = env.random_action()
+                timestep = env.step(random_action)
+                print(timestep)
+                assert isinstance(timestep.obs, np.ndarray)
+                assert isinstance(timestep.done, bool)
+                assert timestep.obs.shape == (16, )
+                assert timestep.reward.shape == (1, )
+                assert timestep.reward >= env.reward_space.low
+                assert timestep.reward <= env.reward_space.high
+
+        print(env.observation_space, env.action_space, env.reward_space)
+        env.close()