diff --git a/README.md b/README.md
index 1f0314b471..7e68959a2c 100644
--- a/README.md
+++ b/README.md
@@ -22,8 +22,6 @@

[](https://codecov.io/gh/opendilab/DI-engine)
-
-

[](https://github.com/opendilab/DI-engine/stargazers)
[](https://github.com/opendilab/DI-engine/network)
@@ -37,11 +35,11 @@
Updated on 2024.02.04 DI-engine-v0.5.1
-
## Introduction to DI-engine
+
[Documentation](https://di-engine-docs.readthedocs.io/en/latest/) | [中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/) | [Tutorials](https://di-engine-docs.readthedocs.io/en/latest/01_quickstart/index.html) | [Feature](#feature) | [Task & Middleware](https://di-engine-docs.readthedocs.io/en/latest/03_system/index.html) | [TreeTensor](#general-data-container-treetensor) | [Roadmap](https://github.com/opendilab/DI-engine/issues/548)
-**DI-engine** is a generalized decision intelligence engine for PyTorch and JAX.
+**DI-engine** is a generalized decision intelligence engine for PyTorch and JAX.
It provides **python-first** and **asynchronous-native** task and middleware abstractions, and modularly integrates several of the most important decision-making concepts: Env, Policy and Model. Based on the above mechanisms, DI-engine supports **various [deep reinforcement learning](https://di-engine-docs.readthedocs.io/en/latest/10_concepts/index.html) algorithms** with superior performance, high efficiency, well-organized [documentation](https://di-engine-docs.readthedocs.io/en/latest/) and [unittest](https://github.com/opendilab/DI-engine/actions):
@@ -89,6 +87,7 @@ It provides **python-first** and **asynchronous-native** task and middleware abs
- [awesome-diffusion-model-in-rl](https://github.com/opendilab/awesome-diffusion-model-in-rl): A curated list of Diffusion Model in RL resources
- [awesome-end-to-end-autonomous-driving](https://github.com/opendilab/awesome-end-to-end-autonomous-driving): A curated list of awesome End-to-End Autonomous Driving resources
- [awesome-driving-behavior-prediction](https://github.com/opendilab/awesome-driving-behavior-prediction): A collection of research papers for Driving Behavior Prediction
+
On the low-level end, DI-engine comes with a set of highly re-usable modules, including [RL optimization functions](https://github.com/opendilab/DI-engine/tree/main/ding/rl_utils), [PyTorch utilities](https://github.com/opendilab/DI-engine/tree/main/ding/torch_utils) and [auxiliary tools](https://github.com/opendilab/DI-engine/tree/main/ding/utils).
@@ -104,6 +103,7 @@ BTW, **DI-engine** also has some special **system optimization and design** for
- [DI-orchestrator](https://github.com/opendilab/DI-orchestrator): RL Kubernetes Custom Resource and Operator Lib
- [DI-hpc](https://github.com/opendilab/DI-hpc): RL HPC OP Lib
- [DI-store](https://github.com/opendilab/DI-store): RL Object Store
+
Have fun with exploration and exploitation.
@@ -128,11 +128,13 @@ Have fun with exploration and exploitation.
## Installation
You can simply install DI-engine from PyPI with the following command:
+
```bash
pip install DI-engine
```
If you use Anaconda or Miniconda, you can install DI-engine from the `opendilab` conda channel with the following command:
+
```bash
conda install -c opendilab di-engine
```
@@ -155,6 +157,7 @@ And our dockerhub repo can be found [here](https://hub.docker.com/repository/doc
- cityflow: opendilab/ding:nightly-cityflow
- evogym: opendilab/ding:nightly-evogym
- d4rl: opendilab/ding:nightly-d4rl
+
The detailed documentation is hosted on [doc](https://di-engine-docs.readthedocs.io/en/latest/) | [中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/).
@@ -175,8 +178,8 @@ The detailed documentation are hosted on [doc](https://di-engine-docs.readthedoc
[新老 pipeline 的异同对比](https://di-engine-docs.readthedocs.io/zh_CN/latest/04_best_practice/diff_in_new_pipeline_zh.html)
-
## Feature
+
### Algorithm Versatility
@@ -198,7 +201,6 @@ The detailed documentation are hosted on [doc](https://di-engine-docs.readthedoc
 [Offline Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/offline_rl.html)|[离线强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/offline_rl_zh.html)
-
 [Model-Based Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/model_based_rl.html)|[基于模型的强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/model_based_rl_zh.html)
 means other sub-direction algorithms, usually used as plug-ins in the whole pipeline
@@ -206,111 +208,114 @@ The detailed documentation are hosted on [doc](https://di-engine-docs.readthedoc
P.S.: The `.py` files listed under `Runnable Demo` can be found in `dizoo`
+| No. | Algorithm | Label | Doc and Implementation | Runnable Demo |
+| :-: | :---------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------: |
+| 1 | [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) |  | [DQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqn.html)
[DQN中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/dqn_zh.html)
[policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0 |
+| 2 | [C51](https://arxiv.org/pdf/1707.06887.pdf) |  | [C51 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/c51.html)
[policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py) | ding -m serial -c cartpole_c51_config.py -s 0 |
+| 3 | [QRDQN](https://arxiv.org/pdf/1710.10044.pdf) |  | [QRDQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qrdqn.html)
[policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py) | ding -m serial -c cartpole_qrdqn_config.py -s 0 |
+| 4 | [IQN](https://arxiv.org/pdf/1806.06923.pdf) |  | [IQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/iqn.html)
[policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py) | ding -m serial -c cartpole_iqn_config.py -s 0 |
+| 5 | [FQF](https://arxiv.org/pdf/1911.02140.pdf) |  | [FQF doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/fqf.html)
[policy/fqf](https://github.com/opendilab/DI-engine/blob/main/ding/policy/fqf.py) | ding -m serial -c cartpole_fqf_config.py -s 0 |
+| 6 | [Rainbow](https://arxiv.org/pdf/1710.02298.pdf) |  | [Rainbow doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rainbow.html)
[policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py) | ding -m serial -c cartpole_rainbow_config.py -s 0 |
+| 7 | [SQL](https://arxiv.org/pdf/1702.08165.pdf) |  | [SQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sql.html)
[policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py) | ding -m serial -c cartpole_sql_config.py -s 0 |
+| 8 | [R2D2](https://openreview.net/forum?id=r1lyTjAqYX) |  | [R2D2 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d2.html)
[policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py) | ding -m serial -c cartpole_r2d2_config.py -s 0 |
+| 9 | [PG](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf) |  | [PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)
[policy/pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pg.py) | ding -m serial -c cartpole_pg_config.py -s 0 |
+| 10 | [PromptPG](https://arxiv.org/abs/2209.14610) |  | [policy/prompt_pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/prompt_pg.py) | ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0 |
+| 11 | [A2C](https://arxiv.org/pdf/1602.01783.pdf) |  | [A2C doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)
[policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py) | ding -m serial -c cartpole_a2c_config.py -s 0 |
+| 12 | [PPO](https://arxiv.org/abs/1707.06347)/[MAPPO](https://arxiv.org/pdf/2103.01955.pdf) |  | [PPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppo.html)
[policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |
+| 13 | [PPG](https://arxiv.org/pdf/2009.04416.pdf) |  | [PPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppg.html)
[policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py) | python3 -u cartpole_ppg_main.py |
+| 14 | [ACER](https://arxiv.org/pdf/1611.01224.pdf) |  | [ACER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/acer.html)
[policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py) | ding -m serial -c cartpole_acer_config.py -s 0 |
+| 15 | [IMPALA](https://arxiv.org/abs/1802.01561) |  | [IMPALA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/impala.html)
[policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py) | ding -m serial -c cartpole_impala_config.py -s 0 |
+| 16 | [DDPG](https://arxiv.org/pdf/1509.02971.pdf)/[PADDPG](https://arxiv.org/pdf/1511.04143.pdf) |  | [DDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)
[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c pendulum_ddpg_config.py -s 0 |
+| 17 | [TD3](https://arxiv.org/pdf/1802.09477.pdf) |  | [TD3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3.html)
[policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py) | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0 |
+| 18 | [D4PG](https://arxiv.org/pdf/1804.08617.pdf) |  | [D4PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/d4pg.html)
[policy/d4pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/d4pg.py) | python3 -u pendulum_d4pg_config.py |
+| 19 | [SAC](https://arxiv.org/abs/1801.01290)/[MASAC] |  | [SAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sac.html)
[policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py) | ding -m serial -c pendulum_sac_config.py -s 0 |
+| 20 | [PDQN](https://arxiv.org/pdf/1810.06394.pdf) |  | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_pdqn_config.py -s 0 |
+| 21 | [MPDQN](https://arxiv.org/pdf/1905.04388.pdf) |  | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_mpdqn_config.py -s 0 |
+| 22 | [HPPO](https://arxiv.org/pdf/1903.01344.pdf) |  | [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0 |
+| 23 | [BDQ](https://arxiv.org/pdf/1711.08946.pdf) |  | [policy/bdq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u hopper_bdq_config.py |
+| 24 | [MDQN](https://arxiv.org/abs/2007.14430) |  | [policy/mdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mdqn.py) | python3 -u asterix_mdqn_config.py |
+| 25 | [QMIX](https://arxiv.org/pdf/1803.11485.pdf) |  | [QMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qmix.html)
[policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py) | ding -m serial -c smac_3s5z_qmix_config.py -s 0 |
+| 26 | [COMA](https://arxiv.org/pdf/1705.08926.pdf) |  | [COMA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/coma.html)
[policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py) | ding -m serial -c smac_3s5z_coma_config.py -s 0 |
+| 27 | [QTran](https://arxiv.org/abs/1905.05408) |  | [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py) | ding -m serial -c smac_3s5z_qtran_config.py -s 0 |
+| 28 | [WQMIX](https://arxiv.org/abs/2006.10800) |  | [WQMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/wqmix.html)
[policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py) | ding -m serial -c smac_3s5z_wqmix_config.py -s 0 |
+| 29 | [CollaQ](https://arxiv.org/pdf/2010.08531.pdf) |  | [CollaQ doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/collaq.html)
[policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py) | ding -m serial -c smac_3s5z_collaq_config.py -s 0 |
+| 30 | [MADDPG](https://arxiv.org/pdf/1706.02275.pdf) |  | [MADDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)
[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0 |
+| 31 | [GAIL](https://arxiv.org/pdf/1606.03476.pdf) |  | [GAIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/gail.html)
[reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py) | ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0 |
+| 32 | [SQIL](https://arxiv.org/pdf/1905.11108.pdf) |  | [SQIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sqil.html)
[entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py) | ding -m serial_sqil -c cartpole_sqil_config.py -s 0 |
+| 33 | [DQFD](https://arxiv.org/pdf/1704.03732.pdf) |  | [DQFD doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqfd.html)
[policy/dqfd](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqfd.py) | ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0 |
+| 34 | [R2D3](https://arxiv.org/pdf/1909.01387.pdf) |  | [R2D3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d3.html)
[R2D3中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html)
[policy/r2d3](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html) | python3 -u pong_r2d3_r2d2expert_config.py |
+| 35 | [Guided Cost Learning](https://arxiv.org/pdf/1603.00448.pdf) |  | [Guided Cost Learning中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/guided_cost_zh.html)
[reward_model/guided_cost](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/guided_cost_reward_model.py) | python3 lunarlander_gcl_config.py |
+| 36 | [TREX](https://arxiv.org/abs/1904.06387) |  | [TREX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/trex.html)
[reward_model/trex](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/trex_reward_model.py) | python3 mujoco_trex_main.py |
+| 37 | [Implicit Behavioral Cloning](https://implicitbc.github.io/) (DFO+MCMC) |  | [policy/ibc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ibc.py)
[model/template/ebm](https://github.com/opendilab/DI-engine/blob/main/ding/model/template/ebm.py) | python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py |
+| 38 | [BCO](https://arxiv.org/pdf/1805.01954.pdf) |  | [entry/bco](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_bco.py) | python3 -u cartpole_bco_config.py |
+| 39 | [HER](https://arxiv.org/pdf/1707.01495.pdf) |  | [HER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/her.html)
[reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py) | python3 -u bitflip_her_dqn.py |
+| 40 | [RND](https://arxiv.org/abs/1810.12894) |  | [RND doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rnd.html)
[reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py) | python3 -u cartpole_rnd_onppo_config.py |
+| 41 | [ICM](https://arxiv.org/pdf/1705.05363.pdf) |  | [ICM doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/icm.html)
[ICM中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/icm_zh.html)
[reward_model/icm](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/icm_reward_model.py) | python3 -u cartpole_ppo_icm_config.py |
+| 42 | [CQL](https://arxiv.org/pdf/2006.04779.pdf) |  | [CQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/cql.html)
[policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py) | python3 -u d4rl_cql_main.py |
+| 43 | [TD3BC](https://arxiv.org/pdf/2106.06860.pdf) |  | [TD3BC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3_bc.html)
[policy/td3_bc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3_bc.py) | python3 -u d4rl_td3_bc_main.py |
+| 44 | [Decision Transformer](https://arxiv.org/pdf/2106.01345.pdf) |  | [policy/dt](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dt.py) | python3 -u d4rl_dt_mujoco.py |
+| 45 | [EDAC](https://arxiv.org/pdf/2110.01548.pdf) |  | [EDAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/edac.html)
[policy/edac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/edac.py) | python3 -u d4rl_edac_main.py |
+| 46 | [QGPO](https://arxiv.org/pdf/2304.12824.pdf) |  | [QGPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qgpo.html)
[policy/qgpo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qgpo.py) | python3 -u ding/example/qgpo.py |
+| 47 | MBSAC([SAC](https://arxiv.org/abs/1801.01290)+[MVE](https://arxiv.org/abs/1803.00101)+[SVG](https://arxiv.org/abs/1510.09142)) |  | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) | python3 -u pendulum_mbsac_mbpo_config.py \ python3 -u pendulum_mbsac_ddppo_config.py |
+| 48 | STEVESAC([SAC](https://arxiv.org/abs/1801.01290)+[STEVE](https://arxiv.org/abs/1807.01675)+[SVG](https://arxiv.org/abs/1510.09142)) |  | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) | python3 -u pendulum_stevesac_mbpo_config.py |
+| 49 | [MBPO](https://arxiv.org/pdf/1906.08253.pdf) |  | [MBPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/mbpo.html)
[world_model/mbpo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/mbpo.py) | python3 -u pendulum_sac_mbpo_config.py |
+| 50 | [DDPPO](https://openreview.net/forum?id=rzvOQrnclO0) |  | [world_model/ddppo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/ddppo.py) | python3 -u pendulum_mbsac_ddppo_config.py |
+| 51 | [DreamerV3](https://arxiv.org/pdf/2301.04104.pdf) |  | [world_model/dreamerv3](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/dreamerv3.py) | python3 -u cartpole_balance_dreamer_config.py |
+| 52 | [PER](https://arxiv.org/pdf/1511.05952.pdf) |  | [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py) | `rainbow demo` |
+| 53 | [GAE](https://arxiv.org/pdf/1506.02438.pdf) |  | [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py) | `ppo demo` |
+| 54 | [ST-DIM](https://arxiv.org/pdf/1906.08226.pdf) |  | [torch_utils/loss/contrastive_loss](https://github.com/opendilab/DI-engine/blob/main/ding/torch_utils/loss/contrastive_loss.py) | ding -m serial -c cartpole_dqn_stdim_config.py -s 0 |
+| 55 | [PLR](https://arxiv.org/pdf/2010.03934.pdf) |  | [PLR doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/plr.html)
[data/level_replay/level_sampler](https://github.com/opendilab/DI-engine/blob/main/ding/data/level_replay/level_sampler.py) | python3 -u bigfish_plr_config.py -s 0 |
+| 56 | [PCGrad](https://arxiv.org/pdf/2001.06782.pdf) |  | [torch_utils/optimizer_helper/PCGrad](https://github.com/opendilab/DI-engine/blob/main/ding/data/torch_utils/optimizer_helper.py) | python3 -u multi_mnist_pcgrad_main.py -s 0 |
-| No. | Algorithm | Label | Doc and Implementation | Runnable Demo |
-| :--: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
-| 1 | [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) |  | [DQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqn.html)
[DQN中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/dqn_zh.html)
[policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0 |
-| 2 | [C51](https://arxiv.org/pdf/1707.06887.pdf) |  | [C51 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/c51.html)
[policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py) | ding -m serial -c cartpole_c51_config.py -s 0 |
-| 3 | [QRDQN](https://arxiv.org/pdf/1710.10044.pdf) |  | [QRDQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qrdqn.html)
[policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py) | ding -m serial -c cartpole_qrdqn_config.py -s 0 |
-| 4 | [IQN](https://arxiv.org/pdf/1806.06923.pdf) |  | [IQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/iqn.html)
[policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py) | ding -m serial -c cartpole_iqn_config.py -s 0 |
-| 5 | [FQF](https://arxiv.org/pdf/1911.02140.pdf) |  | [FQF doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/fqf.html)
[policy/fqf](https://github.com/opendilab/DI-engine/blob/main/ding/policy/fqf.py) | ding -m serial -c cartpole_fqf_config.py -s 0 |
-| 6 | [Rainbow](https://arxiv.org/pdf/1710.02298.pdf) |  | [Rainbow doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rainbow.html)
[policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py) | ding -m serial -c cartpole_rainbow_config.py -s 0 |
-| 7 | [SQL](https://arxiv.org/pdf/1702.08165.pdf) |  | [SQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sql.html)
[policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py) | ding -m serial -c cartpole_sql_config.py -s 0 |
-| 8 | [R2D2](https://openreview.net/forum?id=r1lyTjAqYX) |  | [R2D2 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d2.html)
[policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py) | ding -m serial -c cartpole_r2d2_config.py -s 0 |
-| 9 | [PG](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf) |  | [PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)
[policy/pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pg.py) | ding -m serial -c cartpole_pg_config.py -s 0 |
-| 10 | [PromptPG](https://arxiv.org/abs/2209.14610) |  | [policy/prompt_pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/prompt_pg.py) | ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0 |
-| 11 | [A2C](https://arxiv.org/pdf/1602.01783.pdf) |  | [A2C doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)
[policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py) | ding -m serial -c cartpole_a2c_config.py -s 0 |
-| 12 | [PPO](https://arxiv.org/abs/1707.06347)/[MAPPO](https://arxiv.org/pdf/2103.01955.pdf) |  | [PPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppo.html)
[policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |
-| 13 | [PPG](https://arxiv.org/pdf/2009.04416.pdf) |  | [PPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppg.html)
[policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py) | python3 -u cartpole_ppg_main.py |
-| 14 | [ACER](https://arxiv.org/pdf/1611.01224.pdf) |  | [ACER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/acer.html)
[policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py) | ding -m serial -c cartpole_acer_config.py -s 0 |
-| 15 | [IMPALA](https://arxiv.org/abs/1802.01561) |  | [IMPALA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/impala.html)
[policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py) | ding -m serial -c cartpole_impala_config.py -s 0 |
-| 16 | [DDPG](https://arxiv.org/pdf/1509.02971.pdf)/[PADDPG](https://arxiv.org/pdf/1511.04143.pdf) |  | [DDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)
[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c pendulum_ddpg_config.py -s 0 |
-| 17 | [TD3](https://arxiv.org/pdf/1802.09477.pdf) |  | [TD3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3.html)
[policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py) | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0 |
-| 18 | [D4PG](https://arxiv.org/pdf/1804.08617.pdf) |  | [D4PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/d4pg.html)
[policy/d4pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/d4pg.py) | python3 -u pendulum_d4pg_config.py |
-| 19 | [SAC](https://arxiv.org/abs/1801.01290)/[MASAC] |  | [SAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sac.html)
[policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py) | ding -m serial -c pendulum_sac_config.py -s 0 |
-| 20 | [PDQN](https://arxiv.org/pdf/1810.06394.pdf) |  | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_pdqn_config.py -s 0 |
-| 21 | [MPDQN](https://arxiv.org/pdf/1905.04388.pdf) |  | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_mpdqn_config.py -s 0 |
-| 22 | [HPPO](https://arxiv.org/pdf/1903.01344.pdf) |  | [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0 |
-| 23 | [BDQ](https://arxiv.org/pdf/1711.08946.pdf) |  | [policy/bdq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u hopper_bdq_config.py |
-| 24 | [MDQN](https://arxiv.org/abs/2007.14430) |  | [policy/mdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mdqn.py) | python3 -u asterix_mdqn_config.py |
-| 25 | [QMIX](https://arxiv.org/pdf/1803.11485.pdf) |  | [QMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qmix.html)
[policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py) | ding -m serial -c smac_3s5z_qmix_config.py -s 0 |
-| 26 | [COMA](https://arxiv.org/pdf/1705.08926.pdf) |  | [COMA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/coma.html)
[policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py) | ding -m serial -c smac_3s5z_coma_config.py -s 0 |
-| 27 | [QTran](https://arxiv.org/abs/1905.05408) |  | [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py) | ding -m serial -c smac_3s5z_qtran_config.py -s 0 |
-| 28 | [WQMIX](https://arxiv.org/abs/2006.10800) |  | [WQMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/wqmix.html)
[policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py) | ding -m serial -c smac_3s5z_wqmix_config.py -s 0 |
-| 29 | [CollaQ](https://arxiv.org/pdf/2010.08531.pdf) |  | [CollaQ doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/collaq.html)
[policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py) | ding -m serial -c smac_3s5z_collaq_config.py -s 0 |
-| 30 | [MADDPG](https://arxiv.org/pdf/1706.02275.pdf) |  | [MADDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)
[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0 |
-| 31 | [GAIL](https://arxiv.org/pdf/1606.03476.pdf) |  | [GAIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/gail.html)
[reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py) | ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0 |
-| 32 | [SQIL](https://arxiv.org/pdf/1905.11108.pdf) |  | [SQIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sqil.html)
[entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py) | ding -m serial_sqil -c cartpole_sqil_config.py -s 0 |
-| 33 | [DQFD](https://arxiv.org/pdf/1704.03732.pdf) |  | [DQFD doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqfd.html)
[policy/dqfd](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqfd.py) | ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0 |
-| 34 | [R2D3](https://arxiv.org/pdf/1909.01387.pdf) |  | [R2D3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d3.html)
[R2D3中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html)
[policy/r2d3](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html) | python3 -u pong_r2d3_r2d2expert_config.py |
-| 35 | [Guided Cost Learning](https://arxiv.org/pdf/1603.00448.pdf) |  | [Guided Cost Learning中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/guided_cost_zh.html)
[reward_model/guided_cost](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/guided_cost_reward_model.py) | python3 lunarlander_gcl_config.py |
-| 36 | [TREX](https://arxiv.org/abs/1904.06387) |  | [TREX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/trex.html)
[reward_model/trex](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/trex_reward_model.py) | python3 mujoco_trex_main.py |
-| 37 | [Implicit Behavorial Cloning](https://implicitbc.github.io/) (DFO+MCMC) |  | [policy/ibc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ibc.py)
[model/template/ebm](https://github.com/opendilab/DI-engine/blob/main/ding/model/template/ebm.py) | python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py |
-| 38 | [BCO](https://arxiv.org/pdf/1805.01954.pdf) |  | [entry/bco](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_bco.py) | python3 -u cartpole_bco_config.py |
-| 39 | [HER](https://arxiv.org/pdf/1707.01495.pdf) |  | [HER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/her.html)
[reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py) | python3 -u bitflip_her_dqn.py |
-| 40 | [RND](https://arxiv.org/abs/1810.12894) |  | [RND doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rnd.html)
[reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py) | python3 -u cartpole_rnd_onppo_config.py |
-| 41 | [ICM](https://arxiv.org/pdf/1705.05363.pdf) |  | [ICM doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/icm.html)
[ICM中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/icm_zh.html)
[reward_model/icm](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/icm_reward_model.py) | python3 -u cartpole_ppo_icm_config.py |
-| 42 | [CQL](https://arxiv.org/pdf/2006.04779.pdf) |  | [CQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/cql.html)
[policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py) | python3 -u d4rl_cql_main.py |
-| 43 | [TD3BC](https://arxiv.org/pdf/2106.06860.pdf) |  | [TD3BC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3_bc.html)
[policy/td3_bc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3_bc.py) | python3 -u d4rl_td3_bc_main.py |
-| 44 | [Decision Transformer](https://arxiv.org/pdf/2106.01345.pdf) |  | [policy/dt](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dt.py) | python3 -u d4rl_dt_mujoco.py |
-| 45 | [EDAC](https://arxiv.org/pdf/2110.01548.pdf) |  | [EDAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/edac.html)
[policy/edac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/edac.py) | python3 -u d4rl_edac_main.py |
-| 46 | [QGPO](https://arxiv.org/pdf/2304.12824.pdf) |  | [QGPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qgpo.html)
[policy/qgpo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qgpo.py) | python3 -u ding/example/qgpo.py |
-| 47 | MBSAC([SAC](https://arxiv.org/abs/1801.01290)+[MVE](https://arxiv.org/abs/1803.00101)+[SVG](https://arxiv.org/abs/1510.09142)) |  | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) | python3 -u pendulum_mbsac_mbpo_config.py \ python3 -u pendulum_mbsac_ddppo_config.py |
-| 48 | STEVESAC([SAC](https://arxiv.org/abs/1801.01290)+[STEVE](https://arxiv.org/abs/1807.01675)+[SVG](https://arxiv.org/abs/1510.09142)) |  | [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py) | python3 -u pendulum_stevesac_mbpo_config.py |
-| 49 | [MBPO](https://arxiv.org/pdf/1906.08253.pdf) |  | [MBPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/mbpo.html)
[world_model/mbpo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/mbpo.py) | python3 -u pendulum_sac_mbpo_config.py |
-| 50 | [DDPPO](https://openreview.net/forum?id=rzvOQrnclO0) |  | [world_model/ddppo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/ddppo.py) | python3 -u pendulum_mbsac_ddppo_config.py |
-| 51 | [DreamerV3](https://arxiv.org/pdf/2301.04104.pdf) |  | [world_model/dreamerv3](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/dreamerv3.py) | python3 -u cartpole_balance_dreamer_config.py |
-| 52 | [PER](https://arxiv.org/pdf/1511.05952.pdf) |  | [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py) | `rainbow demo` |
-| 53 | [GAE](https://arxiv.org/pdf/1506.02438.pdf) |  | [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py) | `ppo demo` |
-| 54 | [ST-DIM](https://arxiv.org/pdf/1906.08226.pdf) |  | [torch_utils/loss/contrastive_loss](https://github.com/opendilab/DI-engine/blob/main/ding/torch_utils/loss/contrastive_loss.py) | ding -m serial -c cartpole_dqn_stdim_config.py -s 0 |
-| 55 | [PLR](https://arxiv.org/pdf/2010.03934.pdf) |  | [PLR doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/plr.html)
[data/level_replay/level_sampler](https://github.com/opendilab/DI-engine/blob/main/ding/data/level_replay/level_sampler.py) | python3 -u bigfish_plr_config.py -s 0 |
-| 56 | [PCGrad](https://arxiv.org/pdf/2001.06782.pdf) |  | [torch_utils/optimizer_helper/PCGrad](https://github.com/opendilab/DI-engine/blob/main/ding/data/torch_utils/optimizer_helper.py) | python3 -u multi_mnist_pcgrad_main.py -s 0 |
-
### Environment Versatility
+
(Click to Collapse)
-| No | Environment | Label | Visualization | Code and Doc Links |
-| :--: | :--------------------------------------: | :---------------------------------: | :--------------------------------:|:---------------------------------------------------------: |
-| 1 | [Atari](https://github.com/openai/gym/tree/master/gym/envs/atari) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/atari/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/atari.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/atari_zh.html) |
-| 2 | [box2d/bipedalwalker](https://github.com/openai/gym/tree/master/gym/envs/box2d) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/bipedalwalker/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/bipedalwalker.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bipedalwalker_zh.html) |
-| 3 | [box2d/lunarlander](https://github.com/openai/gym/tree/master/gym/envs/box2d) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/lunarlander/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/lunarlander.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/lunarlander_zh.html) |
-| 4 | [classic_control/cartpole](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/cartpole/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/cartpole.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/cartpole_zh.html) |
-| 5 | [classic_control/pendulum](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/pendulum/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pendulum.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pendulum_zh.html) |
-| 6 | [competitive_rl](https://github.com/cuhkrlcourse/competitive-rl) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.classic_control)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/competitive_rl_zh.html) |
-| 7 | [gfootball](https://github.com/google-research/football) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.gfootball/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball_zh.html) |
-| 8 | [minigrid](https://github.com/maximecb/gym-minigrid) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/minigrid/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid_zh.html) |
-| 9 | [MuJoCo](https://github.com/openai/gym/tree/master/gym/envs/mujoco) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/majoco/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco_zh.html) |
-| 10 | [PettingZoo](https://github.com/Farama-Foundation/PettingZoo) |    |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/petting_zoo/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pettingzoo.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pettingzoo_zh.html) |
-| 11 | [overcooked](https://github.com/HumanCompatibleAI/overcooked-demo) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/overcooded/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/overcooked.html) |
-| 12 | [procgen](https://github.com/openai/procgen) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/procgen)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/procgen.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/procgen_zh.html) |
-| 13 | [pybullet](https://github.com/benelot/pybullet-gym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pybullet/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pybullet_zh.html) |
-| 14 | [smac](https://github.com/oxwhirl/smac) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/smac/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/smac.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/smac_zh.html) |
-| 15 | [d4rl](https://github.com/rail-berkeley/d4rl) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/d4rl)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/d4rl_zh.html) |
-| 16 | league_demo |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/league_demo/envs) |
-| 17 | pomdp atari |  | | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pomdp/envs) |
-| 18 | [bsuite](https://github.com/deepmind/bsuite) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bsuite/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs//bsuite.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bsuite_zh.html) |
-| 19 | [ImageNet](https://www.image-net.org/) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/image_classification)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/image_cls_zh.html) |
-| 20 | [slime_volleyball](https://github.com/hardmaru/slimevolleygym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/slime_volley)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/slime_volleyball.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/slime_volleyball_zh.html) |
-| 21 | [gym_hybrid](https://github.com/thomashirtz/gym-hybrid) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_hybrid)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_hybrid.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_hybrid_zh.html) |
-| 22 | [GoBigger](https://github.com/opendilab/GoBigger) |  |  | [dizoo link](https://github.com/opendilab/GoBigger-Challenge-2021/tree/main/di_baseline)
[env tutorial](https://gobigger.readthedocs.io/en/latest/index.html)
[环境指南](https://gobigger.readthedocs.io/zh_CN/latest/) |
-| 23 | [gym_soccer](https://github.com/openai/gym-soccer) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_soccer)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_soccer_zh.html) |
-| 24 |[multiagent_mujoco](https://github.com/schroederdewitt/multiagent_mujoco) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/multiagent_mujoco/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/mujoco_zh.html) |
-| 25 |bitflip |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bitflip/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bitflip_zh.html) |
-| 26 |[sokoban](https://github.com/mpSchrader/gym-sokoban) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/sokoban/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/sokoban.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/sokoban_zh.html) |
-| 27 |[gym_anytrading](https://github.com/AminHP/gym-anytrading) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_anytrading)
[env tutorial](https://github.com/opendilab/DI-engine/blob/main/dizoo/gym_anytrading/envs/README.md) |
-| 28 |[mario](https://github.com/Kautenja/gym-super-mario-bros) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mario)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_super_mario_bros.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_super_mario_bros_zh.html) |
-| 29 |[dmc2gym](https://github.com/denisyarats/dmc2gym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/dmc2gym)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/dmc2gym.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/dmc2gym_zh.html) |
-| 30 |[evogym](https://github.com/EvolutionGym/evogym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/evogym/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/evogym.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/Evogym_zh.html) |
-| 31 |[gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_pybullet_drones/envs)
环境指南 |
-| 32 |[beergame](https://github.com/OptMLGroup/DeepBeerInventory-RL) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/beergame/envs)
环境指南 |
-| 33 |[classic_control/acrobot](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/acrobot/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/acrobot_zh.html) |
-| 34 |[box2d/car_racing](https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py) | 
 |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/carracing/envs)
环境指南 |
-| 35 |[metadrive](https://github.com/metadriverse/metadrive) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/metadrive/env)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/metadrive_zh.html) |
-| 36 |[cliffwalking](https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/cliffwalking/envs)
env tutorial
环境指南 |
-| 37 | [tabmwp](https://promptpg.github.io/explore.html) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/tabmwp)
env tutorial
环境指南|
+
+| No | Environment | Label | Visualization | Code and Doc Links |
+| :-: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| 1 | [Atari](https://github.com/openai/gym/tree/master/gym/envs/atari) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/atari/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/atari.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/atari_zh.html) |
+| 2 | [box2d/bipedalwalker](https://github.com/openai/gym/tree/master/gym/envs/box2d) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/bipedalwalker/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/bipedalwalker.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bipedalwalker_zh.html) |
+| 3 | [box2d/lunarlander](https://github.com/openai/gym/tree/master/gym/envs/box2d) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/lunarlander/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/lunarlander.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/lunarlander_zh.html) |
+| 4 | [classic_control/cartpole](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/cartpole/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/cartpole.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/cartpole_zh.html) |
+| 5 | [classic_control/pendulum](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/pendulum/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pendulum.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pendulum_zh.html) |
+| 6 | [competitive_rl](https://github.com/cuhkrlcourse/competitive-rl) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.classic_control)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/competitive_rl_zh.html) |
+| 7 | [gfootball](https://github.com/google-research/football) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.gfootball/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball_zh.html) |
+| 8 | [minigrid](https://github.com/maximecb/gym-minigrid) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/minigrid/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid_zh.html) |
+| 9 | [MuJoCo](https://github.com/openai/gym/tree/master/gym/envs/mujoco) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/majoco/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco.html)
[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco_zh.html) |
+| 10 | [PettingZoo](https://github.com/Farama-Foundation/PettingZoo) |    |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/petting_zoo/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pettingzoo.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pettingzoo_zh.html) |
+| 11 | [overcooked](https://github.com/HumanCompatibleAI/overcooked-demo) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/overcooded/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/overcooked.html) |
+| 12 | [procgen](https://github.com/openai/procgen) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/procgen)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/procgen.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/procgen_zh.html) |
+| 13 | [pybullet](https://github.com/benelot/pybullet-gym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pybullet/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pybullet_zh.html) |
+| 14 | [smac](https://github.com/oxwhirl/smac) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/smac/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/smac.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/smac_zh.html) |
+| 15 | [d4rl](https://github.com/rail-berkeley/d4rl) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/d4rl)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/d4rl_zh.html) |
+| 16 | league_demo |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/league_demo/envs) |
+| 17 | pomdp atari |  | | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pomdp/envs) |
+| 18 | [bsuite](https://github.com/deepmind/bsuite) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bsuite/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs//bsuite.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bsuite_zh.html) |
+| 19 | [ImageNet](https://www.image-net.org/) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/image_classification)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/image_cls_zh.html) |
+| 20 | [slime_volleyball](https://github.com/hardmaru/slimevolleygym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/slime_volley)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/slime_volleyball.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/slime_volleyball_zh.html) |
+| 21 | [gym_hybrid](https://github.com/thomashirtz/gym-hybrid) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_hybrid)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_hybrid.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_hybrid_zh.html) |
+| 22 | [GoBigger](https://github.com/opendilab/GoBigger) |  |  | [dizoo link](https://github.com/opendilab/GoBigger-Challenge-2021/tree/main/di_baseline)
[env tutorial](https://gobigger.readthedocs.io/en/latest/index.html)
[环境指南](https://gobigger.readthedocs.io/zh_CN/latest/) |
+| 23 | [gym_soccer](https://github.com/openai/gym-soccer) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_soccer)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_soccer_zh.html) |
+| 24 | [multiagent_mujoco](https://github.com/schroederdewitt/multiagent_mujoco) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/multiagent_mujoco/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/mujoco_zh.html) |
+| 25 | bitflip |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bitflip/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bitflip_zh.html) |
+| 26 | [sokoban](https://github.com/mpSchrader/gym-sokoban) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/sokoban/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/sokoban.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/sokoban_zh.html) |
+| 27 | [gym_anytrading](https://github.com/AminHP/gym-anytrading) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_anytrading)
[env tutorial](https://github.com/opendilab/DI-engine/blob/main/dizoo/gym_anytrading/envs/README.md) |
+| 28 | [mario](https://github.com/Kautenja/gym-super-mario-bros) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mario)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_super_mario_bros.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_super_mario_bros_zh.html) |
+| 29 | [dmc2gym](https://github.com/denisyarats/dmc2gym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/dmc2gym)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/dmc2gym.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/dmc2gym_zh.html) |
+| 30 | [evogym](https://github.com/EvolutionGym/evogym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/evogym/envs)
[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/evogym.html)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/Evogym_zh.html) |
+| 31 | [gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_pybullet_drones/envs)
环境指南 |
+| 32 | [beergame](https://github.com/OptMLGroup/DeepBeerInventory-RL) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/beergame/envs)
环境指南 |
+| 33 | [classic_control/acrobot](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/acrobot/envs)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/acrobot_zh.html) |
+| 34 | [box2d/car_racing](https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py) | 
 |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/carracing/envs)
环境指南 |
+| 35 | [metadrive](https://github.com/metadriverse/metadrive) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/metadrive/env)
[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/metadrive_zh.html) |
+| 36 | [cliffwalking](https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/cliffwalking/envs)
env tutorial
环境指南 |
+| 37 | [tabmwp](https://promptpg.github.io/explore.html) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/tabmwp)
env tutorial
环境指南 |
+| 38 | [frozen_lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/frozen_lake)
env tutorial
环境指南 |
+
 means discrete action space
@@ -329,111 +334,112 @@ P.S: The `.py` file in `Runnable Demo` can be found in `dizoo`
 means environment that allows agent VS agent battle
P.S. Some environments in Atari, such as **MontezumaRevenge**, are also of the sparse reward type.
-
+
### General Data Container: TreeTensor
DI-engine utilizes [TreeTensor](https://github.com/opendilab/DI-treetensor) as the basic data container in various components. It is easy to use and consistent across different code modules such as environment definition, data processing, and DRL optimization. Here are some concrete code examples:
- TreeTensor can easily extend all the operations of `torch.Tensor` to nested data:
+
(Click for Details)
- ```python
- import treetensor.torch as ttorch
-
-
- # create random tensor
- data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})
- # clone+detach tensor
- data_clone = data.clone().detach()
- # access tree structure like attribute
- a = data.a
- c = data.b.c
- # stack/cat/split
- stacked_data = ttorch.stack([data, data_clone], 0)
- cat_data = ttorch.cat([data, data_clone], 0)
- data, data_clone = ttorch.split(stacked_data, 1)
- # reshape
- data = data.unsqueeze(-1)
- data = data.squeeze(-1)
- flatten_data = data.view(-1)
- # indexing
- data_0 = data[0]
- data_1to2 = data[1:2]
- # execute math calculations
- data = data.sin()
- data.b.c.cos_().clamp_(-1, 1)
- data += data ** 2
- # backward
- data.requires_grad_(True)
- loss = data.arctan().mean()
- loss.backward()
- # print shape
- print(data.shape)
- # result
- #
- # ├── 'a' --> torch.Size([1, 3, 2])
- # └── 'b' -->
- # └── 'c' --> torch.Size([1, 3])
- ```
+ ```python
+ import treetensor.torch as ttorch
+
+
+ # create random tensor
+ data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})
+ # clone+detach tensor
+ data_clone = data.clone().detach()
+ # access tree structure like attribute
+ a = data.a
+ c = data.b.c
+ # stack/cat/split
+ stacked_data = ttorch.stack([data, data_clone], 0)
+ cat_data = ttorch.cat([data, data_clone], 0)
+ data, data_clone = ttorch.split(stacked_data, 1)
+ # reshape
+ data = data.unsqueeze(-1)
+ data = data.squeeze(-1)
+ flatten_data = data.view(-1)
+ # indexing
+ data_0 = data[0]
+ data_1to2 = data[1:2]
+ # execute math calculations
+ data = data.sin()
+ data.b.c.cos_().clamp_(-1, 1)
+ data += data ** 2
+ # backward
+ data.requires_grad_(True)
+ loss = data.arctan().mean()
+ loss.backward()
+ # print shape
+ print(data.shape)
+ # result
+ #
+ # ├── 'a' --> torch.Size([1, 3, 2])
+ # └── 'b' -->
+ # └── 'c' --> torch.Size([1, 3])
+ ```
-
- TreeTensor makes it simple yet effective to implement a classic deep reinforcement learning pipeline:
+
(Click for Details)
- ```diff
- import torch
- import treetensor.torch as ttorch
-
- B = 4
-
-
- def get_item():
- return {
- 'obs': {
- 'scalar': torch.randn(12),
- 'image': torch.randn(3, 32, 32),
- },
- 'action': torch.randint(0, 10, size=(1,)),
- 'reward': torch.rand(1),
- 'done': False,
- }
-
-
- data = [get_item() for _ in range(B)]
-
-
- # execute `stack` op
- - def stack(data, dim):
- - elem = data[0]
- - if isinstance(elem, torch.Tensor):
- - return torch.stack(data, dim)
- - elif isinstance(elem, dict):
- - return {k: stack([item[k] for item in data], dim) for k in elem.keys()}
- - elif isinstance(elem, bool):
- - return torch.BoolTensor(data)
- - else:
- - raise TypeError("not support elem type: {}".format(type(elem)))
- - stacked_data = stack(data, dim=0)
- + data = [ttorch.tensor(d) for d in data]
- + stacked_data = ttorch.stack(data, dim=0)
-
- # validate
- - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)
- - assert stacked_data['action'].shape == (B, 1)
- - assert stacked_data['reward'].shape == (B, 1)
- - assert stacked_data['done'].shape == (B,)
- - assert stacked_data['done'].dtype == torch.bool
- + assert stacked_data.obs.image.shape == (B, 3, 32, 32)
- + assert stacked_data.action.shape == (B, 1)
- + assert stacked_data.reward.shape == (B, 1)
- + assert stacked_data.done.shape == (B,)
- + assert stacked_data.done.dtype == torch.bool
- ```
+ ```diff
+ import torch
+ import treetensor.torch as ttorch
+
+ B = 4
+
+
+ def get_item():
+     return {
+         'obs': {
+             'scalar': torch.randn(12),
+             'image': torch.randn(3, 32, 32),
+         },
+         'action': torch.randint(0, 10, size=(1,)),
+         'reward': torch.rand(1),
+         'done': False,
+     }
+
+
+ data = [get_item() for _ in range(B)]
+
+
+ # execute `stack` op
+ - def stack(data, dim):
+ -     elem = data[0]
+ -     if isinstance(elem, torch.Tensor):
+ -         return torch.stack(data, dim)
+ -     elif isinstance(elem, dict):
+ -         return {k: stack([item[k] for item in data], dim) for k in elem.keys()}
+ -     elif isinstance(elem, bool):
+ -         return torch.BoolTensor(data)
+ -     else:
+ -         raise TypeError("not support elem type: {}".format(type(elem)))
+ - stacked_data = stack(data, dim=0)
+ + data = [ttorch.tensor(d) for d in data]
+ + stacked_data = ttorch.stack(data, dim=0)
+
+ # validate
+ - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)
+ - assert stacked_data['action'].shape == (B, 1)
+ - assert stacked_data['reward'].shape == (B, 1)
+ - assert stacked_data['done'].shape == (B,)
+ - assert stacked_data['done'].dtype == torch.bool
+ + assert stacked_data.obs.image.shape == (B, 3, 32, 32)
+ + assert stacked_data.action.shape == (B, 1)
+ + assert stacked_data.reward.shape == (B, 1)
+ + assert stacked_data.done.shape == (B,)
+ + assert stacked_data.done.dtype == torch.bool
+ ```
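
The recursive collation that `ttorch.stack` replaces in the diff above can also be sketched in plain Python, without PyTorch. This is only an illustration under stated assumptions: `stack_nested` is a hypothetical helper, leaves are gathered into lists rather than stacked into tensors, and nothing here is part of DI-engine or DI-treetensor.

```python
from typing import Any, List


def stack_nested(samples: List[Any]) -> Any:
    """Recursively collate a list of identically structured samples.

    Dict samples are merged key by key; any other leaf is gathered into
    a list, standing in for what `torch.stack` would do with tensors.
    """
    first = samples[0]
    if isinstance(first, dict):
        return {key: stack_nested([s[key] for s in samples]) for key in first}
    return list(samples)


# Hypothetical toy samples that mirror the obs/reward/done layout above.
batch = [
    {'obs': {'scalar': [0.1 * i], 'image_id': i}, 'reward': float(i), 'done': False}
    for i in range(4)
]
stacked = stack_nested(batch)
print(stacked['obs']['image_id'])  # [0, 1, 2, 3]
print(stacked['done'])  # [False, False, False, False]
```

The hand-written recursion has to enumerate every leaf type it expects, which is the boilerplate the TreeTensor version removes.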
@@ -442,8 +448,8 @@ DI-engine utilizes [TreeTensor](https://github.com/opendilab/DI-treetensor) as t
- [File an issue](https://github.com/opendilab/DI-engine/issues/new/choose) on GitHub
- Open or participate in our [forum](https://github.com/opendilab/DI-engine/discussions)
- Discuss on DI-engine [slack communication channel](https://join.slack.com/t/opendilab/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)
-- Discuss on DI-engine's WeChat group (i.e. add us on WeChat: ding314assist)
-
+- Discuss on DI-engine's WeChat group (i.e. add us on WeChat: ding314assist)
+
- Contact us by email (opendilab@pjlab.org.cn)
- Contribute to our future plan [Roadmap](https://github.com/opendilab/DI-engine/issues/548)
@@ -460,8 +466,8 @@ We appreciate all the feedbacks and contributions to improve DI-engine, both alg
[](https://github.com/opendilab/DI-engine/network/members)
-
## Citation
+
```latex
@misc{ding,
title={DI-engine: OpenDILab Decision Intelligence Engine},
@@ -473,4 +479,5 @@ We appreciate all the feedbacks and contributions to improve DI-engine, both alg
```
## License
+
DI-engine is released under the Apache 2.0 license.
diff --git a/ding/example/dqn_frozen_lake.py b/ding/example/dqn_frozen_lake.py
new file mode 100644
index 0000000000..ec4b856339
--- /dev/null
+++ b/ding/example/dqn_frozen_lake.py
@@ -0,0 +1,45 @@
+from ditk import logging
+from ding.model import DQN
+from ding.policy import DQNPolicy
+from ding.envs import DingEnvWrapper, BaseEnvManagerV2
+from ding.data import DequeBuffer
+from ding.config import compile_config
+from ding.framework import task
+from ding.framework.context import OnlineRLContext
+from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, \
+    eps_greedy_handler, CkptSaver, nstep_reward_enhancer, final_ctx_saver
+from ding.utils import set_pkg_seed
+from dizoo.frozen_lake.config.frozen_lake_dqn_config import main_config, create_config
+from dizoo.frozen_lake.envs import FrozenLakeEnv
+
+
+def main():
+    logging.getLogger().setLevel(logging.INFO)
+    main_config.policy.nstep = 5
+    cfg = compile_config(main_config, create_cfg=create_config, auto=True)
+    with task.start(async_mode=False, ctx=OnlineRLContext()):
+        collector_env = BaseEnvManagerV2(
+            env_fn=[lambda: FrozenLakeEnv(cfg=cfg.env) for _ in range(cfg.env.collector_env_num)], cfg=cfg.env.manager
+        )
+        evaluator_env = BaseEnvManagerV2(
+            env_fn=[lambda: FrozenLakeEnv(cfg=cfg.env) for _ in range(cfg.env.evaluator_env_num)], cfg=cfg.env.manager
+        )
+        set_pkg_seed(cfg.seed, use_cuda=cfg.policy.cuda)
+
+        model = DQN(**cfg.policy.model)
+        buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
+        policy = DQNPolicy(cfg.policy, model=model)
+
+        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))
+        task.use(eps_greedy_handler(cfg))
+        task.use(StepCollector(cfg, policy.collect_mode, collector_env))
+        task.use(nstep_reward_enhancer(cfg))
+        task.use(data_pusher(cfg, buffer_))
+        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))
+        task.use(CkptSaver(policy, cfg.exp_name, train_freq=100))
+        task.use(final_ctx_saver(cfg.exp_name))
+        task.run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/dizoo/frozen_lake/FrozenLake.gif b/dizoo/frozen_lake/FrozenLake.gif
new file mode 100644
index 0000000000..db46a98e39
Binary files /dev/null and b/dizoo/frozen_lake/FrozenLake.gif differ
diff --git a/dizoo/frozen_lake/__init__.py b/dizoo/frozen_lake/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/dizoo/frozen_lake/config/__init__.py b/dizoo/frozen_lake/config/__init__.py
new file mode 100644
index 0000000000..9bec16a088
--- /dev/null
+++ b/dizoo/frozen_lake/config/__init__.py
@@ -0,0 +1 @@
+from .frozen_lake_dqn_config import main_config, create_config
diff --git a/dizoo/frozen_lake/config/frozen_lake_dqn_config.py b/dizoo/frozen_lake/config/frozen_lake_dqn_config.py
new file mode 100644
index 0000000000..84fe0de199
--- /dev/null
+++ b/dizoo/frozen_lake/config/frozen_lake_dqn_config.py
@@ -0,0 +1,64 @@
+from easydict import EasyDict
+
+frozen_lake_dqn_config = dict(
+    exp_name='frozen_lake_seed0',
+    env=dict(
+        collector_env_num=8,
+        evaluator_env_num=5,
+        n_evaluator_episode=10,
+        env_id='FrozenLake-v1',
+        desc=None,
+        map_name="4x4",
+        is_slippery=False,
+        save_replay_gif=False,
+    ),
+    policy=dict(
+        cuda=True,
+        load_path='frozen_lake_seed0/ckpt/ckpt_best.pth.tar',
+        model=dict(
+            obs_shape=16,
+            action_shape=4,
+            encoder_hidden_size_list=[128, 128, 64],
+            dueling=True,
+        ),
+        nstep=3,
+        discount_factor=0.97,
+        learn=dict(
+            update_per_collect=5,
+            batch_size=256,
+            learning_rate=0.001,
+        ),
+        collect=dict(n_sample=10),
+        eval=dict(evaluator=dict(eval_freq=40, )),
+        other=dict(
+            eps=dict(
+                type='exp',
+                start=0.8,
+                end=0.1,
+                decay=10000,
+            ),
+            replay_buffer=dict(replay_buffer_size=20000, ),
+        ),
+    ),
+)
+
+frozen_lake_dqn_config = EasyDict(frozen_lake_dqn_config)
+main_config = frozen_lake_dqn_config
+
+frozen_lake_dqn_create_config = dict(
+    env=dict(
+        type='frozen_lake',
+        import_names=['dizoo.frozen_lake.envs.frozen_lake_env'],
+    ),
+    env_manager=dict(type='base'),
+    policy=dict(type='dqn'),
+    replay_buffer=dict(type='deque', import_names=['ding.data.buffer.deque_buffer_wrapper']),
+)
+
+frozen_lake_dqn_create_config = EasyDict(frozen_lake_dqn_create_config)
+create_config = frozen_lake_dqn_create_config
+
+if __name__ == "__main__":
+    # or you can enter `ding -m serial -c frozen_lake_dqn_config.py -s 0`
+    from ding.entry import serial_pipeline
+    serial_pipeline((main_config, create_config), max_env_step=5000, seed=0)
diff --git a/dizoo/frozen_lake/envs/__init__.py b/dizoo/frozen_lake/envs/__init__.py
new file mode 100644
index 0000000000..dfec345139
--- /dev/null
+++ b/dizoo/frozen_lake/envs/__init__.py
@@ -0,0 +1 @@
+from .frozen_lake_env import FrozenLakeEnv
diff --git a/dizoo/frozen_lake/envs/frozen_lake_env.py b/dizoo/frozen_lake/envs/frozen_lake_env.py
new file mode 100644
index 0000000000..72f179077a
--- /dev/null
+++ b/dizoo/frozen_lake/envs/frozen_lake_env.py
@@ -0,0 +1,144 @@
+from typing import List, Optional
+import imageio
+import os
+import gymnasium as gymn
+import numpy as np
+from ding.envs import BaseEnv, BaseEnvTimestep
+from ding.torch_utils import to_ndarray
+from ding.utils import ENV_REGISTRY
+
+
+@ENV_REGISTRY.register('frozen_lake')
+class FrozenLakeEnv(BaseEnv):
+
+    def __init__(self, cfg) -> None:
+        self._cfg = cfg
+        assert self._cfg.env_id == "FrozenLake-v1", "env_id must be 'FrozenLake-v1'"
+        self._init_flag = False
+        self._save_replay_bool = False
+        self._save_replay_count = 0
+        self._frames = []
+        self._replay_path = None
+
+    def reset(self) -> np.ndarray:
+        if not self._init_flag:
+            # ``desc`` (a custom map) overrides ``map_name``; ``desc=None`` loads the named map.
+            self._env = gymn.make(
+                self._cfg.env_id,
+                desc=self._cfg.desc,
+                map_name=self._cfg.map_name,
+                is_slippery=self._cfg.is_slippery,
+                render_mode="rgb_array"
+            )
+            self._observation_space = self._env.observation_space
+            self._action_space = self._env.action_space
+            self._reward_space = gymn.spaces.Box(
+                low=self._env.reward_range[0], high=self._env.reward_range[1], shape=(1, ), dtype=np.float32
+            )
+            self._init_flag = True
+        self._eval_episode_return = 0
+        if hasattr(self, '_seed') and hasattr(self, '_dynamic_seed') and self._dynamic_seed:
+            np_seed = 100 * np.random.randint(1, 1000)
+            self._env_seed = self._seed + np_seed
+        elif hasattr(self, '_seed'):
+            self._env_seed = self._seed
+        if hasattr(self, '_seed'):
+            obs, info = self._env.reset(seed=self._env_seed)
+        else:
+            obs, info = self._env.reset()
+        obs = np.eye(16, dtype=np.float32)[obs]  # one-hot encoding; states are 0-indexed
+        return obs
+
+    def close(self) -> None:
+        if self._init_flag:
+            self._env.close()
+        self._init_flag = False
+
+    def seed(self, seed: int, dynamic_seed: bool = True) -> None:
+        self._seed = seed
+        self._dynamic_seed = dynamic_seed
+        np.random.seed(self._seed)
+
+    def step(self, action: np.ndarray) -> BaseEnvTimestep:
+        obs, rew, terminated, truncated, info = self._env.step(action[0])
+        self._eval_episode_return += rew
+        obs = np.eye(16, dtype=np.float32)[obs]  # one-hot encoding; states are 0-indexed
+        rew = to_ndarray([rew])
+        if self._save_replay_bool:
+            picture = self._env.render()
+            self._frames.append(picture)
+        if terminated or truncated:
+            done = True
+        else:
+            done = False
+        if done:
+            info['eval_episode_return'] = self._eval_episode_return
+            if self._save_replay_bool:
+                assert self._replay_path is not None, "replay_path must be set before saving a replay"
+                path = os.path.join(
+                    self._replay_path, '{}_episode_{}.gif'.format(self._cfg.env_id, self._save_replay_count)
+                )
+                self.frames_to_gif(self._frames, path)
+                self._frames = []
+                self._save_replay_count += 1
+        rew = rew.astype(np.float32)
+        return BaseEnvTimestep(obs, rew, done, info)
+
+    def random_action(self) -> np.ndarray:
+        raw_action = self._env.action_space.sample()
+        # Wrap the action in an array so that it matches the input format of ``step``.
+        return np.array([raw_action], dtype=np.int64)
+
+    def __repr__(self) -> str:
+        return "DI-engine Frozen Lake Env"
+
+    @property
+    def observation_space(self) -> gymn.spaces.Space:
+        return self._observation_space
+
+    @property
+    def action_space(self) -> gymn.spaces.Space:
+        return self._action_space
+
+    @property
+    def reward_space(self) -> gymn.spaces.Space:
+        return self._reward_space
+
+    def enable_save_replay(self, replay_path: Optional[str] = None) -> None:
+        if replay_path is None:
+            replay_path = './video'
+        os.makedirs(replay_path, exist_ok=True)
+        self._replay_path = replay_path
+        self._save_replay_bool = True
+        self._save_replay_count = 0
+        self._frames = []
+
+    @staticmethod
+    def frames_to_gif(frames: List[imageio.core.util.Array], gif_path: str, duration: float = 0.1) -> None:
+        """
+        Convert a list of frames into a GIF.
+        Args:
+            - frames (List[imageio.core.util.Array]): A list of frames, each frame is an image.
+            - gif_path (str): The path to save the GIF file.
+            - duration (float): Duration between each frame in the GIF (seconds).
+
+        Returns:
+            None, the GIF file is saved directly to the specified path.
+        """
+        # Save all frames as temporary image files
+        temp_image_files = []
+        for i, frame in enumerate(frames):
+            temp_image_file = f"frame_{i}.png"  # Temporary file name
+            imageio.imwrite(temp_image_file, frame)  # Save the frame as a PNG file
+            temp_image_files.append(temp_image_file)
+
+        # Use imageio to convert temporary image files to GIF
+        with imageio.get_writer(gif_path, mode='I', duration=duration) as writer:
+            for temp_image_file in temp_image_files:
+                image = imageio.imread(temp_image_file)
+                writer.append_data(image)
+
+        # Clean up temporary image files
+        for temp_image_file in temp_image_files:
+            os.remove(temp_image_file)
+        print(f"GIF saved as {gif_path}")
diff --git a/dizoo/frozen_lake/envs/test_frozen_lake_env.py b/dizoo/frozen_lake/envs/test_frozen_lake_env.py
new file mode 100644
index 0000000000..c313a264e0
--- /dev/null
+++ b/dizoo/frozen_lake/envs/test_frozen_lake_env.py
@@ -0,0 +1,44 @@
+import numpy as np
+import pytest
+from dizoo.frozen_lake.envs import FrozenLakeEnv
+from easydict import EasyDict
+
+
+@pytest.mark.envtest
+class TestFrozenLakeEnv:
+
+    def test_my_lake(self):
+        env = FrozenLakeEnv(
+            EasyDict({
+                'env_id': 'FrozenLake-v1',
+                'desc': None,
+                'map_name': "4x4",
+                'is_slippery': False,
+            })
+        )
+        for _ in range(5):
+            env.seed(314, dynamic_seed=False)
+            assert env._seed == 314
+            obs = env.reset()
+            assert obs.shape == (
+                16,
+            ), "Considering the one-hot encoding format, your observation should have a dimensionality of 16."
+            env.enable_save_replay("./video")
+            for i in range(10):
+                # Both ``env.random_action()``, and utilizing ``np.random`` as well as action space,
+                # can generate legal random action.
+                if i < 5:
+                    random_action = np.array([env.action_space.sample()])
+                else:
+                    random_action = env.random_action()
+                timestep = env.step(random_action)
+                print(timestep)
+                assert isinstance(timestep.obs, np.ndarray)
+                assert isinstance(timestep.done, bool)
+                assert timestep.obs.shape == (16, )
+                assert timestep.reward.shape == (1, )
+                assert timestep.reward >= env.reward_space.low
+                assert timestep.reward <= env.reward_space.high
+
+        print(env.observation_space, env.action_space, env.reward_space)
+        env.close()