Merge branch 'opendilab:main' into q_transformner
rongkunxue authored Jul 1, 2024
2 parents 0b54465 + b4ab08a commit 140b70f
Showing 9 changed files with 82 additions and 43 deletions.
22 changes: 22 additions & 0 deletions CHANGELOG
@@ -1,3 +1,25 @@
+2024.06.27(v0.5.2)
+- env: add taxi env (#799) (#807)
+- env: add ising model env (#782)
+- env: add new Frozen Lake env (#781)
+- env: optimize ppo continuous config in MuJoCo (#801)
+- env: fix masac smac config multi_agent=True bug (#791)
+- env: update/speed up pendulum ppo
+- algo: fix gtrxl compatibility bug (#796)
+- algo: fix complex obs demo for ppo pipeline (#786)
+- algo: add naive PWIL demo
+- algo: fix marl nstep td compatibility bug
+- feature: add GPU utils (#788)
+- feature: add deprecated function decorator (#778) (see the sketch after this file's diff)
+- style: relax flask requirement (#811)
+- style: add new badge (hellogithub) in readme (#805)
+- style: update discord link and badge in readme (#795)
+- style: fix typo in config.py (#776)
+- style: polish rl_utils api docs
+- style: add constraint about numpy<2
+- style: polish macos platform test version to 12
+- style: polish ci python version

2024.02.04(v0.5.1)
- env: add MADDPG pettingzoo example (#774)
- env: polish NGU Atari configs (#767)
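One of the v0.5.2 entries above adds a deprecated-function decorator (#778). As a generic sketch only, since the actual helper in `ding.utils` may use a different name and signature, such a decorator can look like this:

```python
import functools
import warnings


def deprecated(reason: str):
    """Generic deprecation decorator; illustrative only, not necessarily the #778 API."""

    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            # Point the warning at the caller, not at this wrapper.
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return inner

    return decorator


@deprecated("use the new pipeline entry point instead")
def old_entry():
    pass
```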
24 changes: 16 additions & 8 deletions README.md
@@ -42,7 +42,7 @@
</div>
<br>

-Updated on 2024.02.04 DI-engine-v0.5.1
+Updated on 2024.06.27 DI-engine-v0.5.2

## Introduction to DI-engine

@@ -56,10 +56,13 @@ It provides **python-first** and **asynchronous-native** task and middleware abstractions
- Multi-agent RL algorithms: such as QMIX, WQMIX, MAPPO, HAPPO, ACE
- Imitation learning algorithms (BC/IRL/GAIL): such as GAIL, SQIL, Guided Cost Learning, Implicit BC
- Offline RL algorithms: BCQ, CQL, TD3BC, Decision Transformer, EDAC, Diffuser, Decision Diffuser, SO2
-- Model-based RL algorithms: SVG, STEVE, MBPO, DDPPO, DreamerV3, MuZero
+- Model-based RL algorithms: SVG, STEVE, MBPO, DDPPO, DreamerV3
- Exploration algorithms: HER, RND, ICM, NGU
-- LLM + RL Algorithms: PPO-max, DPO, MODPO, PromptPG
+- LLM + RL Algorithms: PPO-max, DPO, PromptPG
- Other algorithms: such as PER, PLR, PCGrad
+- MCTS + RL algorithms: AlphaZero, MuZero, please refer to [LightZero](https://github.com/opendilab/LightZero)
+- Generative Model + RL algorithms: Diffusion-QL, QGPO, SRPO, please refer to [GenerativeRL](https://github.com/opendilab/GenerativeRL)


**DI-engine** aims to **standardize different Decision Intelligence environments and applications**, supporting both academic research and prototype applications. Various training pipelines and customized decision AI applications are also supported:

@@ -72,6 +75,7 @@ It provides **python-first** and **asynchronous-native** task and middleware abstractions
- [PPOxFamily](https://github.com/opendilab/PPOxFamily): PPO x Family DRL Tutorial Course
- Real world decision AI applications
- [DI-star](https://github.com/opendilab/DI-star): Decision AI in StarCraftII
+- [PsyDI](https://github.com/opendilab/PsyDI): Towards a Multi-Modal and Interactive Chatbot for Psychological Assessments
- [DI-drive](https://github.com/opendilab/DI-drive): Auto-driving platform
- [DI-sheep](https://github.com/opendilab/DI-sheep): Decision AI in 3 Tiles Game
- [DI-smartcross](https://github.com/opendilab/DI-smartcross): Decision AI in Traffic Light Control
@@ -84,16 +88,20 @@ It provides **python-first** and **asynchronous-native** task and middleware abstractions
- [DOS](https://github.com/opendilab/DOS): [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning
- [LightZero](https://github.com/opendilab/LightZero): [NeurIPS 2023 Spotlight] A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkit
- [SO2](https://github.com/opendilab/SO2): [AAAI 2024] A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
-- [LMDrive](https://github.com/opendilab/LMDrive): LMDrive: Closed-Loop End-to-End Driving with Large Language Models
+- [LMDrive](https://github.com/opendilab/LMDrive): [CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models
+- [SmartRefine](https://github.com/opendilab/SmartRefine): [CVPR 2024] SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
+- [ReZero](https://github.com/opendilab/LightZero): Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
+- [UniZero](https://github.com/opendilab/LightZero): Generalized and Efficient Planning with Scalable Latent World Models
- Docs and Tutorials
- [DI-engine-docs](https://github.com/opendilab/DI-engine-docs): Tutorials, best practice and the API reference.
- [awesome-model-based-RL](https://github.com/opendilab/awesome-model-based-RL): A curated list of awesome Model-Based RL resources
- [awesome-exploration-RL](https://github.com/opendilab/awesome-exploration-rl): A curated list of awesome exploration RL resources
- [awesome-decision-transformer](https://github.com/opendilab/awesome-decision-transformer): A curated list of Decision Transformer resources
- [awesome-RLHF](https://github.com/opendilab/awesome-RLHF): A curated list of reinforcement learning with human feedback resources
- [awesome-multi-modal-reinforcement-learning](https://github.com/opendilab/awesome-multi-modal-reinforcement-learning): A curated list of Multi-Modal Reinforcement Learning resources
-- [awesome-AI-based-protein-design](https://github.com/opendilab/awesome-AI-based-protein-design): a collection of research papers for AI-based protein design
- [awesome-diffusion-model-in-rl](https://github.com/opendilab/awesome-diffusion-model-in-rl): A curated list of Diffusion Model in RL resources
+- [awesome-ui-agents](https://github.com/opendilab/awesome-ui-agents): A curated list of awesome UI agents resources, encompassing Web, App, OS, and beyond
+- [awesome-AI-based-protein-design](https://github.com/opendilab/awesome-AI-based-protein-design): a collection of research papers for AI-based protein design
- [awesome-end-to-end-autonomous-driving](https://github.com/opendilab/awesome-end-to-end-autonomous-driving): A curated list of awesome End-to-End Autonomous Driving resources
- [awesome-driving-behavior-prediction](https://github.com/opendilab/awesome-driving-behavior-prediction): A collection of research papers for Driving Behavior Prediction

@@ -324,7 +332,7 @@ P.S: The `.py` file in `Runnable Demo` can be found in `dizoo`
| 37 | [tabmwp](https://promptpg.github.io/explore.html) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/tabmwp/tabmwp.jpeg) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/tabmwp) <br> env tutorial <br> 环境指南 |
| 38 | [frozen_lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/frozen_lake/FrozenLake.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/frozen_lake) <br> env tutorial <br> 环境指南 |
| 39 | [ising_model](https://github.com/mlii/mfrl/tree/master/examples/ising_model) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![marl](https://img.shields.io/badge/-MARL-yellow) | ![original](./dizoo/ising_env/ising_env.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/ising_env) <br> env tutorial <br> [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/ising_model_zh.html) |
-| 40 | [taxi](https://www.gymlibrary.dev/environments/toy_text/taxi/) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/taxi/Taxi-v3_episode_0.gif) | dizoo link <br> env tutorial <br> 环境指南 |
+| 40 | [taxi](https://www.gymlibrary.dev/environments/toy_text/taxi/) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | ![original](./dizoo/taxi/Taxi-v3_episode_0.gif) | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/taxi/envs) <br> [env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/taxi.html) <br> [环境指南](https://di-engine-docs.readthedocs.io/zh-cn/latest/13_envs/taxi_zh.html) |



@@ -482,8 +490,8 @@ We appreciate all the feedback and contributions to improve DI-engine, both algorithms and system designs.

```latex
@misc{ding,
-    title={DI-engine: OpenDILab Decision Intelligence Engine},
-    author={OpenDILab Contributors},
+    title={DI-engine: A Universal AI System/Engine for Decision Intelligence},
+    author={Yazhe Niu and Jingxin Xu and Yuan Pu and Yunpeng Nie and Jinouwen Zhang and Shuai Hu and Liangxuan Zhao and Ming Zhang and Yu Liu},
publisher={GitHub},
howpublished={\url{https://github.com/opendilab/DI-engine}},
year={2021},
2 changes: 1 addition & 1 deletion conda/meta.yaml
@@ -1,7 +1,7 @@
{% set data = load_setup_py_data() %}
package:
name: di-engine
-  version: v0.5.1
+  version: v0.5.2

source:
path: ..
2 changes: 1 addition & 1 deletion ding/__init__.py
@@ -1,7 +1,7 @@
import os

__TITLE__ = 'DI-engine'
-__VERSION__ = 'v0.5.1'
+__VERSION__ = 'v0.5.2'
__DESCRIPTION__ = 'Decision AI Engine'
__AUTHOR__ = "OpenDILab Contributors"
__AUTHOR_EMAIL__ = "[email protected]"
3 changes: 1 addition & 2 deletions ding/utils/memory_helper.py
@@ -8,10 +8,9 @@
try:
    import pyecharts
except ImportError:
-    import sys
    import logging
    logging.error("Please install pyecharts first, you can install it by running 'pip install pyecharts'")
-    sys.exit(1)
+    pyecharts = None

MegaByte = 1024 * 1024

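The hunk above moves `ding/utils/memory_helper.py` from hard-exiting on a missing `pyecharts` to the optional-dependency pattern: the module now imports cleanly and the failure is deferred to the point of use. A minimal sketch of the pattern, with a hypothetical `plot_memory` use site (not the file's real API):

```python
import logging

try:
    import pyecharts
except ImportError:
    # Degrade gracefully: importing this module no longer kills the host process.
    logging.error("Please install pyecharts first, you can install it by running 'pip install pyecharts'")
    pyecharts = None


def plot_memory(stats: dict) -> None:
    # Hypothetical use site: fail only when the optional feature is actually requested.
    if pyecharts is None:
        raise RuntimeError("pyecharts is not installed, so memory charts are unavailable")
    ...  # build a pyecharts chart from ``stats``
```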
46 changes: 26 additions & 20 deletions dizoo/taxi/config/taxi_dqn_config.py
@@ -1,39 +1,45 @@
from easydict import EasyDict

taxi_dqn_config = dict(
-    exp_name='taxi_seed0',
+    exp_name='taxi_dqn_seed0',
    env=dict(
        collector_env_num=8,
-        evaluator_env_num=8,
-        n_evaluator_episode=10,
-        max_episode_steps=300,
-        env_id="Taxi-v3"
+        evaluator_env_num=8,
+        n_evaluator_episode=8,
+        stop_value=20,
+        max_episode_steps=60,
+        env_id="Taxi-v3"
    ),
    policy=dict(
        cuda=True,
+        load_path="./taxi_dqn_seed0/ckpt/ckpt_best.pth.tar",
        model=dict(
-            obs_shape=4,
+            obs_shape=34,
            action_shape=6,
-            encoder_hidden_size_list=[256, 128, 64]
+            encoder_hidden_size_list=[128, 128]
        ),
        random_collect_size=5000,
        nstep=3,
-        discount_factor=0.98,
+        discount_factor=0.99,
        learn=dict(
-            update_per_collect=5,
-            batch_size=128,
-            learning_rate=0.001,
+            update_per_collect=10,
+            batch_size=64,
+            learning_rate=0.0001,
+            learner=dict(
+                hook=dict(
+                    log_show_after_iter=1000,
+                )
+            ),
        ),
-        collect=dict(n_sample=10),
-        eval=dict(evaluator=dict(eval_freq=5, )),
+        collect=dict(n_sample=32),
+        eval=dict(evaluator=dict(eval_freq=1000, )),
        other=dict(
            eps=dict(
                type="linear",
-                start=0.8,
-                end=0.1,
-                decay=10000
-            ),
-            replay_buffer=dict(replay_buffer_size=20000,),
+                start=1,
+                end=0.05,
+                decay=3000000
+            ),
+            replay_buffer=dict(replay_buffer_size=100000,),
        ),
    )
)
@@ -55,4 +61,4 @@

if __name__ == "__main__":
    from ding.entry import serial_pipeline
-    serial_pipeline((main_config, create_config), max_env_step=5000, seed=0)
+    serial_pipeline((main_config, create_config), max_env_step=3000000, seed=0)
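The retuned config replaces the short toy run with a full 3M-step schedule, and exploration now anneals from `start=1` to `end=0.05` over `decay=3000000` steps. Assuming the `type="linear"` schedule interpolates uniformly over `decay` steps and then holds at `end` (a sketch of the intent; DI-engine's internal implementation may differ in details):

```python
def linear_epsilon(step: int, start: float = 1.0, end: float = 0.05, decay: int = 3_000_000) -> float:
    """Linearly anneal epsilon from ``start`` to ``end`` over ``decay`` steps, then hold."""
    if step >= decay:
        return end
    return start + (end - start) * step / decay


# Rough checkpoints over the 3M-step run: 1.0 -> ~0.525 -> 0.05.
for step in (0, 1_500_000, 3_000_000):
    print(step, round(linear_epsilon(step), 3))
```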
10 changes: 7 additions & 3 deletions dizoo/taxi/envs/taxi_env.py
@@ -93,8 +93,8 @@ def step(self, action: np.ndarray) -> BaseEnvTimestep:
    def enable_save_replay(self, replay_path: Optional[str] = None) -> None:
        if replay_path is None:
            replay_path = './video'
-            if not os.path.exists(replay_path):
-                os.makedirs(replay_path)
+        if not os.path.exists(replay_path):
+            os.makedirs(replay_path)
        self._replay_path = replay_path
        self._save_replay = True
        self._save_replay_count = 0
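The hunk above is an indentation-only fix: the `os.makedirs` check now sits at method level, so the replay directory is created even when the caller passes an explicit `replay_path` rather than falling back to `./video`.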
@@ -118,7 +118,11 @@ def random_action(self) -> np.ndarray:
    #todo encode the state into a vector
    def _encode_taxi(self, obs: np.ndarray) -> np.ndarray:
        taxi_row, taxi_col, passenger_location, destination = self._env.unwrapped.decode(obs)
-        return to_ndarray([taxi_row, taxi_col, passenger_location, destination])
+        encoded_obs = np.zeros(34)
+        encoded_obs[5 * taxi_row + taxi_col] = 1
+        encoded_obs[25 + passenger_location] = 1
+        encoded_obs[30 + destination] = 1
+        return to_ndarray(encoded_obs)

@property
def observation_space(self) -> Space:
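The new `_encode_taxi` replaces the raw 4-tuple with a 34-dimensional multi-one-hot vector: 25 slots for the taxi's cell on the 5x5 grid, 5 for the passenger location (four depots plus "in taxi"), and 4 for the destination depot. This is why `obs_shape` grows from 4 to 34 in the config and the tests. A standalone sketch of the same encoding:

```python
import numpy as np


def encode_taxi_obs(taxi_row: int, taxi_col: int, passenger_location: int, destination: int) -> np.ndarray:
    """Multi-one-hot Taxi-v3 observation: 25 grid cells + 5 passenger slots + 4 destinations."""
    obs = np.zeros(34, dtype=np.float32)
    obs[5 * taxi_row + taxi_col] = 1.0   # taxi cell on the 5x5 grid
    obs[25 + passenger_location] = 1.0   # 0-3: waiting at a depot, 4: in the taxi
    obs[30 + destination] = 1.0          # destination depot 0-3
    return obs


# Matches the (34,) shape asserted in test_taxi_env.py below.
assert encode_taxi_obs(2, 1, 4, 3).shape == (34,)
```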
4 changes: 2 additions & 2 deletions dizoo/taxi/envs/test_taxi_env.py
@@ -16,7 +16,7 @@ def test_naive(self):
        env.seed(314, dynamic_seed=False)
        assert env._seed == 314
        obs = env.reset()
-        assert obs.shape == (4, )
+        assert obs.shape == (34, )
        for _ in range(5):
            env.reset()
        np.random.seed(314)
@@ -32,7 +32,7 @@ def test_naive(self):
        print(f"Your timestep in wrapped mode is: {timestep}")
        assert isinstance(timestep.obs, np.ndarray)
        assert isinstance(timestep.done, bool)
-        assert timestep.obs.shape == (4, )
+        assert timestep.obs.shape == (34, )
        assert timestep.reward.shape == (1, )
        assert timestep.reward >= env.reward_space.low
        assert timestep.reward <= env.reward_space.high
12 changes: 6 additions & 6 deletions setup.py
@@ -55,7 +55,7 @@
        'gym==0.25.1',  # pypy incompatible; some environments only support gym==0.22.0
        'gymnasium',
        'torch>=1.1.0',
-        'numpy>=1.18.0',
+        'numpy>=1.18.0,<2',
        'DI-treetensor>=0.4.0',
        'DI-toolkit>=0.1.0',
        'trueskill',
@@ -69,11 +69,11 @@
        'hickle',
        'tabulate',
        'click>=7.0.0',
-        'requests>=2.25.1',  # interaction
-        'flask~=1.1.2',  # interaction
-        'responses~=0.12.1',  # interaction
-        'URLObject>=2.4.0',  # interaction
-        'MarkupSafe==2.0.1',  # interaction, compatibility
+        'flask<=2.0.3',  # interaction
+        'werkzeug<=2.0.3',  # interaction
+        'requests',  # interaction
+        'responses',  # interaction
+        'URLObject',  # interaction
        'pynng',  # parallel
        'sniffio',  # parallel
        'redis',  # parallel
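The dependency changes swap exact interaction pins for ceilings: `numpy` gains a `<2` cap, and `flask`/`werkzeug` are capped at 2.0.3. A quick environment check consistent with these pins (a hypothetical snippet, not part of the repo; assumes the `packaging` library is installed):

```python
import flask
import numpy as np
from packaging.version import Version

# Mirror the tightened constraints from the setup.py hunk above.
assert Version(np.__version__) < Version("2"), "DI-engine v0.5.2 requires numpy<2"
assert Version(flask.__version__) <= Version("2.0.3"), "DI-engine v0.5.2 pins flask<=2.0.3"
```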
