
feature(rjy): add mamujoco env and related configs #153

Open · wants to merge 8 commits into main

Conversation

@nighood (Collaborator) commented Nov 28, 2023

  1. MaMuJoCo Environment Integration: Added support for the MaMuJoCo environment and adapted it for use with LightZero. For detailed information about the environment, please refer to the original MaMuJoCo Environments repository.

  2. Independent Learning Pipeline: A new independent-learning pipeline has been introduced. It is integrated with the existing codebase and can be activated by setting the 'multi_agent' parameter accordingly (see the sketch below).

These updates aim to enhance the project's functionality and scalability, providing a robust framework for multi-agent learning scenarios.
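
For illustration, here is a minimal sketch of how that toggle could look in a config. The key names and nesting below ('env_name', 'agent_num', 'multi_agent') are assumptions based on this description, not the actual config shipped in the PR:

from easydict import EasyDict

# Hypothetical config sketch; key names are illustrative assumptions.
main_config = EasyDict(dict(
    env=dict(
        env_name='mujoco_multi',  # hypothetical MaMuJoCo env id
        agent_num=2,
    ),
    policy=dict(
        # True activates the independent-learning (multi-agent) pipeline;
        # False keeps the original single-agent path.
        multi_agent=True,
    ),
))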

@puyuan1996 added the environment (New or improved environment) and config (New or improved configuration) labels on Nov 28, 2023
@puyuan1996 (Collaborator) commented: Hook up the Sampled EfficientZero algorithm in independent-learning form to verify the environment's logic.

"""
Overview:
The modified Multi-agentMuJoCo environment with continuous action space for LightZero's algorithms.
"""
Collaborator: Add detailed and clear comments, similar to https://github.com/opendilab/LightZero/blob/main/zoo/box2d/lunarlander/envs/lunarlander_env.py. You can draft them with GPT-4 using the prompts at https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR, then correct them manually.


Collaborator: Add a brief description of this PR to the PR description.

Collaborator: Similar to https://github.com/opendilab/LightZero/blob/main/zoo/box2d/lunarlander/envs/lunarlander_env.py#L30, add support for saving MP4 and GIF replays.

Collaborator (Author): It seems DI-engine does not have replay support here yet; I will test this after finishing the other changes.

@puyuan1996 added the research (Research work in progress) label on Dec 12, 2023
@puyuan1996 changed the title from "WIP: env(rjy): add mamujoco for LightZero" to "feature(rjy): add mamujoco env and related configs" on Apr 8, 2024
# split a full batch into slices of mini_infer_size: to save the GPU memory for more GPU actors
slices = int(np.ceil(transition_batch_size / self._cfg.mini_infer_size))
network_output = []
for i in range(slices):
    beg_index = self._cfg.mini_infer_size * i
    end_index = self._cfg.mini_infer_size * (i + 1)
-   m_obs = torch.from_numpy(value_obs_list[beg_index:end_index]).to(self._cfg.device).float()
+   m_obs = to_dtype(to_device(to_tensor(value_obs_list[beg_index:end_index]), self._cfg.device), torch.float)
Collaborator: Why was this change made? Did the previous approach raise an error in the multi-agent setting? Does the current version behave as expected in both the single- and multi-agent settings?
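
For context, a minimal sketch of the shape problem behind this change: torch.from_numpy only accepts a plain np.ndarray, while multi-agent observations may arrive as nested dicts of per-agent arrays, which a recursive to_tensor-style helper can traverse. The shapes and the to_tensor_tree helper below are illustrative assumptions, not code from this PR:

import numpy as np
import torch

# Single-agent: a flat array, so torch.from_numpy works directly.
single_obs = np.zeros((8, 4), dtype=np.float32)
m_obs = torch.from_numpy(single_obs).float()

# Multi-agent: each entry may be a dict of per-agent arrays, which
# torch.from_numpy cannot consume; every leaf must be converted.
multi_obs = [
    {'agent_0': np.zeros(4, dtype=np.float32),
     'agent_1': np.zeros(4, dtype=np.float32)}
    for _ in range(8)
]

def to_tensor_tree(x):
    # Minimal recursive converter, analogous in spirit to ding's to_tensor.
    if isinstance(x, dict):
        return {k: to_tensor_tree(v) for k, v in x.items()}
    if isinstance(x, list):
        return [to_tensor_tree(v) for v in x]
    return torch.as_tensor(x)

m_obs_ma = to_tensor_tree(multi_obs)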


# calculate the target value
m_obs = default_collate(m_obs)
Collaborator: Same question as above.

-   target_values.append(0)
-   target_value_prefixs.append(value_prefix)
+   target_values.append(np.zeros_like(value_list[0]))
+   target_value_prefixs.append(np.array([0, ]))
Collaborator: Does this run correctly in both the single- and multi-agent settings? Please test mamujoco hopper and lunarlander-cont.
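
A small sketch of the shape issue this diff addresses: in the single-agent case the padded target value is a scalar, but with N agents each value is a length-N vector, so the padding must match that shape. The num_agents value and value_list contents below are illustrative assumptions:

import numpy as np

num_agents = 2
value_list = [np.zeros(num_agents)]  # per-agent values, illustrative only

target_values = []
target_value_prefixs = []
target_values.append(np.zeros_like(value_list[0]))  # shape (num_agents,), not a scalar 0
target_value_prefixs.append(np.array([0, ]))        # value prefix stays a 1-dim array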

-   pad_frames = np.array([stacked_obs[-1] for _ in range(pad_len)])
-   stacked_obs = np.concatenate((stacked_obs, pad_frames))
+   pad_frames = [stacked_obs[-1] for _ in range(pad_len)]
+   stacked_obs += pad_frames
Collaborator: Does this run correctly in both the single- and multi-agent settings? Please test mamujoco hopper and lunarlander-cont.
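
A sketch of why list-based padding is needed here in the multi-agent case; the dict layout below is an illustrative assumption. Each stacked frame may be a dict of per-agent arrays, which np.concatenate cannot combine, whereas plain list extension keeps the structure intact:

import numpy as np

# Illustrative multi-agent frames: one dict of per-agent arrays per step.
stacked_obs = [
    {'agent_0': np.zeros(4), 'agent_1': np.zeros(4)}
    for _ in range(3)
]
pad_len = 2

# Repeating the last frame as list entries preserves the list-of-dicts
# layout; np.concatenate would require array elements.
pad_frames = [stacked_obs[-1] for _ in range(pad_len)]
stacked_obs += pad_frames  # still a list of dicts, now of length 5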

@@ -0,0 +1,540 @@
from typing import Optional, Tuple
Collaborator: Inherit from SampledEfficientZeroModelMLP and override only the methods that differ; add an overview that explains the specific differences.
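
A minimal sketch of the suggested structure; the import path, subclass name, and overridden method below are assumptions for illustration only:

from lzero.model.sampled_efficientzero_model_mlp import SampledEfficientZeroModelMLP  # assumed path


class SampledEfficientZeroModelMLPMA(SampledEfficientZeroModelMLP):  # hypothetical name
    """
    Overview:
        Multi-agent variant of SampledEfficientZeroModelMLP. Only the methods
        that actually differ (e.g. how per-agent observations are encoded)
        are overridden; everything else is inherited unchanged.
    """

    def initial_inference(self, obs):
        # Hypothetical override: adapt per-agent observations here, then
        # delegate to the parent implementation.
        return super().initial_inference(obs)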

lzero/policy/scaling_transform.py (comment outdated, resolved)
@@ -388,8 +398,12 @@ def collect(self,
    ready_env_id = ready_env_id.union(set(list(new_available_env_id)[:remain_episode]))
    remain_episode -= min(len(new_available_env_id), remain_episode)

-   stack_obs = {env_id: game_segments[env_id].get_obs() for env_id in ready_env_id}
+   stack_obs = {env_id: game_segments[env_id].get_obs()[0] for env_id in ready_env_id}
Collaborator: Confirm that this is compatible with both the single- and multi-agent settings.

if __name__ == "__main__":
    from zoo.multiagent_mujoco.entry import train_sez_independent_mamujoco

    train_sez_independent_mamujoco([main_config, create_config], seed=seed, max_env_step=max_env_step)
Collaborator: What is the current status of the experiments on mamujoco? Please write it in the description, along with an overview of the core algorithmic differences from the single-agent version.

) -> 'Policy':  # noqa
    """
    Overview:
        The train entry for MCTS+RL algorithms, including MuZero, EfficientZero, Sampled EfficientZero, and Gumbel MuZero.
Collaborator: Update the overview to clearly explain the main code changes.

Collaborator: What is the main difference between this and the original train_muzero? If the difference is small, please reuse the existing code as much as possible.

@@ -0,0 +1,132 @@
from easydict import EasyDict
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '6'
Collaborator: Clean up the config and remove the parts that are not generally applicable (for example, the hard-coded GPU index).
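
One way to avoid the machine-specific setting, as a sketch (the default value and the file name in the comment are illustrative): read the device from the launch environment instead of hard-coding it in the shared config.

import os

# Respect whatever the user set at launch time, e.g.
#   CUDA_VISIBLE_DEVICES=0 python sampled_efficientzero_mamujoco_config.py
# and only fall back to a default when nothing was set.
os.environ.setdefault('CUDA_VISIBLE_DEVICES', '0')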

-@POLICY_REGISTRY.register('sampled_efficientzero')
-class SampledEfficientZeroPolicy(MuZeroPolicy):
+@POLICY_REGISTRY.register('sampled_efficientzero_ma')
+class SampledEfficientZeroMAPolicy(SampledEfficientZeroPolicy):
Collaborator: This file should be kept consistent with the original.

class SampledEfficientZeroMAPolicy(SampledEfficientZeroPolicy):
    """
    Overview:
        The policy class for Sampled EfficientZero proposed in the paper https://arxiv.org/abs/2104.06303.
Collaborator: Update the docstring, and override only the methods that actually need changes; most of them should not need to be rewritten.

Labels: config (New or improved configuration), environment (New or improved environment), research (Research work in progress)

2 participants