feature(yzj): add ptz ctde pipeline #149

Open
wants to merge 12 commits into
base: main
from

jayyoung0802
Collaborator

No description provided.


next_latent_state, reward = self.dynamics_network(state_action_encoding)
agent_state_action_encoding = torch.cat((agent_latent_state, action_encoding), dim=1)
global_state_action_encoding = torch.cat((agent_latent_state, global_latent_state, action_encoding), dim=1)
Collaborator

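  • Is it necessary to concatenate agent_latent_state into global_state_action_encoding as well?
  • After concatenation, action_encoding makes up only 5/(256*2+5) of the input. Is its information density too low?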
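
To make the ratio in the second point concrete, the sketch below computes the share of input features that belongs to the action encoding. The sizes (256 per latent state, 5 for the action encoding) are read off the 5/(256*2+5) figure in the comment; the helper name `action_fraction` and the "global latent only" variant are hypothetical and not part of the PR.

```python
def action_fraction(latent_dim: int, num_latents: int, action_dim: int) -> float:
    """Share of the concatenated input occupied by the action encoding."""
    total_dim = latent_dim * num_latents + action_dim
    return action_dim / total_dim


# Sizes assumed from the review comment: two 256-dim latents plus a 5-dim action.
with_agent_latent = action_fraction(latent_dim=256, num_latents=2, action_dim=5)
# Hypothetical alternative: leave agent_latent_state out of the global encoding.
global_only = action_fraction(latent_dim=256, num_latents=1, action_dim=5)

print(f"agent + global latent: {with_agent_latent:.4f}")
print(f"global latent only:    {global_only:.4f}")
```

The input fraction is only a rough proxy for how much the network can use the action; whether it matters in practice is what the test proposed in the reply would have to show.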

Collaborator Author

This needs to be tested.

Collaborator @puyuan1996, Nov 26, 2023

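  • s', s1', s2', s3', r = f(s, s1, s2, s3, a1, a2, a3): model the joint dynamics function with a single network, which has to take the information of every agent in the team into account at the same time.
  • collect should store data per team.
  • The data processing flow in forward_learn needs to change. Unrolling 5 steps means the whole team rolls out 5 steps together.
  • The handling of reward in forward_learn also needs to change.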
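
The first and third points can be sketched as one network that maps the team's latent states and joint action to the next latent states and a shared reward, unrolled for the whole team at once. This is a minimal illustration under stated assumptions, not the PR's implementation: the class name `JointDynamics`, the MLP layout, and the sizes (3 agents, 256-dim latents, 5 actions per agent, batch of 8) are all hypothetical.

```python
import torch
import torch.nn as nn


class JointDynamics(nn.Module):
    """Hypothetical joint dynamics: (s, s1..sN, a1..aN) -> (s', s1'..sN', r)."""

    def __init__(self, agent_num: int = 3, latent_dim: int = 256, action_dim: int = 5):
        super().__init__()
        self.agent_num = agent_num
        self.latent_dim = latent_dim
        # One global latent plus one latent per agent.
        state_dim = latent_dim * (agent_num + 1)
        in_dim = state_dim + agent_num * action_dim
        self.body = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.state_head = nn.Linear(512, state_dim)
        self.reward_head = nn.Linear(512, 1)  # a single team reward

    def forward(self, global_latent, agent_latents, joint_action):
        # global_latent: (B, latent_dim)
        # agent_latents: (B, agent_num, latent_dim)
        # joint_action:  (B, agent_num, action_dim), e.g. one-hot per agent
        batch = global_latent.shape[0]
        x = torch.cat(
            (
                global_latent,
                agent_latents.reshape(batch, -1),
                joint_action.reshape(batch, -1),
            ),
            dim=1,
        )
        h = self.body(x)
        next_states = self.state_head(h).view(batch, self.agent_num + 1, self.latent_dim)
        next_global, next_agents = next_states[:, 0], next_states[:, 1:]
        return next_global, next_agents, self.reward_head(h)


model = JointDynamics()
g = torch.zeros(8, 256)      # global latent state
a = torch.zeros(8, 3, 256)   # per-agent latent states
act = torch.zeros(8, 3, 5)   # joint action encoding (placeholder values)

# Unroll 5 steps for the whole team together.
for _ in range(5):
    g, a, r = model(g, a, act)
print(tuple(g.shape), tuple(a.shape), tuple(r.shape))
```

Because every agent's latent state and action enters the same forward pass, the predicted reward is a team-level quantity by construction, which is why the data collection and the reward handling in the learner would have to be organized per team as well.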

Collaborator

The input to global_state_dynamic contains only a single agent's action and not the joint action, which is unreasonable.
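
One way to give the global-state dynamics the joint action is to concatenate a one-hot vector per agent. The sketch below uses plain Python lists; the function name `encode_joint_action` and the sizes (3 agents, 5 discrete actions each) are assumptions for illustration and do not come from the PR.

```python
from typing import List


def encode_joint_action(actions: List[int], action_dim: int) -> List[float]:
    """Concatenate one one-hot vector per agent into a joint action encoding."""
    encoding: List[float] = []
    for action in actions:
        if not 0 <= action < action_dim:
            raise ValueError(f"action {action} outside [0, {action_dim})")
        one_hot = [0.0] * action_dim
        one_hot[action] = 1.0
        encoding.extend(one_hot)
    return encoding


# Three agents choosing actions 0, 4 and 2 out of 5 possible actions.
joint = encode_joint_action([0, 4, 2], action_dim=5)
print(len(joint))  # 15
```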

policy_logits = policy_logits.detach().cpu().numpy().tolist()

legal_actions = [[i for i, x in enumerate(action_mask[j]) if x == 1] for j in range(active_collect_env_num)]
reward_roots = [[reward_root]*self.cfg.model.agent_num for reward_root in reward_roots]
Collaborator @puyuan1996, Nov 26, 2023

Here reward_roots is already a list of length 24, so why transform it this way? 24 = 8*3, so every group of 3 corresponding rewards should already be the same team_reward, shouldn't it?

Collaborator Author

This step makes every 3 agents search with the same reward.
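
The two readings in this thread can be compared on a toy list. The sketch assumes 8 environments with 3 agents each, giving the flat list of 24 entries the reviewer describes; the reward values and the names `flat_reward_roots`, `repeated`, and `grouped` are made up for illustration.

```python
agent_num = 3
env_num = 8

# Flat list of 24 rewards in which each team of 3 already shares one value
# (the layout the reviewer expects).
team_rewards = [float(i) for i in range(env_num)]
flat_reward_roots = [r for r in team_rewards for _ in range(agent_num)]

# The transformation from the diff: every entry becomes agent_num copies of itself.
repeated = [[reward_root] * agent_num for reward_root in flat_reward_roots]

# Alternative reading: regroup the flat list into one row per team.
grouped = [
    flat_reward_roots[i:i + agent_num]
    for i in range(0, len(flat_reward_roots), agent_num)
]

print(len(repeated), len(repeated[0]))  # 24 3
print(len(grouped), len(grouped[0]))    # 8 3
```

On a 24-entry input the diff's list comprehension yields 24 rows of 3 identical values rather than 8 team rows, so which form the search expects depends on whether reward_roots holds one entry per agent or one per environment at that point, which the thread leaves open.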

@puyuan1996 added the environment, algorithm, and discussion labels on Nov 26, 2023
@@ -0,0 +1,116 @@
from easydict import EasyDict
Collaborator

[Attached screenshot]
mz simple_spread fails with this error. Does it run normally on your side? Are you using pettingzoo version 1.22.3?

Collaborator Author
