Question about action encoding #222

@Nightbringers

Search before asking

  • I have searched the MuZero issues and found no similar bug report.

🐛 Describe the bug

I'm confused about this. In the MuZero paper, the input to the dynamics function is the hidden state concatenated with a representation of the action for the transition. As I understand it, the code differs from what the paper describes: in the paper, a normal action (playing a stone on the board) is encoded as an all-zero plane with a single one at the position of the played stone. For example, with action_space_size = 5 and action = 2, I would expect a one-hot encoding such as [0,0,1,0,0] (zero-based index). But in this code, the action is encoded as [0.4,0.4,0.4,0.4,0.4], i.e. a plane filled with action / action_space_size.

Am I misunderstanding something? Please tell me which encoding is correct, and why the code is written this way. Thanks.

Add an example

action_one_hot = (
    torch.ones(
        (
            encoded_state.shape[0],
            1,
            encoded_state.shape[2],
            encoded_state.shape[3],
        )
    )
    .to(action.device)
    .float()
)
action_one_hot = (
    action[:, :, None, None] * action_one_hot / self.action_space_size
)
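
To make the difference concrete, here is a minimal sketch that puts the two encodings side by side. The function names `scalar_plane_encoding` and `one_hot_plane_encoding` are hypothetical and not part of this repository. The sketch assumes `action` is an integer tensor of shape (batch, 1) holding zero-based action indices, and that the hidden state has spatial size height x width. The one-hot variant shown uses one plane per action, which is only one possible reading of the paper. The board-game variant, a single plane with a one at the played position, applies only when actions map to board positions.

```python
import torch
import torch.nn.functional as F


def scalar_plane_encoding(action, action_space_size, height, width):
    """Single plane filled with action / action_space_size (as in the snippet above)."""
    plane = torch.ones(
        (action.shape[0], 1, height, width), device=action.device
    ).float()
    # Broadcast the (batch, 1, 1, 1) action index over the whole plane.
    return action[:, :, None, None] * plane / action_space_size


def one_hot_plane_encoding(action, action_space_size, height, width):
    """One plane per action; only the chosen action's plane is all ones."""
    one_hot = F.one_hot(
        action.squeeze(1).long(), num_classes=action_space_size
    ).float()  # shape: (batch, action_space_size)
    # Tile each one-hot entry over the spatial dimensions.
    return one_hot[:, :, None, None].expand(-1, -1, height, width)


# Assumed toy setup: batch of 1, action index 2 out of 5, 3x3 hidden state.
action = torch.tensor([[2]])
scalar = scalar_plane_encoding(action, 5, 3, 3)
one_hot = one_hot_plane_encoding(action, 5, 3, 3)

print(scalar.shape)    # one channel, every cell holds 2 / 5
print(one_hot.shape)   # five channels, only channel 2 is non-zero
```

Either tensor can be concatenated with the hidden state along the channel dimension. The scalar version adds one channel regardless of the number of actions, while the one-hot version adds `action_space_size` channels but keeps actions linearly separable.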
