Description
Search before asking
- I have searched the MuZero issues and found no similar bug report.
🐛 Describe the bug
I'm confused about this: in the MuZero paper, the input to the dynamics function is the hidden state concatenated with a representation of the action for the transition. The code differs from what the paper describes. As I understand the paper, a normal action (playing a stone on the board) is encoded as an all-zero plane with a single one at the position of the played stone. For example, if action_space_size = 5 and action = 2, the action would be encoded as [0,0,1,0,0] (zero-based index). But in this code, the action is encoded as [0.4,0.4,0.4,0.4,0.4], that is, every entry is action / action_space_size.
Am I misunderstanding something? Please tell me which one is right, and why the code is written this way. Thanks.
Add an example
action_one_hot = (
    torch.ones(
        (
            encoded_state.shape[0],
            1,
            encoded_state.shape[2],
            encoded_state.shape[3],
        )
    )
    .to(action.device)
    .float()
)
action_one_hot = (
    action[:, :, None, None] * action_one_hot / self.action_space_size
)
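To make the difference concrete, here is a minimal sketch of the two encodings for a single action. It is not the repository's code: it uses plain Python lists in place of tensors, the helper names `scalar_plane`, `one_hot_vector`, and `one_hot_planes` are hypothetical, and it assumes zero-based action indices and a tiny 2x2 spatial size.

```python
def scalar_plane(action, action_space_size, height, width):
    """Single plane filled with action / action_space_size (what the snippet computes)."""
    value = action / action_space_size
    return [[value] * width for _ in range(height)]


def one_hot_vector(action, action_space_size):
    """One-hot vector with a single 1 at index `action` (zero-based, assumed)."""
    return [1.0 if i == action else 0.0 for i in range(action_space_size)]


def one_hot_planes(action, action_space_size, height, width):
    """One plane per action; only the plane at index `action` is all ones."""
    return [
        [[bit] * width for _ in range(height)]
        for bit in one_hot_vector(action, action_space_size)
    ]


# action = 2 out of 5 possible actions, on a 2x2 hidden-state grid
print(scalar_plane(2, 5, 2, 2))       # [[0.4, 0.4], [0.4, 0.4]]
print(one_hot_vector(2, 5))           # [0.0, 0.0, 1.0, 0.0, 0.0]
print(len(one_hot_planes(2, 5, 2, 2)))  # 5
```

Both forms identify the action uniquely. The scalar plane adds only one channel to the dynamics input, while the one-hot planes add action_space_size channels. The scalar form also places the actions on a single numeric scale, which the one-hot form does not, so the two are not equivalent inputs for the network.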