Sampling in continuous/complex action spaces with 'density prior' is not working #200
Labels: bug (Something isn't working)
🐛 Describe the bug
In Learning and Planning in Complex Action Spaces (Hubert et al.), there are essentially two changes compared to MuZero:
In the code, I think I see a difference:
The error message I get:
File "/home/user_231/muzero-general/self_play.py", line 401, in ucb_score
child.prior / sum([child.prior for child in parent.children.values()])
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
Last test reward: 0.00. Training step: 0/3. Played games: 0. Loss: 0.00
This happens because node.prior is never assigned in the continuous branch.
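The failure can be reproduced in isolation. The sketch below uses a hypothetical stand-in `Node` class (not the repository's), whose `prior` stays `None`, and evaluates the same sum that `ucb_score` uses as its denominator. `sum` starts from the integer 0, so the first `None` prior triggers exactly the `TypeError` from the traceback.

```python
class Node:
    """Hypothetical stand-in for a search-tree node whose prior is never set."""

    def __init__(self):
        self.prior = None  # the continuous branch never overwrites this
        self.children = {}


parent = Node()
parent.children = {0.3: Node(), -1.2: Node()}  # two sampled actions, priors unset

error_message = None
try:
    # Same expression as the denominator in ucb_score
    total = sum([child.prior for child in parent.children.values()])
except TypeError as err:
    error_message = str(err)

print(error_message)  # unsupported operand type(s) for +: 'int' and 'NoneType'
```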
I think it should be set by the parent in its expand method, equal to the CDF of the policy distribution evaluated at each child's sampled action.
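A minimal sketch of this proposal, assuming a 1-D action space. `statistics.NormalDist` stands in for the network's policy distribution, and `Node`, `expand_with_cdf_priors` and `normalized_prior` are hypothetical names, not the repository's code. Each child's prior is set when the parent expands, so the normalization in `ucb_score` no longer sees `None`.

```python
from statistics import NormalDist


class Node:
    """Hypothetical minimal node: a prior plus a dict of children."""

    def __init__(self, prior=None):
        self.prior = prior
        self.children = {}


def expand_with_cdf_priors(node, policy, sampled_actions):
    """Create one child per sampled action, with prior = policy CDF at that action.

    `policy` stands in for the network's 1-D action distribution (an assumption).
    """
    for action in sampled_actions:
        node.children[action] = Node(prior=policy.cdf(action))


def normalized_prior(parent, child):
    """Mirror of the ucb_score normalization: child prior over the sum of sibling priors."""
    return child.prior / sum(c.prior for c in parent.children.values())


policy = NormalDist(mu=0.0, sigma=1.0)  # hypothetical policy output
root = Node()
expand_with_cdf_priors(root, policy, [-0.5, 0.0, 1.0])

priors = {a: normalized_prior(root, c) for a, c in root.children.items()}
print(priors)
```

A CDF increases monotonically with the action, so under this scheme children with larger action values always receive larger priors. If that is not intended, replacing `policy.cdf` with `policy.pdf` would use the density at the sampled point instead.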
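Also, regarding K: I think the expand method needs a small change so that it samples K actions instead of one and creates a child for each:
# torch's sample() takes a shape, so pass (K,) rather than the bare int K
action_values = distribution.sample((K,)).detach().cpu().numpy()
for action_value in action_values:
    self.children[Action(action_value)] = Node()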
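The same K-sample expansion can be sketched without PyTorch. The sketch assumes a diagonal Gaussian policy represented as one `statistics.NormalDist` per action dimension; `Node` and `expand_k_samples` are hypothetical names. Each joint action is a tuple, so it is hashable and can key the children dict directly, which is presumably what the `Action` wrapper does for NumPy arrays in the snippet above.

```python
from statistics import NormalDist


class Node:
    """Hypothetical minimal node; prior must be assigned before ucb_score runs."""

    def __init__(self, prior=None):
        self.prior = prior
        self.children = {}


def expand_k_samples(node, per_dim_policy, K, seed=0):
    """Sample K joint actions from a diagonal Gaussian and add one child per action.

    per_dim_policy: list of NormalDist, one per action dimension (an assumption
    standing in for the network's policy head).
    """
    # K draws per dimension; a distinct seed per dimension avoids identical draws
    columns = [dist.samples(K, seed=seed + d) for d, dist in enumerate(per_dim_policy)]
    for action in zip(*columns):
        # zip yields tuples, which are hashable dict keys; prior is left unset here
        node.children[action] = Node()
    return node


root = expand_k_samples(Node(), [NormalDist(0.0, 1.0), NormalDist(0.5, 0.2)], K=4)
print(len(root.children))  # 4
```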
Environment
No response
Minimal Reproducible Example
python muzero.py mujoco_IP '{"node_prior":"density"}'
Additional
No response