Potential problems #16
Edit: I reread your comment and understood you were talking about the number of filters in the convolutional layer; that IS an error. I changed it to see if it improves things (I expect it actually might!)
To be honest, I am not sure why they use threaded search instead of sequential search: it is a deterministic search, and every thread uses the same tree. You do win some performance by using multiple cores, I guess, but you also might end up trying to expand the same node multiple times (especially when the temperature is zero). I expected that could be a source of problems (in this implementation I always expand different nodes during tree search), but I don't think expanding too greedily would be a problem in the long run.
Edit: turned it into a general thread instead
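For what it's worth, the usual mitigation for multiple threads expanding the same node is a virtual loss (the AlphaGo papers use it): each in-flight descent temporarily counts as a loss along its path, which steers concurrent selections toward different leaves. A minimal sketch, with all names illustrative rather than taken from this repo:

```python
import math
import threading

class Node:
    """Toy MCTS node with virtual loss; values are assumed to lie in [-1, 1]."""
    def __init__(self, prior):
        self.prior = prior          # P(s, a) from the network
        self.visits = 0             # N(s, a)
        self.value_sum = 0.0        # W(s, a)
        self.virtual_loss = 0       # in-flight descents counted as losses
        self.lock = threading.Lock()

    def puct_score(self, parent_visits, c_puct=1.0):
        n = self.visits + self.virtual_loss
        # Pending descents are scored as losses, lowering Q for other threads.
        q = (self.value_sum - self.virtual_loss) / n if n > 0 else 0.0
        u = c_puct * self.prior * math.sqrt(parent_visits) / (1 + n)
        return q + u

    def add_virtual_loss(self):
        with self.lock:
            self.virtual_loss += 1

    def backup(self, value):
        # Called once the evaluation returns: revert the loss, record the visit.
        with self.lock:
            self.virtual_loss -= 1
            self.visits += 1
            self.value_sum += value
```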
The AGZ spreadsheet mentions only one filter for the value head, while this implementation uses two. Is there any reason for that? I don't think it's going to have a big impact, but I'm just putting it out there.
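For reference, the value head in the AGZ paper is one 1x1 convolution filter, batch norm, ReLU, a 256-unit fully connected layer, and a tanh scalar output. A sketch in PyTorch (the framework and the board size are my assumptions, not necessarily what this repo uses):

```python
import torch
import torch.nn as nn

class ValueHead(nn.Module):
    """Value head as described in the AGZ paper: a single 1x1 filter."""
    def __init__(self, in_channels, board_size=19):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)  # one filter
        self.bn = nn.BatchNorm2d(1)
        self.fc1 = nn.Linear(board_size * board_size, 256)
        self.fc2 = nn.Linear(256, 1)

    def forward(self, x):
        x = torch.relu(self.bn(self.conv(x)))
        x = x.flatten(start_dim=1)
        x = torch.relu(self.fc1(x))
        return torch.tanh(self.fc2(x))  # value in [-1, 1]
```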
The target policies created during self-play games are taken from the prior probabilities p, which are computed by the neural net. From the AGZ cheatsheet, I believe the target policies should instead be the search probabilities, which are derived from the visit counts of the moves and the temperature parameter.
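To make that concrete, the search probabilities in the paper are pi(a) ∝ N(s, a)^(1/τ), computed from the root's visit counts after the search finishes. A minimal sketch (function and parameter names are mine, not this repo's):

```python
import numpy as np

def search_probabilities(visit_counts, temperature=1.0):
    """Turn root visit counts N(s, a) into target policies pi(a)."""
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature == 0:
        # tau -> 0: deterministically play the most-visited move
        pi = np.zeros_like(counts)
        pi[np.argmax(counts)] = 1.0
        return pi
    scaled = counts ** (1.0 / temperature)
    return scaled / scaled.sum()
```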
Some notes:
During MCTS search there are lots of zero Q-values, and patches of Q-values that are almost 1 often appear. (This might just be due to a bad network.)
The batched MCTS search yields more Q-values, but the search depth is considerably lower: chosen moves are at most at depth 4 from the current position, and usually at depth 2 or 3. Running 64 simulations with batch size 1 can give chosen moves up to depth 66 from the current position, but of course it is slower. I'm unsure what a good balance is; it's hard to tune.
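A schematic of why batching trades depth for throughput (select_leaf, backup, and net.evaluate are hypothetical placeholders, not this repo's API): tree statistics are frozen within a batch, so every selection in the batch tends to end near the root, whereas batch size 1 lets each backup steer the next descent deeper.

```python
def run_search(root, net, select_leaf, backup, n_simulations=64, batch_size=8):
    """Batched MCTS loop: fewer, larger net calls at the cost of search depth."""
    for _ in range(n_simulations // batch_size):
        # All selections in a batch see the same frozen tree statistics,
        # so they tend to pick leaves only a few plies from the root.
        leaves = [select_leaf(root) for _ in range(batch_size)]
        # One batched forward pass: this is the throughput win of batching.
        values = net.evaluate([leaf.state for leaf in leaves])
        for leaf, value in zip(leaves, values):
            backup(leaf, value)  # statistics change only between batches
```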