
Potential problems #16

Open
TimYuenior opened this issue Aug 3, 2018 · 1 comment

TimYuenior commented Aug 3, 2018

Edit: turned it into a general thread instead

  1. The AGZ cheat sheet mentions only one filter for the value head. In this implementation, two filters are used. Is there any reason for that? I don't think it will have a big impact, but I'm just putting it out there.

  2. The target policies created during simulated games are taken from the prior probabilities p, which are calculated by the neural net. From the AGZ cheat sheet I believe the target policies should instead be the search probabilities, which are derived from the visit counts of each move and the temperature parameter (see the sketch just below).
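For concreteness, here is a minimal sketch of what I believe the paper prescribes for the target policy: pi(a) proportional to N(s, a)^(1/tau), collapsing to a one-hot vector as tau goes to 0. The function and variable names are mine, not this repo's:

```python
import numpy as np

def policy_target(visit_counts, temperature):
    """pi(a) proportional to N(s, a)^(1/tau); one-hot as tau -> 0."""
    moves = list(visit_counts)
    counts = np.array([visit_counts[m] for m in moves], dtype=np.float64)
    if temperature == 0:
        # Degenerate case: put all probability mass on the most-visited move.
        pi = np.zeros_like(counts)
        pi[counts.argmax()] = 1.0
    else:
        scaled = counts ** (1.0 / temperature)
        pi = scaled / scaled.sum()
    return dict(zip(moves, pi))
```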

Some notes:

  1. During MCTS search there are lots of zero Q-values, and patches of Q-values that are almost 1 often appear. (This might just be due to a bad network; one possible source of the zeros is shown in the sketch after these notes.)

  2. The batched MCTS search yields more Q-values, but the search depth is considerably lower: chosen moves are at most at depth 4 from the current position, and usually at depth 2 or 3. Running 64 simulations with batch size 1 can give chosen moves up to depth 66 from the current position, but of course it is slower. I'm unsure what a good balance is; it is hard to tune.
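Regarding note 1: unvisited edges start with Q = 0 in the paper's formulation, so zeros are expected wherever the search has not yet visited. A hypothetical sketch of the PUCT selection step (field and parameter names are placeholders, not this repo's API):

```python
import math

def select_move(edges, c_puct=1.0):
    """Pick argmax_a of Q(s, a) + c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))."""
    total = sum(e["N"] for e in edges.values())
    sqrt_total = math.sqrt(max(total, 1))  # avoid a degenerate all-zero U term at a fresh node

    def puct(edge):
        q = edge["W"] / edge["N"] if edge["N"] else 0.0  # unvisited edge => Q = 0
        u = c_puct * edge["P"] * sqrt_total / (1 + edge["N"])
        return q + u

    return max(edges, key=lambda a: puct(edges[a]))
```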

TimYuenior changed the title from "Slight discrepancy in model" to "Potential problems" on Aug 7, 2018
Narsil (Owner) commented Aug 13, 2018

  1. What do you mean, 2 filters? The value head is one conv, with batch normalization, and one dense layer:

[screenshot: TensorBoard graph of the value head, 2018-08-13]

Edit: I reread your comment and understood you were talking about the number of filters in the convolutional layer. That IS an error; I changed it to see if it improves things (I expect it actually might!). For reference, a sketch of the value head as the paper describes it is at the end of this comment.

  2. Not sure about that. I imagine the cheat sheet you are referencing is this one:
    https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0
    If you used the visit counts and the temperature, then because the temperature is 0 most of the time, your policy target would usually be all zeros except for a single 1. I'm not sure that is desirable. I tried looking at the paper again, but this point is not perfectly clear to me.
    The best segment I could find is this (a sketch of that gating step is also at the end of this comment):

Evaluator: To ensure we always generate the best quality data, we evaluate each new neural network checkpoint against the current best network f_θ* before using it for data generation. The neural network f_θᵢ is evaluated by the performance of an MCTS search α_θᵢ that uses f_θᵢ to evaluate leaf positions and prior probabilities (see Search Algorithm).

  3. Not sure I understand the problem here. The Q-value is probably wrong, yes, but the question is why.
    What do you mean by patches of 1? That the Q-values of moves close to each other are the same? That does not seem too wrong for a naive player (the details are not yet seen).

  4. I used MCTS_BATCH_SIZE=8 personally. It is the same value mentioned in the paper:

Positions in the queue are evaluated by the neural network using a mini-batch size of 8; the search thread is locked until evaluation completes. The leaf node is expanded and each edge (s_L, a) is initialised to {N(s_L, a) = 0, W(s_L, a) = 0, Q(s_L, a) = 0, P(s_L, a) = p_a}; the value v is then backed up.

To be honest, I am not sure why they use a threaded search instead of a sequential one, since it is a deterministic search and everything uses the same tree. You do win some performance by using multiple cores, I guess, but you also might end up trying to expand the same node multiple times (especially when the temperature is zero). I expected that could be a source of problems (in this implementation I always expand different nodes in the tree search), but I don't think expanding too greedily would be a problem in the long run. The virtual-loss trick the papers use to keep parallel threads on different nodes is sketched below as well.
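Re point 1: a minimal Keras sketch of the value head as the paper and cheat sheet describe it, with a single 1×1 filter in the conv, a 256-unit hidden layer, and a tanh scalar output. This is my reading of the paper, not necessarily this repo's code:

```python
from tensorflow.keras import layers

def value_head(residual_tower_output):
    x = layers.Conv2D(filters=1, kernel_size=1)(residual_tower_output)  # 1 filter, per the paper
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    return layers.Dense(1, activation="tanh")(x)  # scalar value in [-1, 1]
```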
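Re the Evaluator quote: as I understand it, the gating step amounts to something like the sketch below (the paper plays 400 evaluation games and promotes the candidate at a 55% win rate; `play_game` is a hypothetical caller-supplied helper, not part of this repo):

```python
def evaluate_checkpoint(candidate, best, play_game, games=400, threshold=0.55):
    """Promote `candidate` only if it beats `best` by a clear margin.

    `play_game(a, b)` must return 1 if `a` wins, 0 otherwise
    (draws counted as losses here for simplicity).
    """
    wins = sum(play_game(candidate, best) for _ in range(games))
    return wins / games >= threshold
```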
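Re point 4: the standard way the papers stop parallel threads from expanding the same node is a virtual loss, applied on the way down and reverted on backup. A rough sketch, with placeholder edge fields matching the {N, W, Q, P} statistics from the quote above:

```python
VIRTUAL_LOSS = 3  # n_vl = 3 in the AlphaGo papers

def apply_virtual_loss(path):
    """Temporarily pretend the pending simulation lost along every edge it took,
    discouraging other threads from selecting the same path."""
    for edge in path:
        edge["N"] += VIRTUAL_LOSS
        edge["W"] -= VIRTUAL_LOSS

def revert_virtual_loss(path, value):
    """Replace the virtual loss with the real backed-up value."""
    for edge in path:
        edge["N"] += 1 - VIRTUAL_LOSS      # net effect: one real visit
        edge["W"] += value + VIRTUAL_LOSS  # net effect: W += value
```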
