
Potential problems #16

Open
TimYuenior opened this issue Aug 3, 2018 · 1 comment

TimYuenior commented Aug 3, 2018

Edit: turned it into a general thread instead

  1. The AGZ cheat sheet mentions only one filter for the value head. In this implementation, two filters are used. Is there any reason for that? I don't think it will have a big impact, but I'm just putting it out there.

  2. The target policies created during simulated games are taken from the prior probabilities p, which are calculated by the neural net. From the AGZ cheat sheet I believe the target policies should instead be the search probabilities, which are derived from the visit counts of each move and the temperature parameter (see the sketch just below).
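For concreteness, here is a minimal sketch of what I believe the paper prescribes for the target policy: pi(a) proportional to N(s, a)^(1/tau), collapsing to a one-hot vector as tau goes to 0. The function and variable names are mine, not this repo's:

```python
import numpy as np

def policy_target(visit_counts, temperature):
    """pi(a) proportional to N(s, a)^(1/tau); one-hot as tau -> 0."""
    moves = list(visit_counts)
    counts = np.array([visit_counts[m] for m in moves], dtype=np.float64)
    if temperature == 0:
        # Degenerate case: put all probability mass on the most-visited move.
        pi = np.zeros_like(counts)
        pi[counts.argmax()] = 1.0
    else:
        scaled = counts ** (1.0 / temperature)
        pi = scaled / scaled.sum()
    return dict(zip(moves, pi))
```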

Some notes:

  1. During MCTS search there are lots of zero Q-values, and patches of Q-values that are almost 1 often appear. (This might just be due to a bad network; one possible source of the zeros is shown in the sketch after these notes.)

  2. The batched MCTS search yields more Q-values, but the search depth is considerably lower: chosen moves are at most at depth 4 from the current position, and usually at depth 2 or 3. Running 64 simulations with batch size 1 can give chosen moves up to depth 66 from the current position, but of course it is slower. I'm unsure what a good balance is; it is hard to tune.
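Regarding note 1: unvisited edges start with Q = 0 in the paper's formulation, so zeros are expected wherever the search has not yet visited. A hypothetical sketch of the PUCT selection step (field and parameter names are placeholders, not this repo's API):

```python
import math

def select_move(edges, c_puct=1.0):
    """Pick argmax_a of Q(s, a) + c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))."""
    total = sum(e["N"] for e in edges.values())
    sqrt_total = math.sqrt(max(total, 1))  # avoid a degenerate all-zero U term at a fresh node

    def puct(edge):
        q = edge["W"] / edge["N"] if edge["N"] else 0.0  # unvisited edge => Q = 0
        u = c_puct * edge["P"] * sqrt_total / (1 + edge["N"])
        return q + u

    return max(edges, key=lambda a: puct(edges[a]))
```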

TimYuenior changed the title from "Slight discrepancy in model" to "Potential problems" on Aug 7, 2018
Narsil (Owner) commented Aug 13, 2018

  1. What do you mean, 2 filters? The value head is one conv, with batch normalization, and one dense layer:

[screenshot: TensorBoard graph of the value head, 2018-08-13]

Edit: I reread your comment and understood you were talking about the number of filters in the convolutional layer. That IS an error; I changed it to see if it improves things (I expect it actually might!). For reference, a sketch of the value head as the paper describes it is at the end of this comment.

  2. Not sure about that. I imagine the cheat sheet you are referencing is this one:
    https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0
    If you used the visit counts and the temperature, then because the temperature is 0 most of the time, your policy target would usually be all zeros except for a single 1. I'm not sure that is desirable. I tried looking at the paper again, but this point is not perfectly clear to me.
    The best segment I could find is this (a sketch of that gating step is also at the end of this comment):

Evaluator: To ensure we always generate the best quality data, we evaluate each new neural network checkpoint against the current best network f_θ* before using it for data generation. The neural network f_θᵢ is evaluated by the performance of an MCTS search α_θᵢ that uses f_θᵢ to evaluate leaf positions and prior probabilities (see Search Algorithm).

  3. Not sure I understand the problem here. The Q-value is probably wrong, yes, but the question is why.
    What do you mean by patches of 1? That the Q-values of moves close to each other are the same? That does not seem too wrong for a naive player (the details are not yet seen).

  4. I used MCTS_BATCH_SIZE=8 personally. It is the same value mentioned in the paper:

Positions in the queue are evaluated by the neural network using a mini-batch size of 8; the search thread is locked until evaluation completes. The leaf node is expanded and each edge (s_L, a) is initialised to {N(s_L, a) = 0, W(s_L, a) = 0, Q(s_L, a) = 0, P(s_L, a) = p_a}; the value v is then backed up.

To be honest, I am not sure why they use a threaded search instead of a sequential one, since it is a deterministic search and everything uses the same tree. You do win some performance by using multiple cores, I guess, but you also might end up trying to expand the same node multiple times (especially when the temperature is zero). I expected that could be a source of problems (in this implementation I always expand different nodes in the tree search), but I don't think expanding too greedily would be a problem in the long run. The virtual-loss trick the papers use to keep parallel threads on different nodes is sketched below as well.
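Re point 1: a minimal Keras sketch of the value head as the paper and cheat sheet describe it, with a single 1×1 filter in the conv, a 256-unit hidden layer, and a tanh scalar output. This is my reading of the paper, not necessarily this repo's code:

```python
from tensorflow.keras import layers

def value_head(residual_tower_output):
    x = layers.Conv2D(filters=1, kernel_size=1)(residual_tower_output)  # 1 filter, per the paper
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    return layers.Dense(1, activation="tanh")(x)  # scalar value in [-1, 1]
```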
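Re the Evaluator quote: as I understand it, the gating step amounts to something like the sketch below (the paper plays 400 evaluation games and promotes the candidate at a 55% win rate; `play_game` is a hypothetical caller-supplied helper, not part of this repo):

```python
def evaluate_checkpoint(candidate, best, play_game, games=400, threshold=0.55):
    """Promote `candidate` only if it beats `best` by a clear margin.

    `play_game(a, b)` must return 1 if `a` wins, 0 otherwise
    (draws counted as losses here for simplicity).
    """
    wins = sum(play_game(candidate, best) for _ in range(games))
    return wins / games >= threshold
```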
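Re point 4: the standard way the papers stop parallel threads from expanding the same node is a virtual loss, applied on the way down and reverted on backup. A rough sketch, with placeholder edge fields matching the {N, W, Q, P} statistics from the quote above:

```python
VIRTUAL_LOSS = 3  # n_vl = 3 in the AlphaGo papers

def apply_virtual_loss(path):
    """Temporarily pretend the pending simulation lost along every edge it took,
    discouraging other threads from selecting the same path."""
    for edge in path:
        edge["N"] += VIRTUAL_LOSS
        edge["W"] -= VIRTUAL_LOSS

def revert_virtual_loss(path, value):
    """Replace the virtual loss with the real backed-up value."""
    for edge in path:
        edge["N"] += 1 - VIRTUAL_LOSS      # net effect: one real visit
        edge["W"] += value + VIRTUAL_LOSS  # net effect: W += value
```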
