Sampled evaluation games #15
To be honest, I added evaluation games to training in order to use more of the available data. It will decrease exploration, but when running this algorithm on a single machine with a single GPU it is hard to replicate the published results anyway. I tried focusing on 9x9, which should be easier, but honestly it does not achieve good performance even there. I suspect a bug somewhere more important than this exploration problem, but I might be wrong. Hope that helps.
Yeah, it's hard to tell what is actually affecting the performance without 64 GPUs available.
As a side note: appending the state samples and the target labels into three separate big tensors and giving them to model.fit once (with the number of epochs set) is a fairly big speedup over calling model.fit for every sample state. Edit: ignore what I said; the tensors will be incomplete like this, or you'll retrain on previous samples.
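For reference, the batching idea above can be sketched as follows. This is a minimal illustration with hypothetical sample shapes (9x9 board planes, 82 policy entries) and a hypothetical `samples` replay list; it only shows the tensor stacking, with the actual `model.fit` call left as a comment since the real model is not shown here:

```python
import numpy as np

# Hypothetical replay buffer: (state, policy_target, value_target) tuples.
samples = [(np.zeros((9, 9, 3)), np.full(82, 1 / 82), 0.0) for _ in range(128)]

# Stack everything into three big tensors so that one model.fit call
# replaces a separate fit call per sample state.
states = np.stack([s for s, _, _ in samples])
policies = np.stack([p for _, p, _ in samples])
values = np.array([v for _, _, v in samples])

# One batched call instead of 128 individual ones (model not defined here):
# model.fit(states, {"policy": policies, "value": values}, epochs=1)
```

As the edit in the comment above notes, this only works if the buffer is complete before training starts; stacking and fitting repeatedly on a growing buffer retrains on old samples.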
In the original paper, only positions from self-play games are sampled. These games use temperature=1 for part of the game, meaning more exploration. Won't adding all evaluation games to the pool of games to sample from heavily decrease exploration? Of course we could drop the recording of evaluation games if we parallelized everything, but since recording them saves time, I was wondering whether you know if it has any noticeable negative impact.
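To make the exploration point concrete, here is a sketch of the temperature-based move selection the AlphaZero-style self-play scheme uses; `visit_counts` is a hypothetical vector of MCTS visit counts. With temperature=1 moves are sampled in proportion to visit counts (exploratory self-play), while temperature near 0 is effectively greedy, which is what evaluation games typically use:

```python
import numpy as np

def select_move(visit_counts, temperature):
    """Pick a move index from MCTS visit counts.

    temperature == 0 -> greedy argmax (evaluation-style play);
    temperature == 1 -> sample proportionally to counts (exploratory self-play).
    """
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature == 0:
        return int(np.argmax(counts))
    probs = counts ** (1.0 / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(counts), p=probs))
```

Training only on temperature=0 evaluation games would therefore feed the sampler near-deterministic positions, which is the exploration loss the question is about.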