Sampled evaluation games #15

Open
TimYuenior opened this issue Jul 19, 2018 · 3 comments

Comments

@TimYuenior

In the original paper, only positions from self-play games are sampled. Those games use temperature=1 for part of each game, which means more exploration. Won't adding all evaluation games to the pool we sample from heavily reduce exploration? Of course, we could stop recording evaluation games if we parallelized everything, but since recording them saves time, I was wondering whether you know if it has any noticeable negative impact.
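To make the exploration concern concrete, here is a minimal sketch of temperature-based move selection from MCTS visit counts. The function name `select_move` and the toy visit counts are hypothetical and not taken from this repository; the sketch assumes that temperature=0 is treated as a greedy pick, which is roughly how evaluation games tend to be played.

```python
import numpy as np


def select_move(visit_counts, temperature, rng):
    """Pick a move index from MCTS visit counts (hypothetical helper).

    temperature=1 samples in proportion to the visit counts (more exploration).
    temperature=0 is treated here as greedy: always the most-visited move.
    """
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature == 0:
        return int(np.argmax(counts))
    # Raise counts to 1/temperature, then normalise into a distribution.
    scaled = counts ** (1.0 / temperature)
    probs = scaled / scaled.sum()
    return int(rng.choice(len(counts), p=probs))


rng = np.random.default_rng(0)
counts = [10, 30, 60]  # toy visit counts for three candidate moves

greedy = select_move(counts, 0, rng)
samples = [select_move(counts, 1.0, rng) for _ in range(1000)]
```

With temperature=1 the less-visited moves still get played some of the time, whereas the greedy setting always yields the same move for the same counts. That is why positions from greedy games would add less variety to the training pool.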

@Narsil
Owner

Narsil commented Jul 19, 2018

To be honest, I added evaluation games to training in order to make use of more of the available data.

It will decrease exploration, but when running this algorithm on a single machine with a single GPU, it is hard to replicate the results anyway. I tried focusing on 9x9, which should be easier, but to be honest it does not achieve good performance even there.

I suspect there is a bug somewhere that matters more than this exploration problem, but I might be wrong.
I haven't had time to focus on this lately, but my intuition was that it would be better to recheck the basic steps of the algorithm first.

Hope that helps.
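One way to check whether the evaluation games hurt would be to tag each stored position with its source, so that training can be run with and without them. The sketch below is only an illustration under that assumption: `GameBuffer`, its methods, and the `"self_play"` / `"evaluation"` labels are hypothetical and do not describe how this repository stores games.

```python
import random
from collections import deque


class GameBuffer:
    """Hypothetical position buffer that remembers where each position came from."""

    def __init__(self, max_positions):
        # Oldest positions are dropped automatically once the buffer is full.
        self.positions = deque(maxlen=max_positions)

    def add_game(self, positions, source):
        # source is a plain label, e.g. "self_play" or "evaluation".
        for position in positions:
            self.positions.append((source, position))

    def sample(self, n, rng, include_evaluation=True):
        # Optionally restrict the pool to self-play positions only.
        pool = [
            position
            for source, position in self.positions
            if include_evaluation or source == "self_play"
        ]
        return rng.sample(pool, min(n, len(pool)))


buffer = GameBuffer(max_positions=1000)
buffer.add_game(["a", "b", "c"], source="self_play")
buffer.add_game(["x", "y"], source="evaluation")

self_play_only = buffer.sample(10, random.Random(0), include_evaluation=False)
everything = buffer.sample(10, random.Random(0), include_evaluation=True)
```

Training once with `include_evaluation=False` and once with `True` would give a rough comparison, although on a single GPU the difference may be hard to separate from noise.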

@TimYuenior
Author

Yeah, it's hard to tell what is actually affecting performance without 64 GPUs available.
Anyways, thanks for the response!

@TimYuenior
Author

TimYuenior commented Jul 20, 2018

As a side note: appending the state samples and the target labels to 3 separate big tensors and passing them to model.fit once (with the number of epochs) is a fairly big speedup over calling model.fit for every sampled state.

Edit: Ignore what I said; built this way, the tensors will be incomplete or you'll retrain on previous samples.
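For reference, a minimal sketch of the batching idea, assuming each sample is a `(state, policy_target, value_target)` tuple and that the three tensors are the states plus the two targets. The helper `build_training_arrays`, the 9x9x3 state shape, and the 82-entry policy (81 points plus pass) are illustrative assumptions, not this repository's actual layout. Building the arrays fresh from the current sampled batch, rather than appending across training steps, may avoid the problem mentioned in the edit.

```python
import numpy as np


def build_training_arrays(samples):
    """Stack (state, policy_target, value_target) tuples into three arrays.

    The arrays are built from scratch on every call, so nothing from an
    earlier training step is carried over.
    """
    states = np.stack([state for state, _, _ in samples])
    policies = np.stack([policy for _, policy, _ in samples])
    values = np.asarray(
        [value for _, _, value in samples], dtype=np.float32
    ).reshape(-1, 1)
    return states, policies, values


# Toy data: 4 positions on a 9x9 board with 3 feature planes (assumed shapes).
rng = np.random.default_rng(0)
samples = []
for _ in range(4):
    state = rng.random((9, 9, 3)).astype(np.float32)
    policy = np.full(82, 1.0 / 82, dtype=np.float32)  # 81 points + pass
    value = float(rng.choice([-1.0, 1.0]))
    samples.append((state, policy, value))

states, policies, values = build_training_arrays(samples)

# With a two-output Keras model, a single call could then look like:
# model.fit(states, [policies, values], epochs=n_epochs)
```

The commented `model.fit` call assumes a model with a policy head and a value head; `n_epochs` is a placeholder.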

2 participants