
Update tfprocess.py for TensorFlow 2.4+ #57

Open
wants to merge 1 commit into master

Conversation


@CallOn84 CallOn84 commented Jul 17, 2023

While trying to train my own model, I happened to find an error in tfprocess.py that makes train_maia.py fail if you're using a TensorFlow version greater than 2.4.

The reason is that the tf.keras.mixed_precision.experimental API was removed after the stable tf.keras.mixed_precision API was introduced in TensorFlow 2.4.

To fix this, I changed two lines of code.

tf.keras.mixed_precision.experimental.set_policy('mixed_float16'), on line 123, was changed to tf.keras.mixed_precision.set_global_policy('mixed_float16').

self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(self.optimizer, self.loss_scale), on line 150, was changed to self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer).
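
The two changes described above can be summarized as a diff against tfprocess.py (line numbers per the description):

```diff
-tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
+tf.keras.mixed_precision.set_global_policy('mixed_float16')

-self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(self.optimizer, self.loss_scale)
+self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer)
```

Note that the stable LossScaleOptimizer no longer takes an explicit loss_scale argument; it defaults to dynamic loss scaling.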

@reidmcy
Member

reidmcy commented Jul 18, 2023

Does this change maintain compatibility with TensorFlow 2.1.0? This codebase is meant for replicating our work, which was done with the environment given in maia_env.yml.

@CallOn84
Author

CallOn84 commented Jul 18, 2023

> Does this change maintain compatibility with Tensorflow 2.1.0? This codebase is meant for replicating our work which was done with the environment given in maia_env.yml

The tf.keras.mixed_precision.experimental code in this version of tfprocess.py wouldn't work with TensorFlow > 2.4 because, as I understand it, tf.keras.mixed_precision became stable and improved enough over the original experimental API that Google removed tf.keras.mixed_precision.experimental.

I would suggest adding a conditional that lets tfprocess.py detect which TensorFlow version the user is running and choose the matching API.

Something like this:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_visible_devices(gpus[self.cfg['gpu']], 'GPU')
tf.config.experimental.set_memory_growth(gpus[self.cfg['gpu']], True)
# Compare the version numerically: a plain string comparison would treat
# '2.10' as older than '2.4'.
tf_version = tuple(int(part) for part in tf.__version__.split('.')[:2])
if self.model_dtype == tf.float16:
    if tf_version >= (2, 4):
        tf.keras.mixed_precision.set_global_policy('mixed_float16')
    else:
        tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
self.active_lr = 0.01
self.optimizer = tf.keras.optimizers.SGD(
    learning_rate=lambda: self.active_lr, momentum=0.9, nesterov=True)
self.orig_optimizer = self.optimizer
if self.loss_scale != 1:
    if tf_version >= (2, 4):
        # The stable LossScaleOptimizer defaults to dynamic loss scaling,
        # so the explicit loss_scale argument is no longer passed.
        self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer)
    else:
        self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
            self.optimizer, self.loss_scale)

There's probably a better way of coding this, but this is the only thing my small brain can come up with.
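
One subtlety with checks like tf.__version__ >= '2.4': tf.__version__ is a string, and lexicographic string comparison misorders multi-digit minor versions (it treats '2.10' as older than '2.4'). A minimal stdlib-only sketch of a numeric check; the helper name tf_at_least is hypothetical:

```python
def tf_at_least(version: str, minimum: tuple) -> bool:
    """Return True if a TensorFlow version string is >= `minimum`, compared numerically."""
    # Parse only the major and minor components, e.g. '2.10.1' -> (2, 10).
    parts = tuple(int(p) for p in version.split('.')[:2])
    return parts >= minimum

# Lexicographic string comparison gets TF 2.10 wrong:
assert ('2.10.0' >= '2.4') is False   # '1' sorts before '4'
# Numeric tuple comparison is correct:
assert tf_at_least('2.10.0', (2, 4))
assert not tf_at_least('2.3.1', (2, 4))
```

This is why any version guard in tfprocess.py should compare integer tuples rather than raw version strings.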

@reidmcy
Member

reidmcy commented Jul 18, 2023

Adding that code asserts that the 2.4 code path is equivalent to the 2.1 one. But, as you said, keras.mixed_precision changed after 2.4. We would need to validate the new implementation by rerunning the full training and testing. This is a research project; we need evidence for changes.

@CallOn84
Author

> Adding that code is saying the 2.4 code is equivalent to 2.1. But, as you said keras.mixed_precision changed after 2.4. We would need to test the new implementation by rerunning the full training and testing. This is a research project, we need evidence for changes.

I'm in the middle of training a Maia model that's targeting a rating of around 2500. Once I finish, I can send you the model for testing.

@reidmcy
Member

reidmcy commented Jul 18, 2023

That's not a replication of our paper; this code is for the KDD 2020 paper.

@CallOn84
Author

> That's not a replication of our paper, this code is for the KDD 2020 paper.

Can you clarify what you mean by the code being for the KDD 2020 paper? Which paper are you referring to?

@reidmcy
Member

reidmcy commented Jul 18, 2023

Aligning Superhuman AI with Human Behavior: Chess as a Model System is the name of the paper.

@CallOn84
Author

CallOn84 commented Jul 18, 2023

> Aligning Superhuman AI with Human Behavior: Chess as a Model System is the name of the paper

Ah, right. I got confused because you said it's not a replication of your paper, so I thought you were referring to something else.

Currently, switching to tf.keras.mixed_precision.set_global_policy('mixed_float16') and self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer) allows me to run train_maia.py on TensorFlow 2.10. I can provide the TensorBoard logs alongside the Maia net when training is fully complete.

@ezhang7423

Are there any updates on this?

@CallOn84
Author

CallOn84 commented Feb 5, 2024

> Are there any updates on this?

To see whether this works in terms of the end product, I'm currently training a Maia 2200 net. Testing it against Maia 1900 will then show whether the Keras changes have any positive or negative effect on Maia 2200.

@reidmcy
Member

reidmcy commented Feb 6, 2024

@CallOn84 We have a new model that solves some of the training problems (and the old libraries), but not the data efficiency or expanding the Elo range much. So if you can wait a bit we should have more usable code to release.

@CallOn84
Author

CallOn84 commented Feb 6, 2024

> @CallOn84 We have a new model that solves some of the training problems (and the old libraries), but not the data efficiency or expanding the Elo range much. So if you can wait a bit we should have more usable code to release.

Cool, will do.
