
Update tfprocess.py for TensorFlow 2.4+ #57

Open
wants to merge 1 commit into master

Conversation


@CallOn84 CallOn84 commented Jul 17, 2023

While trying to train my own model, I happened to find an error in tfprocess.py that makes train_maia.py fail if you're using a TensorFlow version greater than 2.4.

The reason is that the tf.keras.mixed_precision.experimental API was removed after the stable tf.keras.mixed_precision API was introduced in TensorFlow 2.4.

To fix this, I changed two lines of code.

tf.keras.mixed_precision.experimental.set_policy('mixed_float16'), on line 123, was changed to tf.keras.mixed_precision.set_global_policy('mixed_float16').

self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(self.optimizer, self.loss_scale), on line 150, was changed to self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer).
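
The two changes described above can be summarized as a diff against tfprocess.py (line numbers per the description):

```diff
-tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
+tf.keras.mixed_precision.set_global_policy('mixed_float16')

-self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(self.optimizer, self.loss_scale)
+self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer)
```

Note that the stable LossScaleOptimizer no longer takes an explicit loss_scale argument; it defaults to dynamic loss scaling.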

@reidmcy
Member

reidmcy commented Jul 18, 2023

Does this change maintain compatibility with TensorFlow 2.1.0? This codebase is meant for replicating our work, which was done with the environment given in maia_env.yml.

@CallOn84
Author

CallOn84 commented Jul 18, 2023

> Does this change maintain compatibility with Tensorflow 2.1.0? This codebase is meant for replicating our work which was done with the environment given in maia_env.yml

The tf.keras.mixed_precision.experimental code in this version of tfprocess.py wouldn't work with TensorFlow > 2.4 because, as I understand it, tf.keras.mixed_precision became stable and improved enough over the original experimental API that Google removed tf.keras.mixed_precision.experimental.

I would suggest adding a conditional that lets tfprocess.py detect which TensorFlow version the user is running and choose the matching API.

Something like this:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_visible_devices(gpus[self.cfg['gpu']], 'GPU')
tf.config.experimental.set_memory_growth(gpus[self.cfg['gpu']], True)
# Compare the version numerically: a plain string comparison would treat
# '2.10' as older than '2.4'.
tf_version = tuple(int(part) for part in tf.__version__.split('.')[:2])
if self.model_dtype == tf.float16:
    if tf_version >= (2, 4):
        tf.keras.mixed_precision.set_global_policy('mixed_float16')
    else:
        tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
self.active_lr = 0.01
self.optimizer = tf.keras.optimizers.SGD(
    learning_rate=lambda: self.active_lr, momentum=0.9, nesterov=True)
self.orig_optimizer = self.optimizer
if self.loss_scale != 1:
    if tf_version >= (2, 4):
        # The stable LossScaleOptimizer defaults to dynamic loss scaling,
        # so the explicit loss_scale argument is no longer passed.
        self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer)
    else:
        self.optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
            self.optimizer, self.loss_scale)

There's probably a better way of coding this, but this is the only thing my small brain can come up with.
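
One subtlety with checks like tf.__version__ >= '2.4': tf.__version__ is a string, and lexicographic string comparison misorders multi-digit minor versions (it treats '2.10' as older than '2.4'). A minimal stdlib-only sketch of a numeric check; the helper name tf_at_least is hypothetical:

```python
def tf_at_least(version: str, minimum: tuple) -> bool:
    """Return True if a TensorFlow version string is >= `minimum`, compared numerically."""
    # Parse only the major and minor components, e.g. '2.10.1' -> (2, 10).
    parts = tuple(int(p) for p in version.split('.')[:2])
    return parts >= minimum

# Lexicographic string comparison gets TF 2.10 wrong:
assert ('2.10.0' >= '2.4') is False   # '1' sorts before '4'
# Numeric tuple comparison is correct:
assert tf_at_least('2.10.0', (2, 4))
assert not tf_at_least('2.3.1', (2, 4))
```

This is why any version guard in tfprocess.py should compare integer tuples rather than raw version strings.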

@reidmcy
Member

reidmcy commented Jul 18, 2023

Adding that code asserts that the 2.4 code path is equivalent to the 2.1 one. But, as you said, keras.mixed_precision changed after 2.4. We would need to validate the new implementation by rerunning the full training and testing. This is a research project; we need evidence for changes.

@CallOn84
Author

> Adding that code is saying the 2.4 code is equivalent to 2.1. But, as you said keras.mixed_precision changed after 2.4. We would need to test the new implementation by rerunning the full training and testing. This is a research project, we need evidence for changes.

I'm in the middle of training a Maia model that's targeting a rating of around 2500. Once I finish, I can send you the model for testing.

@reidmcy
Member

reidmcy commented Jul 18, 2023

That's not a replication of our paper; this code is for the KDD 2020 paper.

@CallOn84
Author

> That's not a replication of our paper, this code is for the KDD 2020 paper.

Can you clarify what you mean by the code being for the KDD 2020 paper? Which paper are you referring to?

@reidmcy
Member

reidmcy commented Jul 18, 2023

Aligning Superhuman AI with Human Behavior: Chess as a Model System is the name of the paper.

@CallOn84
Author

CallOn84 commented Jul 18, 2023

> Aligning Superhuman AI with Human Behavior: Chess as a Model System is the name of the paper

Ah, right. I got confused because you said it's not a replication of your paper, so I thought you were referring to something else.

Currently, switching to tf.keras.mixed_precision.set_global_policy('mixed_float16') and self.optimizer = tf.keras.mixed_precision.LossScaleOptimizer(self.optimizer) allows me to run train_maia.py on TensorFlow 2.10. I can provide the TensorBoard logs alongside the Maia net when training is fully complete.

@ezhang7423

Are there any updates on this?

@CallOn84
Author

CallOn84 commented Feb 5, 2024

> Are there any updates on this?

To see whether this works in terms of the end product, I'm currently training a Maia 2200 net. Testing it against Maia 1900 will then show whether the Keras changes have any positive or negative effect on Maia 2200.

@reidmcy
Member

reidmcy commented Feb 6, 2024

@CallOn84 We have a new model that solves some of the training problems (and the old libraries), but not the data efficiency or expanding the Elo range much. So if you can wait a bit we should have more usable code to release.

@CallOn84
Author

CallOn84 commented Feb 6, 2024

> @CallOn84 We have a new model that solves some of the training problems (and the old libraries), but not the data efficiency or expanding the Elo range much. So if you can wait a bit we should have more usable code to release.

Cool, will do.
