Export predictions for each example #28

Open · jtfields opened this issue Mar 9, 2022 · 3 comments
jtfields commented Mar 9, 2022

I have successfully run Google's BigBird NLP model on the IMDB dataset, and also on a custom dataset imported using tfds. BigBird's imdb.ipynb only prints the overall accuracy and loss. I'm trying to export the predictions for each record in the dataset and have been unable to find any information on how to do this. Any help is appreciated!

Here is the code I'm currently using for the summary metrics:
eval_loss = tf.keras.metrics.Mean(name='eval_loss')
eval_accuracy = tf.keras.metrics.CategoricalAccuracy(name='eval_accuracy')

opt = tf.keras.optimizers.Adam(FLAGS.learning_rate)
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')

for i, ex in enumerate(tqdm(dataset.take(FLAGS.num_train_steps), position=0)):
    loss, log_probs, grads = fwd_bwd(ex[0], ex[1])
    opt.apply_gradients(zip(grads, model.trainable_weights + headl.trainable_weights))
    train_loss(loss)
    train_accuracy(tf.one_hot(ex[1], 2), log_probs)
    if i % 200 == 0:
        print('Loss = {} Accuracy = {}'.format(train_loss.result().numpy(), train_accuracy.result().numpy()))
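
One workable pattern for per-record output is to run a forward-only pass over the evaluation set and write out each example's prediction as you go. Below is a minimal sketch, not from the notebook: it assumes a hypothetical fwd_only(features) helper that returns the per-class log-probs (the forward half of fwd_bwd above) and an eval_dataset yielding (features, label) batches.

import csv

import tensorflow as tf

prediction_rows = []
for ex in eval_dataset:
    log_probs = fwd_only(ex[0])  # forward pass only, no gradients
    pred_class = tf.argmax(log_probs, axis=-1)
    for label, pred, lp in zip(ex[1].numpy(), pred_class.numpy(), log_probs.numpy()):
        prediction_rows.append([int(label), int(pred)] + lp.tolist())

# one row per example: true label, predicted class, per-class log-probs
with open('predictions.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['label', 'prediction', 'log_prob_0', 'log_prob_1'])
    writer.writerows(prediction_rows)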

jtfields (Author) commented

I found do_export and do_eval flags in run_classifier.py, which will export the model and provide additional metrics. However, I'm struggling with how to enable these flags. I tried passing FLAGS.do_export from my BigBird script, which is based on the imdb.ipynb sample notebook. Has anyone else successfully enabled do_export or do_eval?
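
If it helps, run_classifier.py appears to use absl-style flags, so they can normally be enabled either on the command line or programmatically before the script's main() runs. A sketch of both routes (unverified against the BigBird repo, so treat the exact usage as an assumption):

# on the command line:
#   python run_classifier.py --do_eval=true --do_export=true ...

# or programmatically, before main() runs:
from absl import flags

FLAGS = flags.FLAGS
FLAGS.do_eval = True
FLAGS.do_export = True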

jtfields (Author) commented Apr 6, 2022

I found this on Stack Overflow, which describes why I'm unable to save the labels and predictions from BigBird: "As you correctly noted, you don't have labels in predict (because that's for inference, i.e., you use that to classify new data). The problem is that the evaluate call won't return the labels, because it runs a loop over all your dataset and computes aggregated metrics, which are then returned. If you want to have for each batch both the prediction and the labels, you'll have to load the model from the checkpoint, make a tf.Session() and loop sess.run([predictions, labels]) calls until your data is exhausted."

jtfields (Author) commented Apr 6, 2022

I am now focused on run_classifier.py as a better route to export labels and predictions. I'm struggling with how to adapt the Stack Overflow approach below so that it saves the labels and predictions. Any suggestions for how to add this to run_classifier.py?

# Rebuild the input pipeline
input_fn = create_eval_input_fn(path=eval_files)
features, labels = input_fn()

# Rebuild the model
predictions = model_fn(features, labels, tf.estimator.ModeKeys.EVAL).predictions

# Manually load the latest checkpoint
saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state('/my/directory')
    saver.restore(sess, ckpt.model_checkpoint_path)

    # Loop through the batches and store predictions and labels
    prediction_values = []
    label_values = []
    while True:
        try:
            preds, lbls = sess.run([predictions, labels])
            prediction_values += preds
            label_values += lbls
        except tf.errors.OutOfRangeError:
            break
    # store prediction_values and label_values somewhere
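
For that last step, one simple option (just an example, not the only way to store them) is to dump both lists to disk with numpy:

import numpy as np

np.savez('eval_outputs.npz',
         predictions=np.asarray(prediction_values),
         labels=np.asarray(label_values))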
