Export predictions for each example #28

Open · jtfields opened this issue Mar 9, 2022 · 3 comments
jtfields commented Mar 9, 2022

I have successfully run Google's BigBird NLP model on the IMDB dataset, and also on a custom dataset imported using tfds. BigBird's imdb.ipynb only prints the overall accuracy and loss. I'm trying to export the predictions for each record in the dataset and have been unable to find any information on how to do this. Any help is appreciated!

Here is the code I'm currently using for the summary metrics:
eval_loss = tf.keras.metrics.Mean(name='eval_loss')
eval_accuracy = tf.keras.metrics.CategoricalAccuracy(name='eval_accuracy')

opt = tf.keras.optimizers.Adam(FLAGS.learning_rate)
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')

for i, ex in enumerate(tqdm(dataset.take(FLAGS.num_train_steps), position=0)):
    loss, log_probs, grads = fwd_bwd(ex[0], ex[1])
    opt.apply_gradients(zip(grads, model.trainable_weights + headl.trainable_weights))
    train_loss(loss)
    train_accuracy(tf.one_hot(ex[1], 2), log_probs)
    if i % 200 == 0:
        print('Loss = {} Accuracy = {}'.format(train_loss.result().numpy(), train_accuracy.result().numpy()))
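
One workable pattern for per-record output is to run a forward-only pass over the evaluation set and write out each example's prediction as you go. Below is a minimal sketch, not from the notebook: it assumes a hypothetical fwd_only(features) helper that returns the per-class log-probs (the forward half of fwd_bwd above) and an eval_dataset yielding (features, label) batches.

import csv

import tensorflow as tf

prediction_rows = []
for ex in eval_dataset:
    log_probs = fwd_only(ex[0])  # forward pass only, no gradients
    pred_class = tf.argmax(log_probs, axis=-1)
    for label, pred, lp in zip(ex[1].numpy(), pred_class.numpy(), log_probs.numpy()):
        prediction_rows.append([int(label), int(pred)] + lp.tolist())

# one row per example: true label, predicted class, per-class log-probs
with open('predictions.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['label', 'prediction', 'log_prob_0', 'log_prob_1'])
    writer.writerows(prediction_rows)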

jtfields (Author) commented

I found do_export and do_eval flags in run_classifier.py, which will export the model and provide additional metrics. However, I'm struggling with how to enable these flags. I tried passing FLAGS.do_export from my BigBird script, which is based on the imdb.ipynb sample notebook. Has anyone else successfully enabled do_export or do_eval?
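
If it helps, run_classifier.py appears to use absl-style flags, so they can normally be enabled either on the command line or programmatically before the script's main() runs. A sketch of both routes (unverified against the BigBird repo, so treat the exact usage as an assumption):

# on the command line:
#   python run_classifier.py --do_eval=true --do_export=true ...

# or programmatically, before main() runs:
from absl import flags

FLAGS = flags.FLAGS
FLAGS.do_eval = True
FLAGS.do_export = True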

jtfields (Author) commented Apr 6, 2022

I found this on Stack Overflow, which describes why I'm unable to save the labels and predictions from BigBird: "As you correctly noted, you don't have labels in predict (because that's for inference, i.e., you use that to classify new data). The problem is that the evaluate call won't return the labels, because it runs a loop over all your dataset and computes aggregated metrics, which are then returned. If you want to have for each batch both the prediction and the labels, you'll have to load the model from the checkpoint, make a tf.Session() and loop sess.run([predictions, labels]) calls until your data is exhausted."

jtfields (Author) commented Apr 6, 2022

I am now focused on run_classifier.py as a better route to export labels and predictions. I'm struggling with how to adapt the Stack Overflow approach below so that it saves the labels and predictions. Any suggestions for how to add this to run_classifier.py?

# Rebuild the input pipeline
input_fn = create_eval_input_fn(path=eval_files)
features, labels = input_fn()

# Rebuild the model
predictions = model_fn(features, labels, tf.estimator.ModeKeys.EVAL).predictions

# Manually load the latest checkpoint
saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state('/my/directory')
    saver.restore(sess, ckpt.model_checkpoint_path)

    # Loop through the batches and store predictions and labels
    prediction_values = []
    label_values = []
    while True:
        try:
            preds, lbls = sess.run([predictions, labels])
            prediction_values += preds
            label_values += lbls
        except tf.errors.OutOfRangeError:
            break
    # store prediction_values and label_values somewhere
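
For that last step, one simple option (just an example, not the only way to store them) is to dump both lists to disk with numpy:

import numpy as np

np.savez('eval_outputs.npz',
         predictions=np.asarray(prediction_values),
         labels=np.asarray(label_values))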
