predict on test for metrics and unlabeled images only #17

Open
abfleishman opened this issue Sep 7, 2018 · 2 comments
Labels
question Further information is requested

Comments

@abfleishman

When there are a lot of images in the blob storage, training is fast but prediction is slow. Two options would be nice to have: one to predict only on unlabeled images, which would improve speed and let us iterate faster, and one to compute metrics only on the test set, without predicting on the rest of the labeled images.

@olgaliak olgaliak added the question Further information is requested label Oct 23, 2018
@olgaliak
Owner

Hi @abfleishman! Could you please clarify the scenario a bit?

  • Predict on Test set "for metrics only"
    I assume the purpose is just to evaluate how good the current model is, correct?
  • Predict on unlabeled images to improve speed
    I did not quite get how this would improve speed. I assume there are usually thousands of unlabeled images, right?

@abfleishman
Author

As I understand it now, when a new model is trained it predicts on all of the images in the blob storage that it is pointed at. I have been starting with maybe 1000 images, and training and prediction go very quickly. Then I have been adding more images, let's say another 1000, so there are 2000 images in blob storage. If we are only using the workflow for generating new training data, we do not need to predict on the first 1000 already-labeled images, since we do not need to review them again (hopefully), and we can save time by predicting only on the new 1000 images. This gets more pronounced when the numbers are larger, of course: let's say 10,000 images that have been labeled and 2000 new unlabeled images. Does that make sense?
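
To make the request concrete, here is a minimal sketch of the filtering step, assuming the image names in blob storage and the sets of labeled and test images are already available as plain Python collections. The function name `select_images_to_predict`, its `mode` parameter, and the file names are hypothetical and are not part of this project's existing code.

```python
from typing import Iterable, List, Set


def select_images_to_predict(
    all_images: Iterable[str],
    labeled_images: Set[str],
    test_images: Set[str],
    mode: str = "unlabeled",
) -> List[str]:
    """Pick which image names to run prediction on (hypothetical helper).

    mode="unlabeled": only images with no labels yet (new training data).
    mode="test": only held-out test images (metrics only).
    mode="all": every image, i.e. the behavior described above.
    """
    if mode == "all":
        return sorted(all_images)
    if mode == "test":
        return sorted(name for name in all_images if name in test_images)
    if mode == "unlabeled":
        return sorted(name for name in all_images if name not in labeled_images)
    raise ValueError(f"unknown mode: {mode!r}")


# Example: 1000 labeled images plus 1000 newly uploaded ones.
all_images = [f"img_{i:05d}.jpg" for i in range(2000)]
labeled = set(all_images[:1000])
test = set(all_images[:100])  # assumed to be a subset of the labeled images

to_predict = select_images_to_predict(all_images, labeled, test, mode="unlabeled")
print(len(to_predict))  # 1000
```

With 10,000 labeled and 2000 new images, the same call would return only the 2000 new names, so prediction time would scale with the new batch rather than with everything in storage.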
