Create predictions only for non-labeled images #51

Open · wants to merge 4 commits into master

Conversation

olgaliak
Owner

@olgaliak olgaliak commented Feb 6, 2019

When we're doing an Active Learning "cycle", the model is used to get predictions for bbox locations. Currently this is done even for images that have already been reviewed by human experts.
This change excludes those reviewed ("tagged") images.

A unit test has been added to cover the change.
run_all_test.py is also passing.

@yashpande
Collaborator

Hi Olga! Long time no see 😄 The only suggestion I would have here is that in train/create_predictions.py, line 158, you are doing all_images = np.concatenate((all_images,all_images_this_folder), axis=0). The np.concatenate allocates a new array with memory equal to the sum of the arrays it is concatenating, then makes a copy of all the data in both existing arrays into the new array. This makes concatenating one folder at a time a very time-and-memory-intensive task. Two ways of improving this would be:

  1. Instead of get_images_for_prediction returning an np array, it can simply return a list of filenames. This list of filenames can keep being appended to, and once get_images_for_prediction has been run on every folder, you can allocate one np array with enough space for all the images from all the folders, then populate it. I would suggest this option, since it requires the least additional memory (keep in mind that np.concatenate essentially requires twice as much memory, because you need space for both the existing arrays and the new one). This sounds super stupid, but the whole reason I am creating np.zeros arrays in the first place instead of creating a list and then converting it into an np array is that I was facing out-of-memory errors when making a list first (even on the 128GB DSVM).

  2. Create a list of np arrays that you append to as you iterate through the folders, i.e. all_images = [] then all_images.append(get_images_for_prediction(..)). At the end of the iteration you can do all_images = np.concatenate(all_images). This still takes twice as much memory at peak, but it saves a lot of time because you only do the concatenation once instead of once per folder.
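Option 1 above could be sketched like this. Everything here is illustrative: `read_image` is a placeholder for whatever decoder the repo actually uses (e.g. cv2.imread), and the function name and shapes are assumptions, not the project's API.

```python
import numpy as np

def read_image(filename, height, width, channels):
    # Placeholder decoder: stands in for the real image loader and just
    # returns a blank frame of the requested shape.
    return np.zeros((height, width, channels), dtype=np.uint8)

def load_images_preallocated(filename_lists, height, width, channels=3):
    """Option 1: collect filenames per folder first, then allocate ONE
    array for everything and fill it in place (no repeated np.concatenate)."""
    all_filenames = [f for folder in filename_lists for f in folder]
    all_images = np.zeros((len(all_filenames), height, width, channels),
                          dtype=np.uint8)
    for i, filename in enumerate(all_filenames):
        all_images[i] = read_image(filename, height, width, channels)
    return all_images
```

Because the output array is allocated exactly once at its final size, peak memory stays close to the size of the data itself.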

Let me know if you have any questions/comments!

if all_images is None:
    all_images = all_images_this_folder
else:
    all_images = np.concatenate((all_images, all_images_this_folder), axis=0)

I would suggest either getting rid of the np.concatenate or doing it at the end instead of at each folder.
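A minimal sketch of the "concatenate once at the end" suggestion (option 2), with get_images_for_prediction stubbed out since the real loader lives in train/create_predictions.py; the folder names and per-folder shapes here are made up for illustration.

```python
import numpy as np

def get_images_for_prediction(folder):
    # Stub standing in for the real per-folder loader: pretend each
    # folder yields two 4x4 RGB images.
    return np.zeros((2, 4, 4, 3), dtype=np.uint8)

folders = ["folder_a", "folder_b", "folder_c"]
# Gather per-folder arrays in a plain list...
per_folder = [get_images_for_prediction(f) for f in folders]
# ...then pay the copy cost once, instead of once per folder.
all_images = np.concatenate(per_folder, axis=0)
```

Each per-folder array is copied exactly once into the final array, versus being re-copied on every subsequent iteration in the per-folder-concatenate version.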
