user_folder=True download_vott_json only pulls from one folder #27

Open
abfleishman opened this issue Sep 25, 2018 · 15 comments
Labels
needs testing Workaround is provided, needs verification

Comments

@abfleishman

abfleishman commented Sep 25, 2018

I am trying the workflow with user_folders=True for the first time, and it looks like after I initialized the project it is only pulling images from one of the folders when I run download_vott_json.py. Ping me when you are ready to address this and I can provide whatever info you need to reproduce it.

@yashpande
Collaborator

Hey Abram, I'm actually flying back to San Diego right now, but one quick thing you could try is setting user_folder=True. The program checks for that exact spelling/capitalization, so that might have caused the issue.
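
A minimal sketch of why exact spelling and capitalization can matter, assuming the config value is read as a raw string and compared against the literal `True`. The helper names `parse_flag_strict` and `parse_flag_lenient` are hypothetical and are not taken from the repository.

```python
def parse_flag_strict(raw_value):
    """Treat the flag as enabled only for the exact string 'True' (assumed behaviour)."""
    return raw_value == "True"


def parse_flag_lenient(raw_value):
    """A more forgiving alternative: ignore case and surrounding whitespace."""
    return raw_value.strip().lower() in {"true", "1", "yes"}


if __name__ == "__main__":
    for value in ("True", "true", " TRUE "):
        print(repr(value), parse_flag_strict(value), parse_flag_lenient(value))
```

With a strict comparison, a value such as `true` silently disables the option, which is the kind of mismatch suggested above.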

@abfleishman
Author

abfleishman commented Sep 25, 2018

Yes, I slipped up when writing the issue. I edited the issue to reflect the actual spelling etc. I used user_folders=True. It is pulling from the folder called 'random' but not from the one called 'infusions'.

@yashpande
Collaborator

This is likely because it is less certain about images in the random folder than it is about ones in infusions. The idea behind user_folders was that if it picked images from only one folder each time, it would be easier for the tagger to tag. If it keeps picking random, that's probably just because the "randomness" in random means that there are always photos it's uncertain about.
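
One way to picture the behaviour described here, as a sketch only: it assumes each row carries a folder name, a file name, and a model confidence, and that the script serves images from whichever folder holds the least confident prediction. The function `pick_folder_images` and the row layout are hypothetical, not the repository's actual code.

```python
from collections import defaultdict


def pick_folder_images(rows, num_images):
    """rows: iterable of (folder, filename, confidence) tuples (assumed layout).

    Returns the chosen folder and up to num_images of its least confident files.
    """
    by_folder = defaultdict(list)
    for folder, filename, confidence in rows:
        by_folder[folder].append((confidence, filename))

    # Choose the folder that contains the single least confident prediction.
    target = min(by_folder, key=lambda f: min(by_folder[f])[0])

    # Within that folder, hand out the least confident images first.
    least_confident_first = sorted(by_folder[target])
    return target, [name for _, name in least_confident_first[:num_images]]


if __name__ == "__main__":
    rows = [
        ("random", "a.jpg", 0.2),
        ("random", "b.jpg", 0.9),
        ("infusions", "c.jpg", 0.6),
    ]
    print(pick_folder_images(rows, 2))
```

Under these assumptions, a folder with a steady supply of low-confidence images would keep winning, which matches the explanation given for the random folder.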

@abfleishman
Author

I have not trained a model yet though. I want to label images in both folders (maybe one at a time), but I have only just initialized the project; I pulled 5 batches of 50 photos and they all came from the random folder. Now that I have trained a version of the model, I am still only pulling photos from the random folder, and I have tried pick_max set to both True and False. Thoughts? Seems like a bug to me, at least if I understand the user_folders param.

@yashpande
Collaborator

Yeah that definitely seems like an issue. Again assuming pick_max was set to True (capital T), it should have picked photos from infusions. I'll look into it later today.

@abfleishman
Author

yes pick_max=True

@yashpande
Collaborator

Hey Abram,

I looked at the code and nothing sticks out as the reason. Could you send me the totag_{timestamp}.csv generated after the model was trained?

@abfleishman
Author

Yup! Here it is. I just picked one version of the file since I have many. I downloaded images and tagged them ~20 times yesterday, and it did eventually pick a small batch from the infusions folder (just once); the rest of the time it picked from random.
totag_1537917161.zip

@yashpande
Collaborator

Hey Abram,

The CSV file you sent me doesn't seem to have any predictions in it. In that case, defaulting to the random folder is, ironically, random. If possible, could you send a version of the file that has predictions in it (i.e. not all NULL)?

Thanks!
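
A quick way to check a totag CSV before sending it, sketched under assumptions: the column name `confidence` is hypothetical (the real file's header is not shown in this thread), and missing predictions are assumed to appear as the text `NULL` or as empty cells.

```python
import csv
import io


def has_predictions(csv_text, column="confidence"):
    """Return True if at least one row has a non-NULL value in `column`.

    `column` is an assumed name; adjust it to match the real CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        value = (row.get(column) or "NULL").strip()
        if value not in ("", "NULL"):
            return True
    return False


if __name__ == "__main__":
    sample = "filename,folder,confidence\na.jpg,random,NULL\nb.jpg,infusions,0.83\n"
    print(has_predictions(sample))
```

For a file on disk, the text could be read with `open(path, newline="").read()` and passed to the same function.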

@abfleishman
Author

ooops
totag_1537923981118.zip

@yashpande
Collaborator

Hey Abram,

I ran the script on the local CSV files and the behaviour seems expected. If you do pick_max=True and run it a couple of times you should definitely get a few sets from the infusions folder.

@abfleishman
Author

We have started to get images from infusions, but it is less common, and the first ~20 times I did not get any. It would be nice to be able to force it to pick a specific folder, or to sample from multiple folders at once as an option.

@yashpande
Collaborator

That sounds like a good feature to have. It's a little complicated to implement - a good way to do it would be to (in download_vott_json.py) look at how many images from each folder are in the tagged.csv file, then bias towards picking images from the folder with fewer images currently. I'll leave this one to @olgaliak since the code will have to be significantly changed.
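
The balancing idea can be sketched in its simplest form: always choose the folder with the fewest tagged images so far. This assumes the folder of each tagged image can be read from tagged.csv into a plain list; the function `least_tagged_folder` and its inputs are hypothetical.

```python
from collections import Counter


def least_tagged_folder(all_folders, tagged_folders):
    """Pick the folder with the fewest tagged images.

    all_folders: every user folder in the project.
    tagged_folders: one folder name per already tagged image (assumed to be
    extracted from tagged.csv). Ties are broken alphabetically.
    """
    counts = Counter(tagged_folders)
    return min(all_folders, key=lambda folder: (counts[folder], folder))


if __name__ == "__main__":
    tagged = ["random"] * 5 + ["infusions"]
    print(least_tagged_folder(["random", "infusions"], tagged))
```

A softer bias, such as sampling folders with weights inversely related to their counts, would be a natural variation on the same counts.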

@olgaliak
Owner

I was looking at the code in download_vott_json and decided, for now, to give each folder "equal chances" (634aed8).
It will result in a bigger number of images pulled to the user's machine, but from all folders. Let's see how useful it will be.
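
The commit itself is not reproduced in this thread, so the following is only one reading of "equal chances": draw up to the requested number of images from every folder. The function `sample_per_folder` and the dictionary layout are hypothetical.

```python
import random


def sample_per_folder(images_by_folder, num_images, seed=None):
    """Draw up to num_images files from each folder.

    images_by_folder: dict mapping folder name to a list of file names (assumed).
    Returns a list of (folder, filename) pairs.
    """
    rng = random.Random(seed)
    selected = []
    for folder in sorted(images_by_folder):
        files = images_by_folder[folder]
        count = min(num_images, len(files))
        selected.extend((folder, name) for name in rng.sample(files, count))
    return selected


if __name__ == "__main__":
    images = {
        "random": [f"r{i}.jpg" for i in range(10)],
        "infusions": [f"i{i}.jpg" for i in range(5)],
    }
    print(len(sample_per_folder(images, 3, seed=0)))
```

Under this reading the total grows with the number of folders, which is the trade-off noted above.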

@olgaliak olgaliak added the needs testing Workaround is provided, needs verification label Oct 23, 2018
@abfleishman
Author

@olgaliak I think the equal chances idea mentioned above makes sense, except I have a use case where it does not. I have just initialized a project with 70 user folders. Each folder has images that were tagged by a client as having presence/absence of a different type of animal. I asked for 20 images, but it pulled 20 images from each folder!
It would be nice to have the config allow the user to pull images from a specific folder or list of folders, to avoid getting thousands of images at a time. My idea would be to allow the user to focus on specific classes (or sites, or subsets of photos that they have organized themselves) to start with.
Maybe this should be a new Issue?
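
A rough sketch of what such an option might look like, assuming the candidate images are available as a mapping from folder name to file names. The function `sample_from_selected_folders` and its `folders` parameter are hypothetical and do not exist in the project's config.

```python
import random


def sample_from_selected_folders(images_by_folder, num_images, folders=None, seed=None):
    """Draw at most num_images files in total, restricted to `folders`.

    images_by_folder: dict mapping folder name to a list of file names (assumed).
    folders: optional list of folder names to draw from; None means all folders.
    Returns a list of (folder, filename) pairs.
    """
    rng = random.Random(seed)
    allowed = set(images_by_folder) if folders is None else set(folders)
    pool = [
        (folder, name)
        for folder in sorted(images_by_folder)
        if folder in allowed
        for name in images_by_folder[folder]
    ]
    return rng.sample(pool, min(num_images, len(pool)))


if __name__ == "__main__":
    images = {
        "random": [f"r{i}.jpg" for i in range(10)],
        "infusions": [f"i{i}.jpg" for i in range(10)],
    }
    print(sample_from_selected_folders(images, 4, folders=["infusions"], seed=1))
```

Because the cap applies to the combined pool rather than to each folder, a request for 20 images would return at most 20 regardless of how many folders exist.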
