user_folder=True download_vott_json only pulls from one folder #27
Comments
Hey Abram, I'm actually flying back to San Diego right now, but one quick thing you could try is setting user_folder=True. The program checks for that exact spelling/capitalization, so that might have caused an issue.
This is likely because it is less certain about images in the random folder than it is about ones in infusions. The idea behind user_folders was that if it picked images from only one folder each time, it would be easier for the tagger to tag. If it keeps picking random, that's probably just because the "randomness" in random means that there are always photos it's uncertain about.
I have not trained a model yet though. I want to label images in both folders (maybe one at a time), but I have only just initialized the project, and I pulled 5 batches of 50 photos and they all came from the random folder. Now that I have trained a version of the model, I am still only pulling photos from the random folder, and I have tried pick_max set to both true and false. Thoughts? Seems like a bug to me, at least if I understand the user_folders param.
Yeah, that definitely seems like an issue. Again, assuming pick_max was set to True (capital T), it should have picked photos from infusions. I'll look into it later today.
Yes, pick_max=True
Hey Abram, I looked at the code and nothing sticks out as the reason. Could you send me the totag_{timestamp}.csv generated after the model was trained?
Yup! Here it is. I just picked one version of the file since I have many. I downloaded images and tagged them ~20 times yesterday, and it did eventually pick a small batch from the infusions folder (just once); the rest of the time it picked from random.
Hey Abram, the CSV file you sent me doesn't seem to have any predictions in it. In that case, defaulting to the random folder is, ironically, random. If possible, could you send a version of the file that has predictions in it (i.e. not all NULL)? Thanks!
ooops
Hey Abram, I ran the script on the local CSV files and the behaviour seems expected. If you set pick_max=True and run it a couple of times, you should definitely get a few sets from the infusions folder.
We have started to get images from infusions, but it is less common, and the first ~20 times I did not. It would be nice to be able to force it to pick a specific folder, or to sample from multiple folders at once as an option.
That sounds like a good feature to have. It's a little complicated to implement - a good way to do it would be to (in download_vott_json.py) look at how many images from each folder are in the tagged.csv file, then bias towards picking images from the folder that currently has fewer images. I'll leave this one to @olgaliak since the code will have to be significantly changed.
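The biasing idea above could look roughly like the sketch below. This is not code from download_vott_json.py: the function names, and the assumption that a folder name can be recovered for each row of tagged.csv, are hypothetical. Each folder's weight is inversely proportional to how many of its images are already tagged, so under-tagged folders are picked more often but no folder is ever excluded.

```python
import random
from collections import Counter


def folder_weights(tagged_folders, all_folders):
    """Weight each folder inversely to its number of already-tagged images.

    tagged_folders: one folder name per already-tagged image (hypothetical
    input; in practice this would be derived from tagged.csv).
    all_folders: every user folder that is a candidate for the next batch.
    """
    counts = Counter(tagged_folders)
    # The +1 keeps the weight finite for folders with no tagged images yet,
    # which therefore receive the largest weight.
    return {folder: 1.0 / (counts[folder] + 1) for folder in all_folders}


def pick_folder(tagged_folders, all_folders, rng=random):
    """Randomly pick one folder, favouring folders with fewer tagged images."""
    weights = folder_weights(tagged_folders, all_folders)
    folders = list(weights)
    return rng.choices(folders, weights=[weights[f] for f in folders], k=1)[0]


if __name__ == "__main__":
    tagged = ["random"] * 9 + ["infusions"]
    print(folder_weights(tagged, ["random", "infusions"]))
    print(pick_folder(tagged, ["random", "infusions"]))
```

With nine tagged images from random and one from infusions, infusions gets five times the weight of random, so it should come up in most batches without being forced every time.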
I was looking at the code in download_vott_json and decided, for now, to give each folder "equal chances" (634aed8).
@olgaliak I think the equal chances idea mentioned above makes sense, except I have a use case where it does not. I have just initialized a project with 70 user folders. Each folder has images that were tagged by a client as having presence/absence of a different type of animal. I asked for 20 images but it pulled 20 images from each folder!
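One possible way to keep "equal chances" while still respecting the requested total is to split the total into per-folder quotas instead of applying it to every folder. The sketch below is only an illustration under that assumption: allocate_quota is a hypothetical helper, not something that exists in download_vott_json.py, and it assumes folder names are unique.

```python
import random


def allocate_quota(total, folders, rng=random):
    """Split `total` requested images across folders as evenly as possible.

    Every folder gets total // len(folders) images, and the remainder is
    handed out one image each to randomly chosen folders, so the quotas
    always sum to `total`.
    """
    base, extra = divmod(total, len(folders))
    lucky = set(rng.sample(list(folders), extra))
    return {folder: base + (1 if folder in lucky else 0) for folder in folders}


if __name__ == "__main__":
    folders = [f"folder_{i}" for i in range(70)]
    quota = allocate_quota(20, folders)
    print(sum(quota.values()))
```

For 20 images over 70 folders, the base quota is 0 and 20 randomly chosen folders get one image each, so the batch stays at 20 images in total.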
I am trying the workflow with user_folders=True for the first time, and it looks like after I initialized the project it is only pulling images from one of the folders when I run download_vott_json.py. Ping me when you are ready to address this and I can provide the info you need to recreate it.