Issues #3

Open
kalashjindal opened this issue Mar 2, 2023 · 1 comment

Comments

@kalashjindal

The data CSV lists around 15k images, but the notebook uses only about 10k in total.
The model was trained as a binary classifier, but the real problem is multi-class.
The create-dataset step creates only the dataset category folder, so where does the dataset category test folder used in the notebooks come from?
Reporting an accuracy above 95% without any other metrics to back it up statistically is not convincing.
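
The last point can be made concrete with per-class precision, recall, and F1, which show whether a high overall accuracy hides weak classes. A minimal sketch in plain Python, assuming true and predicted labels are available as lists; `per_class_metrics` is a hypothetical helper and the labels below are made up, not results from the notebook.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # the true label was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical labels, for illustration only
y_true = ['floral', 'floral', 'plain', 'plain', 'striped', 'striped']
y_pred = ['floral', 'plain', 'plain', 'plain', 'striped', 'floral']

for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print(f'{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}')
```

In this toy example the overall accuracy is 4/6, yet recall for two of the three classes is only 0.5, which accuracy alone would not reveal.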

@kalashjindal

Added multithreading so the images download much faster:

import numpy as np
import pandas as pd
import requests
import os
import threading

dress_patterns_df = pd.read_csv('dress_patterns.csv')
dress_patterns = dress_patterns_df.values

# Collect the distinct categories
categories = set(dress_patterns_df['category'])
print(categories)

# Create a dataset folder with one nested folder per category
print(os.listdir())
os.makedirs('dataset_category', exist_ok=True)

for cat in categories:
    print(cat)
    os.makedirs(os.path.join('dataset_category', cat), exist_ok=True)

print(os.listdir('dataset_category'))

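def download_image(url, category, unit_id, i):
    # Download one image and write it into its category folder
    try:
        # The timeout keeps a stalled download from blocking its thread forever
        r = requests.get(url, allow_redirects=True, timeout=30)
        r.raise_for_status()
        path = os.path.join('dataset_category', category, str(unit_id) + '.jpg')
        with open(path, 'wb') as f:
            f.write(r.content)
    except (requests.RequestException, OSError) as e:
        print('ERROR at: ', i, e)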

# Save each image in its respective category folder
threads = []
for i in range(len(dress_patterns)):
    if i % 5 == 0:
        print(i, '/', len(dress_patterns))
    pattern = dress_patterns[i]
    url = pattern[3]
    unit_id = pattern[0]
    category = pattern[1]
    thread = threading.Thread(target=download_image, args=(url, category, unit_id, i))
    threads.append(thread)
    thread.start()

    # Limit the number of concurrent threads to 5
    if len(threads) == 5:
        for thread in threads:
            thread.join()
        threads = []

# Wait for any remaining threads to complete
for thread in threads:
    thread.join()
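
The batching above waits for all five threads to finish before starting the next five, so one slow download stalls the whole batch. One alternative, which is not what the script does, is a thread pool from the standard library's `concurrent.futures`; it hands a new task to a worker as soon as one is free. A minimal sketch: `run_bounded` and `fake_download` are hypothetical names, and `fake_download` stands in for `download_image` so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_bounded(tasks, worker, max_workers=5):
    """Call worker(*args) for each args tuple, using at most max_workers threads.

    Returns the sorted indices of the tasks whose worker raised an exception.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the position of its task in the list
        futures = {pool.submit(worker, *args): i for i, args in enumerate(tasks)}
        for future in as_completed(futures):
            if future.exception() is not None:
                failed.append(futures[future])
    return sorted(failed)


def fake_download(url, category, unit_id, i):
    # Stand-in for download_image: fails when the URL is missing
    if url is None:
        raise ValueError('missing url')


# Made-up tasks in the same argument order as download_image
tasks = [
    ('http://example.com/a.jpg', 'floral', 1, 0),
    (None, 'plain', 2, 1),
]
print(run_bounded(tasks, fake_download))  # prints [1]
```

With the script above, the task list could be built as `[(row[3], row[1], row[0], i) for i, row in enumerate(dress_patterns)]` and passed with `download_image` as the worker. Because `download_image` catches its own errors, the returned list would stay empty unless that function re-raises them.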
