Front-End
- JavaScript, TypeScript
- React
- Antd
Back-End
- AWS
  - EC2 t3a.xlarge spot instance
  - S3 for storage
- Local GPU Machine (offline as of June 13th 2022)
  - Ubuntu 18.04
  - Ryzen 3, GTX 1050ti 4GB, 48GB RAM
- Python
  - Milvus (vector database)
  - Elasticsearch (text search, mapping to milvus collections)
  - Flask
  - RQ
  - Social Media APIs
  - GPT3 API
Our image database is hosted on AWS S3 and we have scraped three social media sites for content.
For Twitter and Instagram we scraped a list of 800+ hashtags, aiming for as diverse a database as possible. For Reddit we scraped the 1000 most-followed subreddits.
We limited the scraper to SFW content for all three sites.
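The scraping approach described above can be sketched roughly as follows. This is only an illustration, not the project's actual scraper: the `Post` record, the `fetch_posts` callable, and the `fake_fetch` stand-in are all hypothetical names, and skipping duplicate URLs is an assumption that the description above does not state.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one scraped post (not a real API type)."""
    image_url: str
    nsfw: bool


def collect_sfw_images(
    tags: Iterable[str],
    fetch_posts: Callable[[str], Iterable[Post]],
) -> Iterator[str]:
    """Yield each SFW image URL once across all hashtags or subreddits."""
    seen = set()
    for tag in tags:
        for post in fetch_posts(tag):
            if post.nsfw:
                continue  # SFW-only rule
            if post.image_url in seen:
                continue  # assumption: skip images already found under another tag
            seen.add(post.image_url)
            yield post.image_url


def fake_fetch(tag: str) -> list:
    """Stand-in for a real social media API client, with made-up data."""
    data = {
        "sunset": [
            Post("https://example.com/a.jpg", nsfw=False),
            Post("https://example.com/b.jpg", nsfw=True),
        ],
        "beach": [
            Post("https://example.com/a.jpg", nsfw=False),
            Post("https://example.com/c.jpg", nsfw=False),
        ],
    }
    return data.get(tag, [])


urls = list(collect_sfw_images(["sunset", "beach"], fake_fetch))
```

Passing the fetcher in as a callable keeps the loop identical for hashtags and subreddits; only the client behind `fetch_posts` would differ per site.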
| Website | Total Size | Image Files |
|---|---|---|
| | 89G | 628120 |
| | 110G | 761481 |
| | 115G | 645616 |