A software assistant module that helps visually impaired users understand their surroundings by converting camera input into audio descriptions.
Weave AI magic with Groq
To build a real-time assistive technology that guides blind people in their day-to-day travel.
Quantumania
- This application captures video from the user's device camera, sends frames to a backend server for object detection with the YOLOv8s ML model (default), enhances the descriptions using an LLM (llama3-70b-8192), and returns response descriptions that are converted to audio for the user.
- It can be integrated into various web apps or IoT devices so that it tells the user what's in front of them, whether that's a threat or a general obstacle. (Currently we present the web app for demonstration purposes, with basic utility.)
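The detection-to-description flow above can be sketched as a small helper that collapses the raw detector labels for one frame into a short scene summary suitable as LLM context. This is a minimal sketch with a hypothetical helper name, not the project's actual code:

```python
from collections import Counter

def summarize_detections(labels):
    """Collapse raw detector labels (e.g. YOLOv8s class names) for one
    frame into a short scene summary to send to the LLM as context.
    Hypothetical helper; the project's real code may differ."""
    counts = Counter(labels)
    if not counts:
        return "Nothing detected in view."
    # e.g. ["2 persons", "1 car"], preserving first-seen order
    parts = [f"{n} {label}" if n == 1 else f"{n} {label}s"
             for label, n in counts.items()]
    return "Detected: " + ", ".join(parts)
```

The summary, rather than raw bounding boxes, is what gets sent onward, which keeps the LLM prompt short and fast to process.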
---
- Frontend: React (for the visual demo at present; a frontend is not mandatory)
- Backend: FastAPI
- Object detection ML-model: YOLOv8s
- APIs: Groq's for LLM integration
- Hosting: Netlify (for frontend)
- ✅ Groq: used to tailor responses for the user as fast as possible for the set of objects detected in the video scene
- ✅ Modular architecture
- ✅ Scene recognition across multiple objects, with respect to their timing in the video frames
- ✅ User friendly
- ✅ Responsive
- Demo Video Link: https://youtu.be/ss8takCc2xk?si=mAEUUjNx7uWKhPXz
Q: How will a blind person use this?
Currently, for demonstration, we've added the frontend web interface, but the functionality can be integrated separately into custom hardware projects so the application auto-runs or turns on via voice commands for real-life usage.
Q: How is Groq's API used in the application?
We've used an open-source LLM (Llama) through Groq's API to quickly tailor custom responses that help the user navigate with respect to the objects in the field of view.
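As a rough sketch, that Groq call could be made against Groq's OpenAI-compatible REST endpoint using only the standard library. The system prompt wording and helper names below are illustrative assumptions, not the project's actual code:

```python
import json
import os
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_payload(scene_summary):
    """Build the chat-completion request body for the scene summary.
    The system prompt wording here is an illustrative assumption."""
    return {
        "model": "llama3-70b-8192",
        "messages": [
            {"role": "system",
             "content": "You guide a blind pedestrian. Reply with one short, "
                        "actionable sentence about obstacles ahead."},
            {"role": "user", "content": scene_summary},
        ],
    }

def describe_scene(scene_summary):
    """POST the payload to Groq and return the LLM's reply text."""
    req = urllib.request.Request(
        GROQ_URL,
        data=json.dumps(build_payload(scene_summary)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

In practice the project may use Groq's official Python SDK instead of raw HTTP; the request and response shapes are the same either way.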
Q: Does it provide responses only in English?
With the currently available open-source models, responses are given correctly in English; we'd like to improve this and integrate more languages from other FOSS foundations to bring diversity and inclusivity.
Q: What model is used for object detection, and what is the data source?
We've used Ultralytics' open-source YOLOv8s model with the default COCO dataset; Ultralytics provides some of the industry's best computer-vision ML models.
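For context, YOLOv8s predicts integer class ids that map onto the 80 COCO class names. A tiny illustrative excerpt of that mapping, with a hypothetical lookup helper:

```python
# Small excerpt of the 80-class COCO label map used by YOLOv8s
# (ids follow the Ultralytics/COCO ordering; only a few classes shown).
COCO_SUBSET = {
    0: "person",
    1: "bicycle",
    2: "car",
    9: "traffic light",
    11: "stop sign",
}

def labels_for(class_ids):
    """Map raw class ids from the detector to readable names,
    skipping ids outside this illustrative subset."""
    return [COCO_SUBSET[i] for i in class_ids if i in COCO_SUBSET]
```

In the real backend the full map comes from the model itself (Ultralytics exposes it as `model.names`), so no hand-written table is needed.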
Q: I have another question; where can I ask it?
We are happy to resolve your queries and eager to collaborate. You can open an issue here or mail us at: link
- ✅ All members of the team completed the mandatory task - Followed at least 2 of our social channels and filled the form (Details in Participant Manual)
- ✅ All members of the team completed Bonus Task 1 - Sharing of Badges and filled the form (2 points) (Details in Participant Manual)
- ✅ All members of the team completed Bonus Task 2 - Signing up for Sprint.dev and filled the form (3 points) (Details in Participant Manual)
- Python 3.11+
- uv, the latest Rust-based project-management tool for Python ("install for your platform from here") (optional)
- Node.js 20+
- Get your Groq API key (https://console.groq.com/) and add it to a .env file in the backend
- Download a suitable YOLO model or use the one provided in the repo (we've used YOLOv8s) (https://github.com/ultralytics/ultralytics)
Open two separate terminals in the same "whatsthat" folder. (Make sure uv is installed by checking: `uv --version`.)

- Clone the repository: `git clone https://github.com/Pramod-325/whatsthat.git`, then `cd whatsthat`
- Open the backend folder (in the 1st terminal): `cd backend`
- Install dependencies: `uv add -r requirements.txt` if uv is installed, or `pip install -r requirements.txt` if it is not
- In the backend folder, create a `.env` file with your Groq API key: `GROQ_API_KEY=your_groq_api_key_here`, and place the downloaded YOLO models in the `yolo_models` directory (or use the one provided)
- Start the backend server: `uv run main.py` if uv is installed, or `python main.py` otherwise
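For reference, reading `GROQ_API_KEY` from the backend's `.env` file can be sketched as a minimal loader like the one below; this is an assumption for illustration, since the backend may instead use a library such as python-dotenv:

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ,
    ignoring blank lines and comments. Minimal sketch, not the
    project's actual code; overwrites existing values for simplicity."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()
```

After calling `load_env()` at startup, the key is available as `os.environ["GROQ_API_KEY"]` everywhere in the backend.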
- Navigate to the frontend directory from the 'whatsthat' folder (in the 2nd terminal): `cd frontend`
- Install dependencies: `npm install`
- Start the development server: `npm run dev`

Then navigate to the path in the URL bar or click the link below:

- Open your browser to http://localhost:5173/ (make sure a webcam is present)
- Grant camera access when prompted
- Click "Start Vision Assistant" to begin processing
- The application will detect objects and provide audio descriptions
- Click "Stop Vision Assistant" to end the session
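For an IoT integration without a browser, a client would follow the same loop as the web app: capture a frame, POST it to the backend, and speak the returned description. A sketch of packaging one frame for that request follows; the endpoint path and JSON field name are assumptions, not the project's actual API:

```python
import base64
import json

BACKEND_URL = "http://localhost:8000/describe"  # hypothetical endpoint path

def frame_payload(jpeg_bytes):
    """Package one camera frame as a JSON body to POST to the backend.
    The 'image_b64' field name is illustrative."""
    return json.dumps({"image_b64": base64.b64encode(jpeg_bytes).decode("ascii")})

# An IoT client would then loop:
#   grab frame -> POST frame_payload(frame) to BACKEND_URL
#   -> feed the returned description text to a TTS engine
```

Base64-encoding the JPEG keeps the body plain JSON, at the cost of ~33% size overhead; a multipart upload would avoid that if bandwidth matters.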
- 🚀 Improved YOLO models with custom data training
- 🛡️ Security enhancements, like running everything locally
- 🌐 More LLM integrations for native languages for worldwide users
- Groq's LPU-powered APIs for fast LLM responses
- Open-source libraries and tools: ReactJS, FastAPI, YOLOv8s with the COCO dataset for object detection
- YouTube video references by:
It's our first online hackathon and a completely new experience which we enjoyed a lot. There were challenges, like the project working on one member's computer and not the others' 😅. We learnt how to properly collaborate online to complete a project using GitHub's core functionality, and the attempts we made to deploy the application will forever be memorable because of the way the Namespace community planned and executed the event, so a huge shoutout also goes to them 🎉🎉🎉







