🚀 WhatsThat (Vision-to-Audio Assistant)

A software assistant module that helps visually impaired users understand their surroundings by converting camera input into audio descriptions.


📌 Problem Statement 1

Weave AI magic with Groq


🎯 Objective

To build real-time assistive technology that guides blind people in their day-to-day travel.


🧠 Team & Approach

Team Name:

Quantumania

Team Members:

Your Approach:

  • The application captures video from the user's device camera, sends frames to a backend server for object detection with the YOLOv8s ML model (default), enhances the detections into descriptions with an LLM (llama3-70b-8192), and returns those descriptions, which are converted to audio for the user.

It can be integrated into various web apps or IoT devices so that it tells the user what is in front of them, whether a threat or a general obstacle. (We currently present the web app for demonstration purposes, with basic utility.)
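The per-frame flow above can be sketched in a few lines. The helper below is a hypothetical illustration of the step between the detector and the LLM (the function name and wording are ours, not from the repository): it collapses the raw labels from one frame into the short scene summary that would be handed to the LLM.

```python
from collections import Counter

def summarize_detections(labels):
    """Collapse the labels detected in one frame into a short
    scene summary suitable as input to the LLM step."""
    if not labels:
        return "No objects detected."
    counts = Counter(labels)  # preserves first-seen order
    parts = [f"{n} {name}{'s' if n > 1 else ''}" for name, n in counts.items()]
    return "Detected: " + ", ".join(parts)

print(summarize_detections(["person", "person", "car"]))
# -> Detected: 2 persons, 1 car
```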


System Architecture


---

Architectural Diagram


πŸ› οΈ Tech Stack

Core Technologies Used:

  • Frontend: React (for the visual demo at present; a frontend is not mandatory)
  • Backend: FastAPI
  • Object detection ML model: YOLOv8s
  • APIs: Groq (for LLM integration)
  • Hosting: Netlify (for the frontend)

Sponsor Technologies Used (if any):

  • ✅ Groq: We used Groq to tailor responses for the user as quickly as possible from the set of objects detected in the video scene
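A minimal sketch of how the Groq call can look with the official Python SDK. The model name (llama3-70b-8192) is the one named above; the prompt wording and function names are illustrative assumptions, not the repository's actual code.

```python
import os

def build_messages(scene):
    """Chat payload asking the LLM to turn raw detections into
    short spoken-style guidance (prompt wording is illustrative)."""
    return [
        {"role": "system",
         "content": "You guide a visually impaired user. Be brief and concrete."},
        {"role": "user",
         "content": f"Objects in view: {scene}. Describe the scene in one sentence."},
    ]

def describe_scene(scene):
    """Call llama3-70b-8192 via Groq's Python SDK (pip install groq)."""
    from groq import Groq
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    resp = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=build_messages(scene),
    )
    return resp.choices[0].message.content
```

Groq's API is OpenAI-compatible, which is what keeps the chat-completions call this small.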

✨ Key Features

  • ✅ Modular architecture
  • ✅ Scene recognition of multiple objects, tracked over time across video frames
  • ✅ User-friendly
  • ✅ Responsive

Output Screenshots:

(Six screenshots of the running application; see the repository for the images.)

---

πŸ“½οΈ Demo & Deliverables


FAQs ❔

Q: How will a blind person use this?

Currently, for demonstration, we've added the frontend web interface, but the functionality can be integrated separately into custom hardware projects so that the application runs automatically or is switched on by voice command for real-life usage.

Q: How is Groq's API used in the application?

We use an open-source LLM (Llama) through Groq's API to quickly tailor custom responses that help the user navigate with respect to the objects in the field of view.

Q: Does it provide responses only in English?

With the open-source models currently available, responses are reliable only in English; we aim to integrate more languages from other FOSS foundations to improve diversity and inclusivity.

Q: What model is used for object detection, and what is the data source?

We use Ultralytics' open-source YOLOv8s model, pretrained on the COCO dataset; Ultralytics provides some of the industry's best computer-vision models.
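For reference, a minimal sketch of a YOLOv8s detection step using the ultralytics package. The `detect_objects`/`filter_detections` names and the 0.5 confidence threshold are our illustrative assumptions, not the repository's code.

```python
def filter_detections(detections, min_conf=0.5):
    """Keep only the labels whose confidence clears the threshold
    before they are summarized for the user."""
    return [label for label, conf in detections if conf >= min_conf]

def detect_objects(frame):
    """Run COCO-pretrained YOLOv8s on one frame (pip install ultralytics)."""
    from ultralytics import YOLO
    model = YOLO("yolov8s.pt")  # weights are downloaded on first use
    result = model(frame)[0]    # first (and only) image in the batch
    return [(result.names[int(box.cls)], float(box.conf))
            for box in result.boxes]
```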

Q: I have another question; where can I ask it?

We are open to resolving your queries and eager to collaborate. You can open an issue here or mail us at: link


✅ Tasks & Bonus Checklist

  • ✅ All members of the team completed the mandatory task – followed at least 2 of our social channels and filled the form (details in Participant Manual)
  • ✅ All members of the team completed Bonus Task 1 – sharing of badges and filled the form (2 points) (details in Participant Manual)
  • ✅ All members of the team completed Bonus Task 2 – signing up for Sprint.dev and filled the form (3 points) (details in Participant Manual)

🧪 How to Run the Project

Requirements:

Local Setup:

  1. Clone the repository:

    git clone https://github.com/Pramod-325/whatsthat.git
    cd whatsthat

  2. Open two separate terminals in the same "whatsthat" folder, and make sure uv is installed:

    uv --version   # to check that uv is installed properly

Backend Setup (in Terminal 1)

  1. Open backend folder

    cd backend        # run this in 1st Terminal
  2. Install dependencies:

     uv add -r requirements.txt          #in backend terminal (if uv is installed)
                     (or)
     pip install -r requirements.txt     #if uv is not installed
  3. Create a .env file in the backend folder with your Groq API key:

    GROQ_API_KEY=your_groq_api_key_here
    

    Then place the downloaded YOLO models in the "yolo_models" directory, or use the one provided.

  4. Start the backend server:

     uv run main.py           #if uv is installed
             (or)
     python main.py
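For context on step 3 above: the backend needs GROQ_API_KEY available in its environment at startup. Below is a minimal, self-contained sketch of what a `.env` loader does (the real project may simply use the python-dotenv package; this helper is ours, for illustration only).

```python
import os

def load_env_file(path=".env"):
    """Minimal KEY=value .env reader (python-dotenv does this and more)."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                # Do not override variables already set in the environment
                os.environ.setdefault(key.strip(), value.strip())

load_env_file()                      # reads backend/.env if present
api_key = os.getenv("GROQ_API_KEY")  # None if the key is missing
```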

Frontend Setup (in Terminal-2)

  1. Navigate to the frontend directory from 'whatsthat' folder:

    cd frontend
  2. Install dependencies:

    npm install
  3. Start the development server:

    npm run dev

  4. Open your browser at http://localhost:5173/ (make sure a webcam is available)

Using the Application

  1. Grant camera access when prompted
  2. Click "Start Vision Assistant" to begin processing
  3. The application will detect objects and provide audio descriptions
  4. Click "Stop Vision Assistant" to end the session



🧬 Future Scope

  • 📈 Improved YOLO models with custom data training
  • 🛡️ Security enhancements, such as running everything locally
  • 🌐 More LLM integrations for native languages, for worldwide users

📎 Resources / Credits


🏁 Final Words

It's our first online hackathon and a completely new experience, which we enjoyed a lot; there were challenges, like the project working on one person's computer and not on others' 😂. We learnt how to properly collaborate online to complete a project using GitHub's core functionality, and our attempts to deploy the application will forever be memorable because of the way the Namespace community planned and executed the event, so a huge shoutout goes to them 🎊🎉🎉


About

A simplified application for visually impaired. https://whatsthat-2504.netlify.app
