This project is a UOttahack 8 hackathon submission designed to assist individuals with visual impairments. It uses a real-time camera feed to understand the environment around the user and provides spoken feedback and guidance.
Users interact with the system using voice commands triggered by a chosen keyword (currently “Jarvis”). The system can:
- Request help → notify the volunteer application that a person needs help, sharing the user's location.
- Describe the surroundings → identify objects or obstacles in front of the user.
- Provide safe path guidance → suggest the best route to avoid danger or obstacles.
- Real-time object recognition using camera input
- Voice activation with keyword detection
- Voice-based responses and guidance
- Help request feature to alert someone nearby or remotely
- Integration with ElevenLabs and OpenRouter for speech and AI capabilities
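OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so a camera frame can be sent inline for scene description. A minimal sketch of building such a request; the model name and prompt are placeholders, not the project's actual values:

```python
import base64

# OpenRouter's OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_describe_request(jpeg_bytes: bytes, model: str = "<vision-model>") -> dict:
    """Build an OpenAI-style chat payload with an inline base64 camera frame."""
    image_b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "model": model,  # placeholder: pick a vision-capable model on OpenRouter
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe obstacles in front of the user."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }

# To send: POST this JSON to OPENROUTER_URL with the header
# "Authorization: Bearer $OPENROUTER_API_KEY" (e.g., via requests.post).
```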
- Python 3.10+
- `venv` for virtual environments
- `ngrok` for exposing the backend API to the frontend
- A webcam or camera device for real-time vision input
```shell
python -m venv venv
source venv/bin/activate   # macOS / Linux
venv\Scripts\activate      # Windows
pip install -r requirements.txt
uvicorn backend.api.api:app --host 0.0.0.0 --port 8000 --workers 1
ngrok http 8000
```

Copy the ngrok forwarding URL (e.g., `https://somelink.ngrok.io`).
```
ELEVENLABS_API_KEY=<your_elevenlabs_api_key>
OPENROUTER_API_KEY=<your_openrouter_api_key>
NGROK_URL=<your_ngrok_url>
```

Replace the API base URL in `script.js` with the `NGROK_URL` you obtained from ngrok.
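These variables are typically kept in a `.env` file. A minimal stdlib loader sketch, assuming simple `KEY=VALUE` lines (the project may instead use a library such as python-dotenv):

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines into os.environ, skipping blanks/comments."""
    loaded = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```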
```shell
python -m backend.speech.listening
live-server
```

Example voice commands:

- “Jarvis, help me”
- “Jarvis, what’s in front of me?”
- “Jarvis, guide me to a safe path”
In these examples, “Jarvis” is the wake word; it can be replaced in `backend/speech/listen.py`.