A Chrome extension that detects ASL (American Sign Language) fingerspelling via webcam, converts it to text, and uses text-to-speech to speak the words aloud. This enables deaf/HoH individuals who use ASL to communicate verbally in video calls, online meetings, and anywhere on the web.
- Real-time ASL Detection: Detects fingerspelling letters A-Z using webcam
- Word Accumulation: Automatically builds words from detected letters
- Text-to-Speech: Converts words to speech using Groq API
- Video Conferencing Support: Works on Google Meet, Zoom, Teams, Webex, and Discord
- Manual Control: "Speak Now" button for immediate speech
- Customizable Settings: Adjust pause detection, voice selection, and more
- Groq API Key: Pre-configured, no setup required
  - The extension comes with a pre-configured API key
  - No need to sign up or configure anything
- Chrome Browser: version 88 or later (required for Manifest V3)
1. Clone or Download this repository

2. Add Extension Icons:
   - Navigate to `assets/icons/`
   - Add three icon files:
     - `icon16.png` (16x16 pixels)
     - `icon48.png` (48x48 pixels)
     - `icon128.png` (128x128 pixels)
   - You can create simple icons using any image editor
3. Load Extension in Chrome:
   - Open Chrome and navigate to `chrome://extensions/`
   - Enable "Developer mode" (toggle in top right)
   - Click "Load unpacked"
   - Select the `signspeak-extension` folder
4. Ready to Use:
   - The extension is pre-configured with an API key
   - No setup required: just activate and start using it
   - Optionally configure settings (pause detection, voice, etc.)
1. Navigate to a supported site:
   - Google Meet: https://meet.google.com
   - Zoom: https://zoom.us
   - Microsoft Teams: https://teams.microsoft.com
   - Webex: https://webex.com
   - Discord: https://discord.com
2. Activate the extension:
   - Click the SignSpeak icon in the Chrome toolbar
   - Click the "Activate" button
   - Grant camera permissions when prompted
3. Start fingerspelling:
   - Position your hand in front of the webcam
   - Fingerspell letters one at a time
   - The extension detects the letters and builds words
   - After a pause (default 1.5 seconds), the word is spoken aloud
4. Manual control:
   - Click "Speak Now" to immediately speak the current word
   - No need to wait for pause detection
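The letter-to-word flow above (accumulate letters, flush on pause, flush early on "Speak Now") can be sketched roughly as follows. Class and option names here are hypothetical; the actual logic lives in `src/word-accumulator.js`.

```javascript
// Minimal sketch of pause-based word accumulation.
class WordAccumulator {
  constructor({ pauseMs = 1500, onWord = () => {} } = {}) {
    this.pauseMs = pauseMs; // pause length before the word is auto-spoken
    this.onWord = onWord;   // callback receiving each finished word
    this.letters = [];
    this.timer = null;
  }

  // Called whenever the detector reports a letter; resets the pause timer.
  addLetter(letter) {
    this.letters.push(letter);
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.pauseMs);
  }

  // "Speak Now": emit the current word immediately, without waiting.
  flush() {
    if (this.timer) clearTimeout(this.timer);
    this.timer = null;
    if (this.letters.length === 0) return "";
    const word = this.letters.join("");
    this.letters = [];
    this.onWord(word);
    return word;
  }
}
```

In the real extension, `onWord` would hand the word to the TTS service; here it is just a callback so the accumulation logic stays testable in isolation.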
- Groq API Key: Pre-configured (no input required)
- Pause Detection: Time in milliseconds before auto-speaking (300-3000ms)
- Voice Selection: Choose from available Groq voices
- Auto-speak on pause: Toggle automatic speech on pause detection
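To illustrate how the voice and key settings feed into a TTS call, here is a hedged sketch of how `src/tts-service.js` might build its request. The endpoint path, model name, and voice name are assumptions based on Groq's OpenAI-compatible API; check Groq's documentation for current values.

```javascript
// Assumed OpenAI-compatible speech endpoint on Groq.
const GROQ_TTS_URL = "https://api.groq.com/openai/v1/audio/speech";

// Build a fetch()-ready request for the given text. Model and voice
// defaults are assumptions, not confirmed values from this extension.
function buildTTSRequest(text, { apiKey, voice = "Fritz-PlayAI", model = "playai-tts" } = {}) {
  return {
    url: GROQ_TTS_URL,
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, voice, input: text, response_format: "wav" }),
    },
  };
}
```

In the extension, `fetch(req.url, req.options)` would return audio bytes to play through an `<audio>` element; keeping request construction separate from the network call makes it easy to unit-test.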
- Manifest V3: Uses Chrome Extension Manifest V3
- Content Scripts: Injected into video conferencing sites
- Background Service Worker: Manages state and coordinates components
- ASL Detection: Uses basic image processing for hand detection (self-contained, no external dependencies)
- TTS: Groq API for high-quality text-to-speech
Note: The current ASL detection uses a basic image processing approach. For production use with higher accuracy, consider integrating a trained machine learning model.
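One common way to stabilize a noisy per-frame classifier like this is a majority vote over recent frames: a letter is only committed once it wins most of a sliding window. This is an illustrative technique, not necessarily what `src/asl-detector.js` implements.

```javascript
// Commit a letter only when it appears in at least `minVotes` of the
// last `windowSize` per-frame classifications; otherwise return null.
function majorityVote(frames, windowSize = 7, minVotes = 4) {
  const recent = frames.slice(-windowSize);
  const counts = {};
  for (const letter of recent) {
    counts[letter] = (counts[letter] || 0) + 1;
  }
  let best = null;
  for (const [letter, n] of Object.entries(counts)) {
    if (n >= minVotes && (best === null || n > counts[best])) best = letter;
  }
  return best; // null means "no letter is stable enough yet"
}
```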
```
signspeak-extension/
├── manifest.json            # Extension manifest
├── popup/                   # Extension popup UI
│   ├── popup.html
│   ├── popup.js
│   └── popup.css
├── content/                 # Content scripts
│   └── content.js
├── background/              # Background service worker
│   └── service-worker.js
├── src/                     # Core modules
│   ├── config.js            # Configuration
│   ├── asl-detector.js      # ASL detection logic
│   ├── word-accumulator.js  # Word building logic
│   └── tts-service.js       # TTS integration
├── assets/
│   └── icons/               # Extension icons
└── README.md
```
The extension is self-contained and does not require external JavaScript libraries:
- No CDN dependencies: All code runs locally
- Basic image processing: Uses built-in browser APIs for hand detection
- Groq API: Only external dependency is the TTS API (requires internet connection)
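As an illustration of "basic image processing with built-in browser APIs", one crude dependency-free approach is to scan the RGBA bytes from a canvas `getImageData()` call with a rough skin-color rule. The thresholds below are a common heuristic, not necessarily what `src/asl-detector.js` uses.

```javascript
// `data` is the flat RGBA byte array from ctx.getImageData(...).data.
// Returns the fraction of pixels matching a crude RGB skin rule.
function skinPixelRatio(data) {
  let skin = 0;
  const total = data.length / 4;
  for (let i = 0; i < data.length; i += 4) {
    const r = data[i], g = data[i + 1], b = data[i + 2];
    // Red-dominant, moderately bright pixels count as "skin".
    if (r > 95 && g > 40 && b > 20 && r > g && r > b && r - Math.min(g, b) > 15) {
      skin++;
    }
  }
  return total ? skin / total : 0;
}
```

A detector built on this would track where skin pixels cluster frame to frame; it works entirely locally, which is why no CDN or library download is needed.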
The current implementation uses basic image processing. For better accuracy:
1. Train a machine learning model:
   - Collect ASL fingerspelling data (images or hand landmarks)
   - Train a model (TensorFlow.js, ONNX, etc.) to classify it into letters
   - Bundle the model with the extension
   - Update `src/asl-detector.js` to use your trained model
2. Use MediaPipe Hands (requires bundling):
   - Bundle the MediaPipe Hands library locally
   - Use it for accurate hand landmark detection
   - Combine it with a trained classification model
3. Use a pre-trained model:
   - Look for open-source ASL detection models
   - Bundle one with the extension
   - Integrate it into the detection pipeline
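Whichever route is taken, a landmark-based classifier usually needs its input normalized first. A typical preprocessing step, sketched here as an assumption rather than existing code, translates the hand landmarks so the wrist is the origin and scales them into a unit box, making the classifier invariant to hand position and distance from the camera.

```javascript
// `landmarks` is an array of [x, y] pairs; in MediaPipe's layout,
// landmark 0 is the wrist. Returns wrist-centered, unit-scaled pairs.
function normalizeLandmarks(landmarks) {
  const [wrist] = landmarks;
  const shifted = landmarks.map(([x, y]) => [x - wrist[0], y - wrist[1]]);
  const maxAbs = Math.max(
    ...shifted.flatMap(([x, y]) => [Math.abs(x), Math.abs(y)]),
    1e-6 // guard against division by zero for degenerate input
  );
  return shifted.map(([x, y]) => [x / maxAbs, y / maxAbs]);
}
```

The normalized pairs can then be flattened into a feature vector and fed to whatever letter classifier is bundled with the extension.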
To add support for additional video conferencing sites:
- Add the site URL to `host_permissions` in `manifest.json`
- Add the site pattern to `content_scripts.matches` in `manifest.json`
- Add the site domain to `SUPPORTED_SITES` in `src/config.js`
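For example, adding Jitsi Meet would mean extending the site list in `src/config.js` along these lines. The exact shape of that file is an assumption; only the `SUPPORTED_SITES` name comes from the steps above.

```javascript
// Hypothetical shape of SUPPORTED_SITES in src/config.js after adding
// Jitsi Meet as a new supported video conferencing site.
const SUPPORTED_SITES = [
  "meet.google.com",
  "zoom.us",
  "teams.microsoft.com",
  "webex.com",
  "discord.com",
  "meet.jit.si", // newly added site
];
```

The matching `manifest.json` changes would add the corresponding match pattern (e.g. `https://meet.jit.si/*`) to both `host_permissions` and `content_scripts.matches`.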
**Camera not working:**
- Ensure camera permissions are granted
- Check that no other application is using the camera
- Try refreshing the page after granting permissions

**Letters not being detected:**
- Ensure good lighting
- Position your hand clearly in front of the camera
- Check the browser console for detection errors
- Try adjusting the confidence threshold in the code if needed

**Speech not playing:**
- Verify the Groq API key is valid
- Check the browser console for API errors
- Ensure you have API credits/quota available
- Check your internet connection

**Extension not working on a site:**
- Ensure you're on a supported site
- Check that the content script is loading (see the browser console)
- Try reloading the extension in `chrome://extensions/`
- Camera Access: Only used locally for hand detection
- API Key: Stored securely in Chrome's local storage
- Data Processing: Hand detection happens locally; only text is sent to Groq API
- No Data Collection: The extension does not collect or store personal data
- Detection Accuracy: Current rule-based classifier has limited accuracy. A trained model is recommended for production use.
- Internet Required: TTS requires internet connection for Groq API
- Single Hand: Currently optimized for single-hand fingerspelling
- Browser Support: Chrome/Chromium only (Manifest V3)
Contributions are welcome! Areas for improvement:
- Better ASL detection models
- Support for more video conferencing platforms
- Additional TTS providers
- Performance optimizations
- UI/UX improvements
[Specify your license here]
For issues, questions, or contributions, please [create an issue or contact information].
- MediaPipe for hand landmark detection
- Groq for TTS API
- TensorFlow.js community
- ASL community for inspiration