This project demonstrates real-time detection of sign language numbers (0-9) from hand gestures. The system captures frames from a webcam, uses MediaPipe to estimate the hand landmarks, and predicts the corresponding number with a trained Random Forest classifier. Predictions are based on the positions of the hand landmarks.
[Image: Sign Language Numbers (0-9)]
[Video: Demo Video]
Make sure you have the following dependencies installed:
- MediaPipe: 0.10.14
- OpenCV (cv2): 4.10.0
- Scikit-learn: 1.5.2
You can install them via pip:
pip install mediapipe==0.10.14 opencv-python==4.10.0.84 scikit-learn==1.5.2
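To confirm that the installed versions match the ones listed above, a small check like the following may help. It is only a sketch: the `PINNED` table and the `find_mismatches` helper are illustrative names, not part of this repository, and the check accepts a longer build suffix (for example an OpenCV wheel reporting `4.10.0.84`).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the dependency list, keyed by PyPI distribution name.
PINNED = {
    "mediapipe": "0.10.14",
    "opencv-python": "4.10.0",
    "scikit-learn": "1.5.2",
}


def find_mismatches(pinned):
    """Return {package: (wanted, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Accept an exact match or a longer build suffix such as "4.10.0.84".
        ok = installed is not None and (
            installed == wanted or installed.startswith(wanted + ".")
        )
        if not ok:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

If nothing is printed, all three packages match the listed versions.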
I collected the dataset with my webcam, capturing 1000 images for each class (numbers 0-9) using the collect_images module. I then processed these images with the create_dataset module to extract the x and y coordinates of the hand landmarks estimated by MediaPipe. Each landmark array was labeled with its number, and the result was saved as a pickle file.
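The extraction step can be sketched roughly as follows. This is not the actual create_dataset code: the function names, the `data.pickle` file name, and the `{"data", "labels"}` layout are assumptions made for illustration. It uses MediaPipe's `mp.solutions.hands` API in static image mode and keeps only the first detected hand.

```python
import pickle


def landmarks_to_features(points):
    """Flatten [(x, y), ...] into [x0, y0, x1, y1, ...]."""
    features = []
    for x, y in points:
        features.extend([x, y])
    return features


def extract_points(image_path, hands):
    """Return the (x, y) landmarks of the first detected hand, or None."""
    import cv2  # imported lazily so the helper above works without OpenCV

    image = cv2.imread(image_path)
    if image is None:
        return None
    # MediaPipe expects RGB input; OpenCV loads images as BGR.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    return [(lm.x, lm.y) for lm in results.multi_hand_landmarks[0].landmark]


def build_dataset(samples, out_path="data.pickle"):
    """samples: iterable of (image_path, label). out_path is a hypothetical name."""
    import mediapipe as mp

    data, labels = [], []
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        for image_path, label in samples:
            points = extract_points(image_path, hands)
            if points is not None:  # skip images where no hand was found
                data.append(landmarks_to_features(points))
                labels.append(label)
    # Assumed layout: two parallel lists stored in one dictionary.
    with open(out_path, "wb") as f:
        pickle.dump({"data": data, "labels": labels}, f)
    return len(data)
```

MediaPipe reports 21 landmarks per hand, so each feature vector in this sketch has 42 values.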
One of the main challenges in creating the dataset was ensuring accurate predictions for different hand orientations, distances from the webcam, and positions in the frame. I addressed this by capturing diverse images under varied conditions, but the model's accuracy could still be improved by adding more frames captured at different distances and positions.
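One common way to reduce sensitivity to position and distance, in addition to collecting more varied images, is to normalize each hand's landmarks relative to its own bounding box. The sketch below is an assumption for illustration and is not confirmed to be what this project does; `normalize_points` is a hypothetical helper. It does not address hand orientation.

```python
def normalize_points(points):
    """Shift landmarks to start at (0, 0) and scale by the larger box side.

    Using one scale for both axes keeps the hand's proportions intact.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 if all points coincide, to avoid dividing by zero.
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / scale, (y - min_y) / scale) for x, y in points]
```

With this transformation, the same gesture shown in a different part of the frame, or closer to the camera, maps to the same feature values. If it is used, it has to be applied both when building the dataset and at prediction time.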
The complete dataset is around 6GB, so it's not uploaded here, but you can collect your own dataset using the aforementioned modules.
The machine learning model used is a Random Forest classifier, which achieved over 99% accuracy. The model was trained using the train_classifier module on the coordinates extracted from the hand landmarks, and the trained model is saved as a .p file for future predictions.
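A minimal training sketch, assuming the features and labels are already available as lists, could look like this. The 80/20 stratified split, the default hyperparameters, the `model.p` file name, and the `{"model": ...}` pickle layout are all assumptions, not details taken from train_classifier.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_save(data, labels, model_path="model.p", seed=0):
    """Train a Random Forest, save it with pickle, and return held-out accuracy."""
    X = np.asarray(data, dtype=float)
    y = np.asarray(labels)
    # Hold out 20% of the samples, keeping the class balance in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=True, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    # Assumed layout: the fitted model stored under the "model" key.
    with open(model_path, "wb") as f:
        pickle.dump({"model": model}, f)
    return accuracy
```

Calling `train_and_save(data, labels)` returns the accuracy on the held-out split, which is the kind of figure the 99% result above would be compared against.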
To use the system, run the main.ipynb notebook. It captures video from your webcam, uses MediaPipe for hand pose estimation, and passes the extracted landmark coordinates to the saved Random Forest model. The model predicts the number from the hand gesture, and the result is displayed on a box drawn around the hand.
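The loop described above can be sketched as follows. This is an illustration rather than the contents of main.ipynb: the `model.p` path, the `{"model": ...}` pickle layout, the padding value, and the function names are assumptions, and the features must be built the same way they were during training.

```python
import pickle


def bounding_box(points, width, height, pad=10):
    """Convert normalized landmarks to a padded pixel box clamped to the frame."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x1 = max(0, int(min(xs) * width) - pad)
    y1 = max(0, int(min(ys) * height) - pad)
    x2 = min(width - 1, int(max(xs) * width) + pad)
    y2 = min(height - 1, int(max(ys) * height) + pad)
    return x1, y1, x2, y2


def run(model_path="model.p", camera_index=0):
    """Show webcam frames with the predicted number; press 'q' to quit."""
    import cv2
    import mediapipe as mp

    with open(model_path, "rb") as f:
        model = pickle.load(f)["model"]  # assumed pickle layout

    cap = cv2.VideoCapture(camera_index)
    with mp.solutions.hands.Hands(max_num_hands=1) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            height, width = frame.shape[:2]
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                hand = results.multi_hand_landmarks[0]
                points = [(lm.x, lm.y) for lm in hand.landmark]
                features = [value for point in points for value in point]
                label = str(model.predict([features])[0])
                x1, y1, x2, y2 = bounding_box(points, width, height)
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 0), 2)
                cv2.putText(frame, label, (x1, max(y1 - 10, 20)),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.3, (0, 0, 0), 3, cv2.LINE_AA)
            cv2.imshow("Sign language numbers", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    cap.release()
    cv2.destroyAllWindows()
```

The box is clamped to the frame because MediaPipe can report landmark coordinates slightly outside the 0 to 1 range when part of the hand leaves the image.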
The system predicts numbers in real time, showing a number from 0-9 based on your hand gesture and position in each frame.
This project is licensed under the MIT License. Feel free to use it in your projects or contribute to improve it.