Sign language recognition from hand gestures using a Long Short-Term Memory (LSTM) network with MediaPipe hand tracking on desktop (CPU)
This code is built upon rabBit64's
Thanks to Google's MediaPipe team for a great framework
- Uses video input files instead of a webcam, so the model is trained on video data
- Extracts hand landmark features from every frame of each input video (one sign word per video) and writes them to one txt file per video
- Installing and building MediaPipe examples
You can find the details here
- Modify MediaPipe
After downloading the original MediaPipe repo, find the 3 files in it that have the same names as the files in "modified_mediapipe" and replace them with the modified versions.
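The replacement step can also be scripted. The following is only a minimal sketch, not part of this repo: the function name `replace_modified_files` is hypothetical, and it assumes "modified_mediapipe" lives outside the MediaPipe checkout and that each modified file name occurs exactly once in the checkout. Files with zero or several candidates are skipped so you can handle them by hand.

```python
import shutil
from pathlib import Path


def replace_modified_files(modified_dir, repo_dir):
    """Copy each file in modified_dir over the same-named file in repo_dir.

    Hypothetical helper. Returns the (source, destination) pairs replaced.
    """
    modified_dir, repo_dir = Path(modified_dir), Path(repo_dir)
    replaced = []
    for src in sorted(p for p in modified_dir.rglob("*") if p.is_file()):
        # Search the whole checkout for files with the same name.
        matches = [p for p in repo_dir.rglob(src.name) if p.is_file()]
        if len(matches) != 1:
            # Ambiguous or missing target: leave it for manual replacement.
            print(f"skipped {src.name}: found {len(matches)} candidates")
            continue
        shutil.copy2(src, matches[0])
        replaced.append((src, matches[0]))
    return replaced
```

Check the returned list (and any "skipped" messages) before building MediaPipe, so you know that all 3 files were actually replaced.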
Create a trainvideosset folder that contains one subfolder per sign language word. Run build.py to generate the txt files and the mp4 output videos with hand tracking. You need at least 150 videos per word (one sign) for training.
python3 build.py --input_data_path=[INPUT_PATH] --output_data_path=[OUTPUT_PATH]
For example: input_data_path=/.../trainvideosset/ and output_data_path=/.../traintxtset/
trainvideosset
|-- Cachly
| |-- 01_05_01.mp4
| |-- 01_05_02.mp4
| |-- 01_05_03.mp4
| ...
| |-- 01_05_20.mp4
|
|-- Camon
| |-- 01_09_01.mp4
| |-- 01_09_02.mp4
| |-- 01_09_03.mp4
| ...
| |-- 01_09_20.mp4
|...
The output path should start as an empty directory. When the build completes, the mp4 and txt folders are written to that path.
Created folder example:
traintxtset
|-- Absolute
| |-- Cachly
| | |-- 01_05_01.txt
| | |-- 01_05_02.txt
| | |-- 01_05_03.txt
| | ...
| | |-- 01_05_20.txt
| ...
|-- Relative
| |-- Cachly
| | |-- 01_05_01.txt
| | |-- 01_05_02.txt
| | |-- 01_05_03.txt
| | ...
| | |-- 01_05_20.txt
| ...
|-- _Cachly
| |-- 01_05_01.mp4
| |-- 01_05_02.mp4
| |-- 01_05_03.mp4
| ...
| |-- 01_05_20.mp4
|...
Important: Name each folder carefully, because the folder name is used as the label for its video data. Do NOT use spaces or '_' in folder names (for example, train_videos_set or train videos set).
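Before running build.py, it may help to check the input folder against the two rules stated in this README: no spaces or '_' in folder names, and at least 150 videos per word. The sketch below is not part of this repo. The function name `check_dataset` is hypothetical, and it assumes that only .mp4 files count as videos and that the naming rule applies to both the top-level folder and the word folders.

```python
from pathlib import Path

MIN_VIDEOS_PER_WORD = 150  # minimum stated in this README


def check_dataset(input_dir, min_videos=MIN_VIDEOS_PER_WORD):
    """Return a list of human-readable problems found under input_dir."""
    root = Path(input_dir)
    problems = []
    # Check the top-level folder name as well as every word folder.
    if " " in root.name or "_" in root.name:
        problems.append(f"{root.name}: folder name contains a space or '_'")
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # The folder name becomes the label, so it must be clean.
        if " " in folder.name or "_" in folder.name:
            problems.append(f"{folder.name}: folder name contains a space or '_'")
        n_videos = sum(1 for f in folder.iterdir() if f.suffix.lower() == ".mp4")
        if n_videos < min_videos:
            problems.append(
                f"{folder.name}: {n_videos} videos, need at least {min_videos}"
            )
    return problems
```

An empty list means the layout passes both checks. Otherwise each entry names the folder and the rule it breaks.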
Open model.ipynb in your Jupyter Notebook environment to train the LSTM model. The model is saved as model.h5 in the current directory.
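Videos differ in length, while batched LSTM training generally expects sequences of equal length and integer class labels. The sketch below illustrates one common way to prepare such data. It is not taken from model.ipynb: `pad_sequence` and `build_label_index` are hypothetical names, the input is assumed to be already parsed into one feature list per frame, and zero padding at the end is an assumed choice.

```python
def pad_sequence(frames, target_len, pad_value=0.0):
    """Truncate or pad a list of per-frame feature vectors to target_len."""
    if not frames:
        raise ValueError("empty sequence")
    n_features = len(frames[0])
    # Keep at most target_len frames, then append padding frames if short.
    clipped = [list(f) for f in frames[:target_len]]
    padding = [[pad_value] * n_features for _ in range(target_len - len(clipped))]
    return clipped + padding


def build_label_index(labels):
    """Map each distinct label (folder name) to an integer class id."""
    return {name: i for i, name in enumerate(sorted(set(labels)))}


# Usage: two frames with two features each, padded to three frames.
padded = pad_sequence([[0.1, 0.2], [0.3, 0.4]], target_len=3)
label_ids = build_label_index(["Camon", "Cachly", "Camon"])
```

Sorting the labels before numbering them keeps the class ids stable between runs, which matters when a saved model is loaded later for prediction.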
Run
python3 main.py
The result is displayed in a GUI built with the Tkinter library
- Predict a sign
- Predict a sequence (this project only recognizes sequences of 2 or 3 consecutive signs)
Watch the testing video for details.
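This README does not describe how main.py turns a video into a sequence of signs. As an illustration only, one possible approach is to classify short windows of frames and then merge consecutive identical predictions. The sketch below shows that merging step. The function name `collapse_predictions` and the `min_run` noise threshold are assumptions, not this project's method.

```python
def collapse_predictions(window_labels, min_run=2):
    """Merge consecutive identical window labels into a sign sequence.

    Runs shorter than min_run windows are treated as noise and dropped.
    Labels are assumed to be strings (None is reserved as a sentinel).
    """
    signs = []
    run_label, run_len = None, 0
    for label in list(window_labels) + [None]:  # sentinel flushes the last run
        if label == run_label:
            run_len += 1
            continue
        # The run just ended: keep it if it was long enough and not a repeat.
        if run_label is not None and run_len >= min_run:
            if not signs or signs[-1] != run_label:
                signs.append(run_label)
        run_label, run_len = label, 1
    return signs


# Usage: five window predictions collapse into two signs.
sequence = collapse_predictions(["Cachly", "Cachly", "Camon", "Camon", "Camon"])
```

A larger `min_run` suppresses more single-window misclassifications, at the cost of missing signs that are performed very briefly.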