This repo contains the multi-modal data preparation code for Skeleton Aware Multi-modal Sign Language Recognition (SAM-SLR).
List of all six modalities:
- Full-body pose keypoints
- Full-body pose features
- RGB frames
- RGB optical flow
- HHA (depth)
- Depth flow
Use a pretrained whole-body pose estimation model to extract 133 landmarks from the RGB videos and save them as npy files (a small sketch for inspecting the output follows the steps below).
- Go to the wholepose folder and set the input_path and output_npy variables to the paths of the input videos and the output npy files.
- Download the pretrained whole-body pose model: Google Drive
- Run `python demo.py`
- Copy the generated npy files to the corresponding data folders.
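The exact layout of the saved arrays is defined by demo.py; the sketch below only assumes a common convention of one array of shape (num_frames, 133, 3) holding (x, y, confidence) per landmark, and the file name is a placeholder.

```python
import numpy as np

# Minimal sanity check for one extracted keypoint file.
# Assumed layout: (num_frames, 133, 3) with (x, y, confidence) per landmark.
keypoints = np.load("sample_video.npy")  # hypothetical file name

print(f"frames: {keypoints.shape[0]}, array shape: {keypoints.shape}")

# Low-confidence landmarks are common on occluded hands or faces;
# count them to get a feel for detection quality.
if keypoints.ndim == 3 and keypoints.shape[-1] == 3:
    low_conf = (keypoints[..., 2] < 0.3).mean()
    print(f"fraction of landmarks below confidence 0.3: {low_conf:.2%}")
```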
Use feature/wholepose_features_extraction.py to extract skeleton features.
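The actual features are defined by that script. As an illustration only, skeleton pipelines often build joint, bone, and motion streams from the keypoints; the sketch below assumes the (num_frames, 133, 3) layout from above and a made-up parent list, and does not reproduce the real script.

```python
import numpy as np

def joint_bone_motion(keypoints, parents):
    """Illustrative skeleton features: joints, bones (joint minus its parent
    joint), and per-frame motion. `parents` is a hypothetical parent index per
    landmark; the real script defines its own feature set."""
    joints = keypoints[..., :2]                            # (T, 133, 2), x/y only
    bones = joints - joints[:, parents, :]                 # vector to parent joint
    motion = np.diff(joints, axis=0, prepend=joints[:1])   # frame-to-frame displacement
    return joints, bones, motion

# Example with dummy data and a trivial parent list (every joint's parent is joint 0).
kps = np.random.rand(16, 133, 3).astype(np.float32)
parents = np.zeros(133, dtype=np.int64)
j, b, m = joint_bone_motion(kps, parents)
print(j.shape, b.shape, m.shape)  # (16, 133, 2) each
```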
Extract frames from the RGB videos and crop them to 256x256 according to the whole-body pose skeletons extracted above (see the cropping sketch after these steps).
- Change the folder, npy_folder, and out_folder variables accordingly in gen_frames.py.
- Run `python gen_frames.py`
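As a rough illustration of the cropping step, the sketch below takes the bounding box of one frame's keypoints, expands it by a margin, and resizes the crop to 256x256 with OpenCV. The margin factor and function name are assumptions; the actual rule lives in gen_frames.py.

```python
import cv2
import numpy as np

def crop_around_skeleton(frame, keypoints_xy, size=256, margin=1.2):
    """Crop a square region centered on the keypoint bounding box and resize it
    to size x size. The margin factor is an assumption; only the 256x256 target
    follows the description above."""
    h, w = frame.shape[:2]
    x_min, y_min = keypoints_xy.min(axis=0)
    x_max, y_max = keypoints_xy.max(axis=0)
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half = max(x_max - x_min, y_max - y_min) * margin / 2
    x0, x1 = int(max(cx - half, 0)), int(min(cx + half, w))
    y0, y1 = int(max(cy - half, 0)), int(min(cy + half, h))
    crop = frame[y0:y1, x0:x1]
    return cv2.resize(crop, (size, size))

# Usage with dummy data: a gray frame and random keypoints in pixel coordinates.
frame = np.full((720, 1280, 3), 128, dtype=np.uint8)
kps_xy = np.random.uniform([300, 100], [900, 700], size=(133, 2)).astype(np.float32)
patch = crop_around_skeleton(frame, kps_xy)
print(patch.shape)  # (256, 256, 3)
```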
There are two types of flow modality: color flow and depth flow. The raw flow data is first obtained with a pretrained Caffe model; then flow_x and flow_y are combined and the result is cropped using gen_flow.py (a small sketch of this combination follows the steps below).
- Obtain the raw flow data from the videos using Docker as described in optical_flow_guidelines.docx.
- Change the folder, npy_folder, and out_folder variables accordingly in gen_flow.py.
- Run `python gen_flow.py`
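As a rough sketch of the combination step, the snippet below assumes the flow extractor writes separate single-channel flow_x/flow_y images, stacks one pair into a 2-channel array, and crops it. File names and crop coordinates are placeholders; gen_flow.py implements the actual logic.

```python
import cv2
import numpy as np

# Read one pair of flow images; assumed to be single-channel 8-bit images
# written by the Docker flow extractor (file names are hypothetical).
flow_x = cv2.imread("flow_x_00001.jpg", cv2.IMREAD_GRAYSCALE)
flow_y = cv2.imread("flow_y_00001.jpg", cv2.IMREAD_GRAYSCALE)

# Stack the x/y displacement into one 2-channel array, then crop the same
# 256x256 region used for the RGB frames (coordinates here are placeholders).
flow = np.stack([flow_x, flow_y], axis=-1)
x0, y0 = 100, 50
crop = flow[y0:y0 + 256, x0:x0 + 256]
print(crop.shape)  # (256, 256, 2)
```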
Use the MATLAB code in Depth2HHA_master_mat to extract HHA from the depth videos (extracting HHA features takes a long time). Then crop the HHA images and mask out pixels using gen_hha.py; an illustrative crop-and-mask sketch follows the steps below.
- Change the input_folder, output_folder, and hha_root variables accordingly in CVPR21Chal_convert_HHA.m and run the script.
- Change the folder, npy_folder, and out_folder variables accordingly in gen_hha.py.
- Run `python gen_hha.py`
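As an illustration of the final crop-and-mask step, the sketch below crops an HHA image to a box, zeroes pixels outside a binary foreground mask, and resizes to 256x256. The box, the mask semantics, and the function name are assumptions; gen_hha.py defines the actual behavior.

```python
import cv2
import numpy as np

def crop_and_mask_hha(hha, mask, box, size=256):
    """Crop the HHA image to the given box, zero out pixels where the mask is
    off, and resize to size x size. The box and mask meaning are assumptions."""
    x0, y0, x1, y1 = box
    hha_crop = hha[y0:y1, x0:x1]
    mask_crop = mask[y0:y1, x0:x1]
    hha_crop = hha_crop * (mask_crop[..., None] > 0)  # mask out background pixels
    return cv2.resize(hha_crop, (size, size))

# Usage with dummy data: a random HHA image and a circular foreground mask.
hha = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
mask = np.zeros((480, 640), dtype=np.uint8)
cv2.circle(mask, (320, 240), 150, 255, -1)
out = crop_and_mask_hha(hha, mask, (170, 90, 470, 390))
print(out.shape)  # (256, 256, 3)
```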