This page describes how I collect and process the datasets.
I have found many hand-related datasets on the internet, such as FreiHAND, GANerated Hands Dataset, RHD, MHP, NYU Hand Pose Dataset, EgoHands, CMU Panoptic Dataset, etc. However, most of them do not provide annotations that are suitable or satisfactory enough (in my opinion) for my training case. So I aggregate some of them and add data recorded and annotated by myself. Finally, I establish two reasonably large and well-annotated datasets for GestureDet model training: a hand detection dataset and a hand pose estimation dataset.
Note: "annotated by myself" here means a semi-automatic process:
- I use the detections generated by a pretrained YOLOv5 as my general object annotations;
- I use the hand bboxes and hand poses generated by pretrained MediaPipe Hands as my hand bbox and pose annotations.
This process is somewhat like knowledge distillation, where the pretrained YOLOv5 and MediaPipe Hands serve as (very awesome) teachers and greatly reduce the annotation burden.
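For illustration, here is a minimal sketch of producing YOLOv5 pseudo-labels through torch.hub; the image path and confidence threshold are placeholders, not the exact values used in my pipeline:

```python
import torch

# load a pretrained YOLOv5 model from torch.hub (the small variant, as an example)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.conf = 0.4  # placeholder confidence threshold

# run inference; YOLOv5 accepts file paths, numpy arrays, or PIL images
results = model('frame_000001.jpg')  # placeholder image path

# results.xyxy[0] is an (N, 6) tensor: x1, y1, x2, y2, confidence, class_id
for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
    print(int(cls), round(conf, 3), [x1, y1, x2, y2])
```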
The hand detection dataset is aggregated from EgoHands and my self-recorded data. It contains 9170 training samples and 93 validation samples. The annotations are arranged in COCO format.
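For context, the COCO-style detection annotations have roughly the following structure (shown here as a Python dict; the file name, ids, and box values are illustrative placeholders):

```python
# a minimal COCO-style detection annotation file, expressed as a Python dict
coco = {
    "images": [
        {"id": 1, "file_name": "frame_000001.jpg", "width": 1280, "height": 720},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,                      # e.g. the "hand" category
            "bbox": [430.0, 250.0, 120.0, 95.0],   # COCO boxes are [x, y, width, height]
            "area": 120.0 * 95.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "hand"},
    ],
}
```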
For EgoHands dataset:
```bash
cd <path_to_EgoHands>/utils
# generate [hand] bbox from .mat annotation
python gen_hand_bboxes.py
# integrate the [general object bbox] detected by yolo and [hand bbox] generated above
python integrate_label_with_yolo.py
```
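Conceptually, gen_hand_bboxes.py turns each EgoHands hand segmentation polygon into a tight bounding box. A minimal sketch of that conversion (the .mat file name and layout below are placeholders; the real EgoHands files should be inspected before use):

```python
import numpy as np
import scipy.io as sio

def polygon_to_bbox(polygon):
    """Convert an (N, 2) array of polygon vertices into a COCO-style [x, y, w, h] box."""
    xs, ys = polygon[:, 0], polygon[:, 1]
    x_min, y_min = float(xs.min()), float(ys.min())
    return [x_min, y_min, float(xs.max()) - x_min, float(ys.max()) - y_min]

# load the per-video annotation file; 'polygons.mat' is a placeholder name and the
# actual key/field layout of the EgoHands .mat files should be checked
mat = sio.loadmat('polygons.mat')
print(mat.keys())

# example: a hand segmentation polygon becomes a tight bounding box
example_polygon = np.array([[430.0, 250.0], [550.0, 260.0], [540.0, 345.0], [445.0, 330.0]])
print(polygon_to_bbox(example_polygon))  # -> [430.0, 250.0, 120.0, 95.0]
```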
For self-recorded data:
```bash
cd <path_to_self_recorded_data>/utils
# extract frames from videos
python extract_frames.py
# generate hand [bbox and pose] using pretrained mediapipe
python gen_hand_label_by_mediapipe.py
# integrate the [general object bbox] detected by yolo and [hand bbox] generated above
python integrate_label_with_yolo.py
```
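A minimal sketch of the frame extraction step using OpenCV; the video path, output directory, and sampling stride below are placeholders, not necessarily what extract_frames.py uses:

```python
import os
import cv2

def extract_frames(video_path, out_dir, stride=5):
    """Save every `stride`-th frame of a video as a JPEG image; return the number saved."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# placeholder paths; the actual script iterates over all recorded videos
print(extract_frames("my_recording.mp4", "frames/", stride=5))
```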
Then, by running:
```bash
cd <path_to_DatasetAggregator>
python hand_det_dataset_aggregator.py
```
the source images and annotations of the two datasets are aggregated into COCO format and saved in train_od.json and val_od.json.
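The aggregation step is conceptually simple: load each source's COCO-style dict, re-index image and annotation ids so they do not collide, and dump the merged result. A rough sketch under that assumption (the file names are placeholders, and the real script also handles image paths and the train/val split):

```python
import json

def merge_coco(paths, out_path):
    """Merge several COCO-style annotation files, re-indexing ids to avoid collisions."""
    merged = {"images": [], "annotations": [], "categories": None}
    next_img_id, next_ann_id = 1, 1
    for path in paths:
        with open(path) as f:
            coco = json.load(f)
        if merged["categories"] is None:
            merged["categories"] = coco["categories"]  # assume identical category lists
        id_map = {}
        for img in coco["images"]:
            id_map[img["id"]] = next_img_id
            merged["images"].append({**img, "id": next_img_id})
            next_img_id += 1
        for ann in coco["annotations"]:
            merged["annotations"].append(
                {**ann, "id": next_ann_id, "image_id": id_map[ann["image_id"]]}
            )
            next_ann_id += 1
    with open(out_path, "w") as f:
        json.dump(merged, f)

# placeholder file names
merge_coco(["egohands_od.json", "self_recorded_od.json"], "train_od.json")
```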
The hand pose estimation dataset is aggregated from the GANerated Hands Dataset, MHP, and my self-recorded data. It contains 111749 training images and 1128 validation images. Each image has at least one hand in it, and each hand has 21 joints.
Note: the annotation style of MHP differs from the common convention, so I map it to the common one.
For GANerated Hands Dataset:
Note: the GANerated dataset refers to the Real Hands Data on this page. Although no annotations are provided, the images are of high quality, and I use MediaPipe again to generate hand pose estimates as annotations for my training.
```bash
cd <path_to_GANerated>/utils
# generate hand pose from raw rgb image with pretrained mediapipe
python gen_anno_by_mediapipe.py
```
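A minimal sketch of generating 21-joint pseudo-annotations with MediaPipe Hands; the image path is a placeholder, and the landmarks come back normalized to [0, 1], so they are scaled to pixel coordinates here:

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# placeholder image path; MediaPipe expects RGB input
image = cv2.cvtColor(cv2.imread("hand_image.png"), cv2.COLOR_BGR2RGB)
h, w = image.shape[:2]

with mp_hands.Hands(static_image_mode=True, max_num_hands=2,
                    min_detection_confidence=0.5) as hands:
    results = hands.process(image)

if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        # 21 landmarks per detected hand, normalized to [0, 1]; scale to pixels
        joints = [(lm.x * w, lm.y * h) for lm in hand_landmarks.landmark]
        print(len(joints), joints[0])
```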
For MHP Dataset:
```bash
cd <path_to_MHP_Dataset>/utils
# generate 2d hand joints from annotation
python generate2Dpoints.py
```
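To illustrate the style mapping mentioned in the note above, here is a sketch of remapping per-hand 2D joints into the common 21-joint order (typically wrist first, then four joints per finger from thumb to pinky). The permutation below is a placeholder, not MHP's actual joint ordering:

```python
import numpy as np

# placeholder permutation: position i of the common order is taken from index
# COMMON_FROM_MHP[i] of the MHP order; the real mapping must be derived from
# MHP's documentation
COMMON_FROM_MHP = list(range(21))

def to_common_order(mhp_joints):
    """Remap a (21, 2) array of MHP-ordered 2D joints into the common joint order."""
    mhp_joints = np.asarray(mhp_joints, dtype=np.float32).reshape(21, 2)
    return mhp_joints[COMMON_FROM_MHP]

# dummy joints just to show the call; real values come from the MHP annotation files
dummy = np.arange(42, dtype=np.float32).reshape(21, 2)
print(to_common_order(dummy).shape)  # (21, 2)
```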
For self-recorded data, the pose annotations were already generated during the hand detection dataset processing step.
Then, by running:
```bash
cd <path_to_DatasetAggregator>
python hand_pose_dataset_aggregator.py
```
the source images and annotations of the three datasets are aggregated and saved in train_pose.json and val_pose.json.
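For reference, if the pose entries follow the COCO keypoints convention (keypoints stored as a flat x, y, visibility list), a single annotation could look like the sketch below; all numeric values are placeholders:

```python
# one illustrative pose annotation entry, assuming a COCO-keypoints-style layout
keypoints = []
for i in range(21):                          # 21 hand joints
    keypoints += [430.0 + i, 250.0 + i, 2]   # x, y, visibility (2 = labeled and visible)

pose_annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,          # "hand"
    "num_keypoints": 21,
    "keypoints": keypoints,    # flat list of 21 * 3 = 63 values
    "bbox": [430.0, 250.0, 120.0, 95.0],
    "iscrowd": 0,
}
print(len(pose_annotation["keypoints"]))  # 63
```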