
MSKA

Introduction

We propose a multi-stream keypoint attention network to model sequences of keypoints produced by a readily available keypoint estimator. To facilitate interaction across the streams, we investigate keypoint fusion strategies, head fusion, and self-distillation. The resulting framework, denoted MSKA-SLR, is extended into a sign language translation (SLT) model by simply attaching an additional translation network. We carry out comprehensive experiments on the well-known Phoenix-2014, Phoenix-2014T, and CSL-Daily benchmarks to demonstrate the efficacy of our method. Notably, we attain a new state-of-the-art result on the sign language translation task of Phoenix-2014T.
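
Below is a minimal, self-contained PyTorch sketch of the core idea described above: each keypoint stream is encoded by its own attention encoder, and the per-stream gloss heads are fused by averaging their logits (head fusion). This is an illustration only, not the authors' implementation; the stream names, dimensions, number of glosses, and the averaging fusion rule are assumptions made for the example.

```python
import torch
import torch.nn as nn


class StreamEncoder(nn.Module):
    """Encodes one keypoint stream with self-attention over time."""

    def __init__(self, in_dim: int, d_model: int = 256, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, keypoints * channels) -> (batch, time, d_model)
        return self.encoder(self.proj(x))


class MultiStreamSLR(nn.Module):
    """Per-stream encoders and gloss heads; logits are averaged (head fusion)."""

    def __init__(self, stream_dims: dict, num_glosses: int, d_model: int = 256):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: StreamEncoder(dim, d_model) for name, dim in stream_dims.items()}
        )
        self.heads = nn.ModuleDict(
            {name: nn.Linear(d_model, num_glosses) for name in stream_dims}
        )

    def forward(self, streams: dict) -> torch.Tensor:
        per_stream_logits = [
            self.heads[name](self.encoders[name](x)) for name, x in streams.items()
        ]
        # Head fusion: average the per-stream gloss logits over streams.
        return torch.stack(per_stream_logits).mean(dim=0)


if __name__ == "__main__":
    # Toy input: 2 clips, 64 frames; body / left-hand / right-hand keypoint streams
    # (keypoint counts and channel layout are illustrative assumptions).
    model = MultiStreamSLR({"body": 27 * 3, "left": 21 * 3, "right": 21 * 3}, num_glosses=1200)
    streams = {
        "body": torch.randn(2, 64, 27 * 3),
        "left": torch.randn(2, 64, 21 * 3),
        "right": torch.randn(2, 64, 21 * 3),
    }
    print(model(streams).shape)  # torch.Size([2, 64, 1200])
```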

Performance

MSKA-SLR

| Dataset | WER (%) | Model | Training |
| --- | --- | --- | --- |
| Phoenix-2014 | 22.1 | ckpt | config |
| Phoenix-2014T | 20.5 | ckpt | config |
| CSL-Daily | 27.8 | ckpt | config |

MSKA-SLT

| Dataset | ROUGE | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | Model | Training |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Phoenix-2014T | 53.54 | 54.79 | 42.42 | 34.49 | 29.03 | ckpt | config |
| CSL-Daily | 54.04 | 56.37 | 42.80 | 32.78 | 25.52 | ckpt | config |

Installation

conda create -n mska python==3.10.13
conda activate mska
# Please install PyTorch according to your CUDA version.
pip install -r requirements.txt

Download

Datasets

Download the datasets from their respective websites and place them under the corresponding directories in data/.

Pretrained Models

mbart_de / mbart_zh: pretrained language models used to initialize the translation network for German and Chinese, respectively, with weights from mbart-cc-25.
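
As a rough illustration (the repository's actual loading code may differ), the mbart-cc-25 weights referenced above can be fetched with Hugging Face transformers; the model id facebook/mbart-large-cc25 and the language codes below are the standard ones for that checkpoint, not something taken from this repo:

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

# Download the mbart-cc-25 checkpoint that initializes the translation network.
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# German is the target language for Phoenix-2014T, Chinese for CSL-Daily.
print(tokenizer.lang_code_to_id["de_DE"], tokenizer.lang_code_to_id["zh_CN"])
```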

We provide pretrained models for Phoenix-2014T and CSL-Daily. Download them and place them under pretrained_models/.

Keypoints

We provide human keypoints for the three datasets, Phoenix-2014, Phoenix-2014T, and CSL-Daily, pre-extracted with HRNet. Please download them and place them under data/Phoenix-2014t, data/Phoenix-2014, or data/CSL-Daily, respectively.
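
A sketch of the expected layout after downloading, assuming each keypoint archive goes into the matching dataset directory (the exact file names inside each directory follow the downloaded archives):

```
data/
├── Phoenix-2014/     # dataset + HRNet keypoints for Phoenix-2014
├── Phoenix-2014t/    # dataset + HRNet keypoints for Phoenix-2014T
└── CSL-Daily/        # dataset + HRNet keypoints for CSL-Daily
```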

MSKA-SLR Training

python train.py --config configs/${dataset}_s2g.yaml --epoch 100

Replace ${dataset} with the name of the target dataset, matching the YAML files under configs/; the same convention applies to the evaluation and SLT commands below.

MSKA-SLR Evaluation

python train.py --config configs/${dataset}_s2g.yaml --resume pretrained_models/${dataset}_SLR/best.pth --eval

MSKA-SLT Training

python train.py --config configs/${dataset}_s2t.yaml --epoch 40

MSKA-SLT Evaluation

python train.py --config configs/${dataset}_s2t.yaml --resume pretrained_models/${dataset}_SLT/best.pth --eval

Citations

@misc{guan2024multistream,
      title={Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation}, 
      author={Mo Guan and Yan Wang and Guangkun Ma and Jiarui Liu and Mingzu Sun},
      year={2024},
      eprint={2405.05672},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
