
# Zipformer

Implementation of the U-Net-like Zipformer from the Zipformer paper, which improves on the Conformer with better temporal resolution.
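The U-Net-like idea can be sketched in a few lines: the middle of the encoder runs at a reduced frame rate and the sequence is later brought back to the input resolution. The `downsample`/`upsample` helpers below are hypothetical illustrations of that shape flow, not the actual modules in this repo:

```python
import torch

def downsample(x, factor=2):
    # Average adjacent frames: (batch, time, dim) -> (batch, time // factor, dim)
    b, t, d = x.shape
    t = t - t % factor  # drop trailing frames so time divides evenly
    return x[:, :t].reshape(b, t // factor, factor, d).mean(dim=2)

def upsample(x, factor=2):
    # Repeat each frame to restore the original frame rate
    return x.repeat_interleave(factor, dim=1)

x = torch.randn(32, 100, 512)
mid = downsample(x)   # (32, 50, 512): middle blocks attend at half the rate
out = upsample(mid)   # (32, 100, 512): back to the input frame rate
```

Running attention on the shorter `mid` sequence is what buys the cheaper computation at the same output resolution.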

## Usage

### 1. Zipformer Block

```python
import torch
from zipformer import ZipformerBlock

block = ZipformerBlock(
    dim = 512,
    dim_head = 64,
    heads = 8,
    mult = 4
)

x = torch.randn(32, 100, 512)  # (batch_size, num_time_steps, feature_dim)

block(x)  # (32, 100, 512)
```

### 2. Zipformer

A stack of multiple `ZipformerBlock`s from above:

```python
import torch
from zipformer import Zipformer

zipformer = Zipformer(
    dim = 512,
    depth = 12,          # 12 blocks
    dim_head = 64,
    heads = 8,
    mult = 4,
)

x = torch.randn(32, 100, 512)

zipformer(x)  # (32, 100, 512)
```
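Conceptually, the full model is just `depth` blocks applied in sequence, each preserving the `(batch, time, dim)` shape. A minimal sketch of that stacking, using `nn.TransformerEncoderLayer` as a stand-in since the real `ZipformerBlock` lives in this repo:

```python
import torch
from torch import nn

class StackSketch(nn.Module):
    # Hypothetical stand-in: `depth` shape-preserving blocks in sequence.
    def __init__(self, dim=512, depth=12, heads=8, mult=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, heads, dim * mult, batch_first=True)
            for _ in range(depth)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)  # (batch, time, dim) in, (batch, time, dim) out
        return x

x = torch.randn(2, 100, 512)
out = StackSketch(depth=2)(x)  # shape preserved: (2, 100, 512)
```

Because every block preserves the sequence shape, blocks can be stacked to any `depth` without changing the calling code.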

## Todo

- Switch to a better relative positional encoding.
- Add the whitener and balancer activation modifications.
- Add a training and evaluation script.

## References

1. lucidrains/conformer
2. facebookresearch/ConvNext
3. k2-fsa/icefall

## Contribution

Please open a pull request; I will be happy to improve this naive implementation.

## Citations

```bibtex
@article{yao2023zipformer,
  title   = {Zipformer: A faster and better encoder for automatic speech recognition},
  author  = {Yao, Zengwei and Guo, Liyong and Yang, Xiaoyu and Kang, Wei and Kuang, Fangjun and Yang, Yifan and Jin, Zengrui and Lin, Long and Povey, Daniel},
  journal = {arXiv preprint arXiv:2310.11230},
  year    = {2023}
}
```