Papers
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
arxiv.org 2019 Link
TinyBERT
Knowledge distillation of BERT
arxiv.org 2019 Link
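The core idea behind distillation papers like TinyBERT is training a small student to match a large teacher's softened output distribution. A minimal sketch of the classic soft-target loss (Hinton-style; TinyBERT additionally distills attention maps and hidden states, which this sketch omits):

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so the gradient magnitude stays comparable across T."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2
```

The loss is zero when the student exactly matches the teacher and grows with the divergence between the two distributions.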
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
arxiv.org 2021 Link
Torsten Hoefler, et al.
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
arxiv.org 2020 Link
Google AI Brain Team, Zihang Dai, et al.
Training Compute-Optimal Large Language Models
Chinchilla: training smaller models on more data
arxiv.org 2022 Link
DeepMind, Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
On the importance of the pretraining setup; relevant when compressing models
arxiv.org 2019 Link
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, et al.
Quadapter: Adapter for GPT-2 Quantization
GPT-2 and similar decoder-based models are hard to quantize; ideas for preventing overfitting during finetuning
arxiv.org 2022 Link
Qualcomm AI Research, Minseop Park, Jaeseong You, Markus Nagel, Simyung Chang
Understanding the Difficulty of Training Transformers
aclanthology.org Link
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 5747–5763,
November 16–20, 2020
Microsoft Research, Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han
Compression of Generative Pre-trained Language Models via Quantization
arxiv.org 2022 Link
Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
arxiv.org 2020 Link
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, et al.
Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing
arxiv.org 2021 Link
Zejian Liu, Gang Li, Jian Cheng
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Alternative to Masked Language Modeling (MLM)
arxiv.org 2020 Link
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
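ELECTRA replaces MLM with replaced-token detection: a small generator corrupts some input positions, and the discriminator labels every token as original or replaced, giving a training signal at all positions instead of only the masked 15%. A minimal sketch of building such an example, using a random vocabulary draw as a stand-in for the generator network:

```python
import random

def make_rtd_example(tokens, vocab, replace_prob=0.15, rng=None):
    """Corrupt ~replace_prob of the positions with sampled tokens and
    label each position 0 (original) or 1 (replaced). In ELECTRA the
    replacements come from a trained generator; a uniform draw from the
    vocabulary stands in for it here."""
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            repl = rng.choice(vocab)
            corrupted.append(repl)
            # If the sample happens to equal the original, it counts as original.
            labels.append(0 if repl == tok else 1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```

The discriminator is then trained with a per-token binary classification loss over the `labels` sequence.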
The case for 4-bit precision: k-bit Inference Scaling Laws
arxiv.org 2022 Link
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
arxiv.org 2022 Link
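LLM.int8() builds on symmetric absmax quantization: each vector is scaled so its largest magnitude maps to 127, then rounded to 8-bit integers (the paper's key addition, keeping outlier feature dimensions in fp16, is omitted here). A minimal sketch of that base scheme:

```python
def quantize_absmax(row):
    """Symmetric 8-bit absmax quantization of one vector: scale so the
    largest magnitude maps to 127, then round to integers in [-127, 127]."""
    scale = max(abs(x) for x in row) / 127.0 or 1.0  # avoid 0 for all-zero rows
    q = [round(x / scale) for x in row]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [qi * scale for qi in q]
```

The round trip loses at most half a quantization step per element, which is what makes per-row (and per-column) scaling so much more accurate than a single tensor-wide scale.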
QLoRA: Efficient Finetuning of Quantized LLMs
arxiv.org 2023 Link
I-BERT: Integer-only BERT Quantization
Quantization techniques for LayerNorm, Softmax, and GELU
arxiv.org 2021 Link