
Collection of research papers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
arxiv.org 2019 Link

TinyBERT: Distilling BERT for Natural Language Understanding
Knowledge distillation of BERT
arxiv.org 2019 Link
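
As a rough illustration of the general idea (not TinyBERT's layer-wise distillation of attention maps and hidden states), knowledge distillation trains a small student to match a large teacher's softened predictions; the temperature `T` and mixing weight `alpha` below are illustrative hyperparameters:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft KL term (student vs. teacher) with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```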

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
arxiv.org 2021 Link
Torsten Hoefler, et al.
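
A minimal sketch of one family of methods covered by the survey, global magnitude pruning of linear layers; the 90% sparsity target is an arbitrary example:

```python
import torch

def magnitude_prune(model, sparsity=0.9):
    """Zero out the globally smallest-magnitude weights of all Linear layers."""
    weights = [m.weight for m in model.modules() if isinstance(m, torch.nn.Linear)]
    all_abs = torch.cat([w.detach().abs().flatten() for w in weights])
    k = max(1, int(sparsity * all_abs.numel()))
    threshold = all_abs.kthvalue(k).values      # magnitude below which weights are dropped
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())
```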

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
arxiv.org 2020 Link
Google AI Brain Team, Zihang Dai, et al.

Training Compute-Optimal Large Language Models
Chinchilla: compute-optimal training with smaller models and more data
arxiv.org 2022 Link
DeepMind, Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al.
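
A back-of-the-envelope reading of the result, using the often-quoted approximations of roughly 20 training tokens per parameter and a training cost of about 6·N·D FLOPs (both rules of thumb, not exact formulas from the paper):

```python
def chinchilla_estimate(n_params, tokens_per_param=20):
    """Rough compute-optimal token count and training FLOPs for a given model size."""
    tokens = n_params * tokens_per_param
    flops = 6 * n_params * tokens          # ~6 FLOPs per parameter per token
    return tokens, flops

tokens, flops = chinchilla_estimate(70e9)  # a 70B-parameter, Chinchilla-sized model
print(f"{tokens:.1e} tokens, {flops:.1e} FLOPs")  # ~1.4e+12 tokens, ~5.9e+23 FLOPs
```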

RoBERTa: A Robustly Optimized BERT Pretraining Approach
A study of the importance of the pretraining setup, relevant when compressing models
arxiv.org 2019 Link
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, et al.

Quadapter: Adapter for GPT-2 Quantization
It is hard to quantize GPT-2 and similar decoder-based models; the paper proposes ideas for preventing overfitting during fine-tuning.
arxiv.org 2022 Link
Qualcomm AI Research, Minseop Park, Jaeseong You, Markus Nagel, Simyung Chang

Understanding the Difficulty of Training Transformers
aclanthology.org Link
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 5747–5763, November 16–20, 2020
Microsoft Research, Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han

Compression of Generative Pre-trained Language Models via Quantization
arxiv.org 2022 Link
Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
arxiv.org 2020 Link
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, et al.

Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing
arxiv.org 2021 Link
Zejian Liu, Gang Li, Jian Cheng

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
An alternative to Masked Language Modeling (MLM): replaced token detection
arxiv.org 2020 Link
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

The case for 4-bit precision: k-bit Inference Scaling Laws
arxiv.org 2022 Link

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
arxiv.org 2022 Link
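
A minimal sketch of the vector-wise absmax int8 quantization the method builds on; the mixed-precision outlier decomposition that makes it work at scale is omitted here:

```python
import torch

def quantize_int8(x):
    """Quantize each row of a float matrix to int8 with a per-row absmax scale."""
    scale = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def int8_matmul(a, b):
    """Approximate a @ b: quantize rows of a and columns of b, dequantize the integer result."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b.t())                       # columns of b = rows of b.T
    acc = qa.to(torch.int64) @ qb.to(torch.int64).t()   # integer accumulation (real kernels use int32)
    return acc.to(torch.float32) * sa * sb.t()

a, b = torch.randn(4, 64), torch.randn(64, 8)
print((int8_matmul(a, b) - a @ b).abs().max())  # small quantization error
```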

QLoRA: Efficient Finetuning of Quantized LLMs
arxiv.org 2023 Link
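
A minimal sketch of the LoRA half of the idea, i.e. small trainable low-rank adapters added on top of a frozen base layer; the 4-bit NF4 quantization of the frozen weights and the paged optimizers that QLoRA adds are not shown:

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: W x + (alpha/r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen (quantized in QLoRA)
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a zero (identity) update
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```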

I-BERT: Integer-only BERT Quantization
Quantization techniques for LayerNorm, Softmax, and GELU
arxiv.org 2021 Link
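
A float sketch of the paper's polynomial GELU approximation (i-GELU); the constants below are quoted from memory of the paper and should be treated as illustrative, and the actual kernel evaluates this with integer-only arithmetic:

```python
import torch

# Second-order polynomial approximation of erf used by i-GELU
# (constants A, B quoted from memory of the paper).
A, B = -0.2888, -1.769

def poly_erf(x):
    """erf(x) ≈ sign(x) * (A * (min(|x|, -B) + B)**2 + 1)."""
    return torch.sign(x) * (A * (torch.clamp(x.abs(), max=-B) + B) ** 2 + 1.0)

def i_gelu(x):
    """GELU(x) ≈ 0.5 * x * (1 + poly_erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + poly_erf(x / 2.0 ** 0.5))

x = torch.linspace(-4.0, 4.0, 101)
print((i_gelu(x) - torch.nn.functional.gelu(x)).abs().max())  # small approximation error
```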
