This repository is a collection of Knowledge Distillation (KD) methods implemented by the Huawei Montreal NLP team.
Included Projects
- MATE-KD
  - KD for model compression; studies the use of adversarial training to improve student accuracy using only the teacher's logits, as in standard KD (a minimal sketch of this objective follows the list).
  - Paper: MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation
- Combined-KD
  - Proposes Combined-KD (ComKD), which takes advantage of data augmentation and progressive training.
  - Paper: How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
- Minimax-kNN
  - A sample-efficient, semi-supervised kNN data augmentation technique.
  - Paper: Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
- Glitter
  - A universal, sample-efficient framework for incorporating augmented data into training.
  - Paper: When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation
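
The "standard KD" objective referenced in the MATE-KD entry is the usual temperature-scaled distillation loss on teacher logits. The sketch below is illustrative only and is not taken from this repository's code; the function name `kd_loss` and the default `temperature` value are assumptions for the example.

```python
# Minimal sketch of the standard KD objective (not the repository's exact
# implementation): the student is trained to match the teacher's
# temperature-scaled logit distribution via a KL-divergence term.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Temperature-scaled KL divergence between teacher and student logits."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Example usage with random logits: a batch of 8 examples over 3 classes.
student_logits = torch.randn(8, 3)
teacher_logits = torch.randn(8, 3)
loss = kd_loss(student_logits, teacher_logits)
```

In practice this distillation term is typically combined with the regular cross-entropy loss on ground-truth labels; the projects above build on this basic setup with adversarial training, data augmentation, or progressive training.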
This project is licensed under the Apache 2.0 license.