This repository integrates various knowledge distillation methods. The implementation is based on the following repositories:
- PyTorch Cifar Models
- https://github.com/szagoruyko/attention-transfer
- https://github.com/lenscloth/RKD
- https://github.com/clovaai/overhaul-distillation
- https://github.com/HobbitLong/RepDistiller
- KD - Distilling the Knowledge in a Neural Network (a minimal loss sketch follows this list)
- FN - FitNets: Hints for Thin Deep Nets
- NST - Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- AT - Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
- RKD - Relational Knowledge Distillation
- SP - Similarity-Preserving Knowledge Distillation
- OD - A Comprehensive Overhaul of Feature Distillation
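As a concrete illustration of the first method above (KD), the sketch below shows a Hinton-style distillation loss that mixes a softened teacher/student KL term with the usual cross-entropy. The function name and the temperature/weight values are illustrative assumptions, not this repository's implementation (which, as noted below, follows the hyperparameters of each original paper).

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Hypothetical sketch of the classic KD objective (Hinton et al.).
    # Soft-target term: KL divergence between temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```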
- CIFAR10, CIFAR100
- ResNet
- Python3
- PyTorch (> 1.0)
- torchvision (> 0.2)
- NumPy
- type : dataset type (cifar10, cifar100)
- model : network type (resnet, wideresnet)
- depth : depth of the resnet or wideresnet (teacher or baseline); sdepth : same for the student network
- wfactor : widen factor of the wideresnet (teacher or baseline); swfactor : same for the student network
- tn : index of the training run when a network is trained multiple times (teacher or baseline); stn : same for the student network
- distype : type of distillation method (KD, FN, NST, AT, RKD, SP, OD)
- ex) dataset: cifar100, model: resnet110, training index: 1
python3 ./train.py --type cifar100 --model resnet --depth 110 --tn 1
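A wideresnet teacher (or baseline) is trained with the same script. The command below is only a sketch: it assumes the widen-factor flag matches the wfactor argument listed above, and the depth/widen factor values are illustrative, not prescribed by this repository.

python3 ./train.py --type cifar100 --model wideresnet --depth 28 --wfactor 10 --tn 1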
- Hyperparameters for each distillation method are fixed to the values used in the corresponding original paper
- ex) dataset: cifar100, teacher network: resnet110 (index 1), student network: resnet20 (index 1), distillation index: 1
python3 ./distill.py --type cifar100 --teacher resnet --student resnet --depth 110 --tn 1 --sdepth 20 --stn 1 --distype KD
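To compare several of the distillation methods listed above for the same teacher/student pair, the command can be scripted. The driver below is a hypothetical sketch that simply reuses the flag values from the example command above; it is not part of this repository.

```python
import subprocess

# Hypothetical driver: run distill.py once per distillation method,
# reusing the teacher/student settings from the example command above.
METHODS = ["KD", "FN", "NST", "AT", "RKD", "SP", "OD"]

for method in METHODS:
    cmd = [
        "python3", "./distill.py",
        "--type", "cifar100",
        "--teacher", "resnet", "--depth", "110", "--tn", "1",
        "--student", "resnet", "--sdepth", "20", "--stn", "1",
        "--distype", method,
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```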