Distiller
A large-scale study of Knowledge Distillation. NYU Computer Vision Project
Python Dependencies
This codebase requires Python 3.6+.
Required Python packages:
torch torchvision tqdm numpy pandas seaborn
All packages can be installed using pip3 install --user -r requirements.txt.
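For reference, a minimal requirements.txt matching the packages listed above would look like the following (unpinned versions, shown only for illustration):

    torch
    torchvision
    tqdm
    numpy
    pandas
    seaborn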
This project is also integrated with PyTorch Lightning. Use the lightning branch for PyTorch Lightning-compatible code.
Run
The benchmarks can be run via python3 evaluate_kd.py with the respective command-line parameters. For example:
python3 evaluate_kd.py --epochs 200 --teacher resnet18 --student resnet8 --dataset cifar10 --teacher-checkpoint pretrained/resnet18_cifar10_95260_parallel.pth --mode nokd kd
This runs plain student training and knowledge distillation for 200 epochs using a
pretrained teacher. Checkpoints for several models are provided in the pretrained folder.
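The _parallel suffix in the checkpoint name above suggests the weights were saved from a torch.nn.DataParallel-wrapped model, whose state_dict keys carry a module. prefix. The helper below is a hypothetical sketch (not part of this codebase) of how such a checkpoint could be loaded into a bare model for inspection:

    from collections import OrderedDict

    import torch

    def load_parallel_checkpoint(model, path, device="cpu"):
        """Load a state_dict that may have been saved from a DataParallel model."""
        state = torch.load(path, map_location=device)
        # Some checkpoints store the weights under a "state_dict" key; this is an assumption.
        if isinstance(state, dict) and "state_dict" in state:
            state = state["state_dict"]
        # Strip the "module." prefix that DataParallel adds to every key, if present.
        cleaned = OrderedDict(
            (k[len("module."):] if k.startswith("module.") else k, v)
            for k, v in state.items()
        )
        model.load_state_dict(cleaned)
        return model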
Supported distillation modes
NOKD
Plain training with no knowledge distillation.
KD
Distill a student network with the Hinton soft-target loss (a loss sketch follows the list of modes).
ALLKD
Distill from a list of teacher models and pick the best-performing one.
KDPARAM
Distill using varying combinations of temperature and alpha and pick the best-performing combination.
TRIPLET
Knowledge distillation with a triplet loss that uses the student as the negative example.
MULTIKD
Train a student under an ensemble of students picked from a list.
UDA
Run knowledge distillation in combination with unsupervised data augmentation.
TAKD
Run distillation using Teacher-Assistant distillation.
AB
Run feature distillation using Activation-Boundary distillation.
OH
Run feature distillation using the Feature Overhaul method.
RKD
Run distillation using Relational Knowledge Distillation.
PKD
Run feature distillation using Patient Knowledge Distillation.
SFD
Run a custom feature distillation method (Simple Feature Distillation) that simply pools and flattens feature layers (a sketch follows the list of modes).
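The KD mode refers to the classic soft-target loss of Hinton et al. A minimal sketch of that loss is shown below; the function name and the default temperature and alpha values are illustrative and are not taken from this codebase:

    import torch.nn.functional as F

    def hinton_kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.9):
        """Blend softened teacher-student KL divergence with the usual cross-entropy."""
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean",
        ) * (temperature ** 2)  # rescale so gradients stay comparable across temperatures
        hard = F.cross_entropy(student_logits, targets)
        return alpha * soft + (1.0 - alpha) * hard

For SFD, this README only states that feature layers are pooled and flattened; the sketch below assumes global average pooling and an MSE match between corresponding layers, which is one plausible reading rather than the repository's exact implementation:

    def simple_feature_distillation_loss(student_feats, teacher_feats):
        """Pool each feature map, flatten it, and match student to teacher with MSE."""
        loss = 0.0
        for s, t in zip(student_feats, teacher_feats):
            # Assumes matching channel counts per layer; otherwise a projection would be needed.
            s = F.adaptive_avg_pool2d(s, 1).flatten(1)
            t = F.adaptive_avg_pool2d(t, 1).flatten(1)
            loss = loss + F.mse_loss(s, t)
        return loss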