Distiller

A large-scale study of Knowledge Distillation. NYU Computer Vision Project

Python Dependencies

This codebase requires Python 3.6+.

Required Python packages:

  • torch torchvision tqdm numpy pandas seaborn

All packages can be installed using pip3 install --user -r requirements.txt.

This project is also integrated with PyTorch Lightning. Use the lightning branch to see PyTorch Lightning compatible code.

Run

The benchmarks can be run via python3 evaluate_kd.py with the appropriate
command-line parameters. For example:

python3 evaluate_kd.py --epochs 200 --teacher resnet18 --student resnet8 --dataset cifar10 --teacher-checkpoint pretrained/resnet18_cifar10_95260_parallel.pth --mode nokd kd

This runs plain student training and knowledge distillation for 200 epochs using a
pretrained teacher. Checkpoints for several models are provided in the pretrained folder.

Supported distillation modes

NOKD

Plain training with no knowledge distillation.

KD

Distill a student network using the Hinton loss.
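
A minimal sketch of the Hinton loss in PyTorch is shown below; the function name and the default temperature and alpha values are illustrative, not the repository's exact implementation.

    import torch.nn.functional as F

    def hinton_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soft-target term: KL divergence between temperature-softened
        # distributions, scaled by T^2 as in Hinton et al. (2015).
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-target term: standard cross-entropy against the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard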

ALLKD

Distill from a list of teacher models and pick the best-performing one.

KDPARAM

Distill using varying combinations of temperature and alpha, and pick the best-performing combination.
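
A minimal sketch of such a sweep is shown below; the function name, the grid values, and the train_and_eval callback are hypothetical and stand in for the repository's actual training loop.

    from itertools import product

    def kd_param_search(train_and_eval,
                        temperatures=(1, 4, 8, 16),
                        alphas=(0.5, 0.7, 0.9)):
        # train_and_eval(T, alpha) is a hypothetical callback that trains one
        # student with the given temperature/alpha and returns its validation
        # accuracy; the grid values above are illustrative only.
        best_acc, best_cfg = float("-inf"), None
        for T, alpha in product(temperatures, alphas):
            acc = train_and_eval(T, alpha)
            if acc > best_acc:
                best_acc, best_cfg = acc, (T, alpha)
        return best_cfg, best_acc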

TRIPLET

Knowledge distillation with a triplet loss that uses the student as the negative example.

MULTIKD

Train a student under an ensemble of students that are picked from a list.

UDA

Run knowledge distillation in combination with unsupervised data augmentation.

TAKD

Run Teacher Assistant Knowledge Distillation (TAKD).

AB

Run feature distillation using the Activation Boundaries (AB) method.

OH

Run feature distillation using the Feature Overhaul (OH) method.

RKD

Run Relational Knowledge Distillation (RKD).

PKD

Run feature distillation using Patient Knowledge Distillation (PKD).

SFD

Run a custom feature distillation method (Simple Feature Distillation) that simply pools and flattens feature layers.
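
A minimal sketch of that idea, assuming the pooled student and teacher features are matched with an MSE loss (the function name, pooled size, and distance metric are assumptions, not the repository's exact implementation):

    import torch.nn.functional as F

    def simple_feature_distillation_loss(student_feats, teacher_feats, pooled_size=4):
        # Pool each intermediate feature map to a fixed spatial size, flatten it,
        # and match the student to the (detached) teacher with an MSE loss.
        # (Assumes corresponding student/teacher maps have matching channel
        # counts; otherwise a projection layer would be needed.)
        loss = 0.0
        for s, t in zip(student_feats, teacher_feats):
            s = F.adaptive_avg_pool2d(s, pooled_size).flatten(start_dim=1)
            t = F.adaptive_avg_pool2d(t, pooled_size).flatten(start_dim=1)
            loss = loss + F.mse_loss(s, t.detach())
        return loss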
