Hessian AWare Quantization

HAWQ is an advanced quantization library written for PyTorch. HAWQ enables low-precision and mixed-precision uniform quantization, with direct hardware implementation through TVM.
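As a rough illustration of what uniform quantization means, the following minimal sketch maps real-valued weights onto a signed integer grid with one shared scale (symmetric, per-tensor). It is not HAWQ's implementation: the function names are hypothetical, and HAWQ's own quantizers may differ in details such as per-channel scales and rounding. In this picture, mixed precision amounts to choosing `num_bits` separately for each layer.

```python
from typing import List, Tuple


def symmetric_quantize(weights: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Quantize real values to signed num_bits integers with one shared scale.

    Illustrative sketch: a single per-tensor scale and Python's built-in
    round() (round half to even). Not HAWQ's implementation.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest grid point, then clamp to the signed integer range.
    q = [max(-q_max - 1, min(q_max, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to real values: w_hat = q * scale."""
    return [v * scale for v in q]


weights = [0.6, -1.0, 0.2, 0.0]
q8, s8 = symmetric_quantize(weights, num_bits=8)  # 8-bit weights
q4, s4 = symmetric_quantize(weights, num_bits=4)  # 4-bit weights
print(q8)  # [76, -127, 25, 0]
print(q4)  # [4, -7, 1, 0]
```

With this scheme the reconstruction error of each value is at most half a quantization step (`scale / 2`), which is why fewer bits (a larger step) generally cost more accuracy.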

Installation

  • PyTorch version >= 1.4.0
  • Python version >= 3.6
  • For training new models, you'll also need NVIDIA GPUs and NCCL
  • To install HAWQ and develop locally:
      git clone https://github.com/Zhen-Dong/HAWQ.git
      cd HAWQ
      pip install -r requirements.txt

Getting Started

Quantization-Aware Training

An example of running uniform 8-bit quantization of ResNet50 on ImageNet:

export CUDA_VISIBLE_DEVICES=0
python quant_train.py -a resnet50 --epochs 1 --lr 0.0001 --batch-size 128 \
    --data /path/to/imagenet/ --pretrained --save-path /path/to/checkpoints/ \
    --act-range-momentum=0.99 --wd 1e-4 --data-percentage 0.0001 --fix-BN \
    --checkpoint-iter -1 --quant-scheme uniform8
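The `--act-range-momentum=0.99` flag suggests that activation ranges are tracked with an exponential moving average across batches. Assuming that reading, the sketch below shows how such a running min/max could be updated and turned into an 8-bit scale. The helper names are hypothetical and this is not HAWQ's code.

```python
from typing import Optional, Sequence, Tuple

Range = Tuple[float, float]


def update_act_range(running: Optional[Range], batch: Sequence[float],
                     momentum: float = 0.99) -> Range:
    """Exponential moving average of the per-batch activation min/max."""
    cur_min, cur_max = min(batch), max(batch)
    if running is None:  # the first batch initialises the range
        return cur_min, cur_max
    run_min, run_max = running
    return (momentum * run_min + (1.0 - momentum) * cur_min,
            momentum * run_max + (1.0 - momentum) * cur_max)


def act_scale(act_range: Range, num_bits: int = 8) -> float:
    """Symmetric scale that covers the tracked range with signed num_bits ints."""
    bound = max(abs(act_range[0]), abs(act_range[1]))
    return bound / (2 ** (num_bits - 1) - 1)


running = None
for batch in ([-1.0, 0.5, 2.0], [-3.0, 0.0, 4.0]):
    running = update_act_range(running, batch, momentum=0.99)
print(running)             # approximately (-1.02, 2.02)
print(act_scale(running))  # approximately 2.02 / 127
```

A momentum close to 1 makes the range react slowly to outlier batches, which keeps the activation scale stable during the short fine-tuning run.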

The commands for other quantization schemes and for other networks are shown in the model zoo.

Inference Acceleration

Experimental Results

Table I and Table II in HAWQ-V3: Dyadic Neural Network Quantization

ResNet18 on ImageNet

Model     Quantization     Model Size (MB)  BOPS (G)  Accuracy (%)  Inference Latency (batch=8, ms)  Download
ResNet18  Floating Point   44.6             1858      71.47         9.7 (1.0x)                       resnet18_baseline
ResNet18  W8A8             11.1             116       71.56         3.3 (3.0x)                       resnet18_uniform8
ResNet18  Mixed Precision  6.7              72        70.22         2.7 (3.6x)                       resnet18_bops0.5
ResNet18  W4A4             5.8              34        68.45         2.2 (4.4x)                       resnet18_uniform4
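The model sizes in the ResNet18 table are consistent with a simple count of parameters times bits per parameter, if MB is read as 2^20 bytes. The sketch below assumes torchvision's ResNet18 parameter count (11,689,512), which this README does not state; the helper name is hypothetical.

```python
RESNET18_PARAMS = 11_689_512  # torchvision ResNet18 parameter count (assumption)
MIB = 2 ** 20                 # assume the table's "MB" means 2**20 bytes


def model_size_mib(num_params: int, bits_per_param: float) -> float:
    """Storage for num_params values at bits_per_param bits each, in MiB."""
    return num_params * bits_per_param / 8 / MIB


for label, bits in (("FP32", 32), ("W8", 8), ("W4", 4)):
    print(f"{label}: {model_size_mib(RESNET18_PARAMS, bits):.2f} MB")
# FP32: 44.59 MB
# W8: 11.15 MB
# W4: 5.57 MB
```

Under these assumptions the FP32 and 8-bit estimates round to the table's 44.6 MB and 11.1 MB. The plain 4-bit estimate is slightly below the table's 5.8 MB; the README does not explain the gap, and keeping some tensors at higher precision is one possible reason.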

ResNet50 on ImageNet

Model     Quantization     Model Size (MB)  BOPS (G)  Accuracy (%)  Inference Latency (batch=8, ms)  Download
ResNet50  Floating Point   97.8             3951      77.72         26.2 (1.0x)                      resnet50_baseline
ResNet50  W8A8             24.5             247       77.58         8.5 (3.1x)                       resnet50_uniform8
ResNet50  Mixed Precision  18.7             154       75.39         6.9 (3.8x)                       resnet50_bops0.5
ResNet50  W4A4             13.1             67        74.24         5.8 (4.5x)                       resnet50_uniform4
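The BOPS column can be sanity-checked by assuming the common definition of bit operations per layer as MACs × weight bits × activation bits, summed over layers. Under that assumption, uniform W8A8 should cost 1/16 of the 32-bit baseline, and the tables agree: 3951/16 ≈ 247 G for ResNet50 and 1858/16 ≈ 116 G for ResNet18. The sketch below is illustrative only; the function names and the two-layer example at the end are hypothetical.

```python
from typing import Iterable, Tuple


def bops(macs: float, w_bits: int, a_bits: int) -> float:
    """Bit operations of one layer: MACs x weight bits x activation bits."""
    return macs * w_bits * a_bits


def total_bops(layers: Iterable[Tuple[float, int, int]]) -> float:
    """Sum over (MACs, weight bits, activation bits) per layer; mixed precision
    simply means the bit widths differ from layer to layer."""
    return sum(bops(m, w, a) for m, w, a in layers)


# Recover the MAC count implied by the floating-point row (3951 G BOPS at
# 32x32 bits), then predict the uniform W8A8 cost.
fp32_bops_g = 3951
macs_g = fp32_bops_g / (32 * 32)
print(f"implied MACs: {macs_g:.3f} G")                # implied MACs: 3.858 G
print(f"W8A8 estimate: {bops(macs_g, 8, 8):.0f} G")   # W8A8 estimate: 247 G

# Hypothetical two-layer network with per-layer bit widths.
print(total_bops([(1.0e9, 8, 8), (2.0e9, 4, 4)]) / 1e9)  # 96.0
```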

More results for other quantization schemes and models, along with the corresponding commands and important notes, are available in the model zoo.
To download the quantized models with wget, see the example command in the model zoo.
Checkpoints in the model zoo are saved in floating-point precision. To reduce their size, BitPack can be applied to the weight_integer tensors or directly to the quantized_checkpoint.pth.tar file.
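This README does not describe BitPack's interface, so the sketch below only illustrates the general idea behind bit packing: storing two signed 4-bit integers in each byte instead of one value per byte (or four bytes per float32). The function names are hypothetical and the byte layout (low nibble first) is an assumption, not BitPack's actual format.

```python
from typing import List


def pack_int4(values: List[int]) -> bytes:
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    if any(not -8 <= v <= 7 for v in values):
        raise ValueError("values must fit in signed 4 bits")
    padded = values + [0] * (len(values) % 2)  # pad to an even count
    out = bytearray()
    for lo, hi in zip(padded[0::2], padded[1::2]):
        # & 0xF keeps the two's-complement low nibble of each value.
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data: bytes, count: int) -> List[int]:
    """Inverse of pack_int4; count drops the padding nibble, if any."""
    values = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 encode -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


weights_int4 = [4, -7, 1, 0, 7]
packed = pack_int4(weights_int4)
print(len(packed))                             # 3
print(unpack_int4(packed, len(weights_int4)))  # [4, -7, 1, 0, 7]
```

Here five 4-bit weights occupy 3 bytes, versus 5 bytes as int8 or 20 bytes as float32.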

GitHub

https://github.com/Zhen-Dong/HAWQ