MQBench
MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
We propose a benchmark to evaluate different quantization algorithms under various settings. MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple different platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate extensive state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts as a bridge connecting the algorithm and the hardware. We conduct a comprehensive analysis and find considerable intuitive or counter-intuitive insights.
Highlighted Features

Integrate with the latest tracing techniques in PyTorch 1.8.

Quantization Algorithms
- Learned Step Size Quantization: https://arxiv.org/abs/1902.08153
- Quantization Interval Learning: https://arxiv.org/abs/1808.05779
- Differentiable Soft Quantization: https://arxiv.org/abs/1908.05033
- Parameterized Clipping Activation: https://arxiv.org/abs/1805.06085
- Additive Powers-of-Two Quantization: https://arxiv.org/abs/1909.13144
- DoReFa-Net: https://arxiv.org/abs/1606.06160

Network Architectures:
- ResNet-18, ResNet-50: https://arxiv.org/abs/1512.03385
- MobileNetV2: https://arxiv.org/abs/1801.04381
- EfficientNet-Lite0: https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html
- RegNetX-600MF: https://arxiv.org/abs/2003.13678

Hardware Platform:
| Library | Hardware Type | Scale Form | Granularity | Symmetry | Fold BN |
| --- | --- | --- | --- | --- | --- |
| Academic | None | FP32 | Per-tensor | Symmetric | No |
| TensorRT | GPU | FP32 | Per-channel | Symmetric | Yes |
| ACL | ASIC | FP32 | Per-channel | Asymmetric | Yes |
| TVM | ARM CPU | POT | Per-tensor | Symmetric | Yes |
| SNPE | DSP | FP32 | Per-tensor | Asymmetric | Yes |
| FBGEMM | X86 CPU | FP32 | Per-channel | Asymmetric | Yes |
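To make the table concrete, these hardware constraints map onto the backend parameters that are passed to the quantizer configuration later in this README. The dictionary below is only an illustrative sketch of that mapping (the preset names and exact values are assumptions, not repo code):

```python
# Illustrative mapping from the hardware table to backend parameters
# (parameter names follow the get_qconfig arguments shown later in this README;
# the presets themselves are an assumption, not code from the repo).
HARDWARE_PRESETS = {
    "academic": dict(ada_sign=True,  symmetry=True,  per_channel=False, pot_scale=False),
    "tensorrt": dict(ada_sign=False, symmetry=True,  per_channel=True,  pot_scale=False),
    "acl":      dict(ada_sign=False, symmetry=False, per_channel=True,  pot_scale=False),
    "tvm":      dict(ada_sign=False, symmetry=True,  per_channel=False, pot_scale=True),
    "snpe":     dict(ada_sign=False, symmetry=False, per_channel=False, pot_scale=False),
    "fbgemm":   dict(ada_sign=False, symmetry=False, per_channel=True,  pot_scale=False),
}
```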
Installation
These instructions will help you get MQBench up and running.

Clone MQBench.

(Optionally) Create a Python virtual environment.

Install the MQBench-required packages:
```shell
$ pip install -r requirements.txt
```
Notes: MQBench uses PyTorch 1.8; our quantized model is based on the new `torch.fx` tracing techniques.
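As a quick illustration of what `torch.fx` tracing provides (standard PyTorch/torchvision APIs, not MQBench code), symbolic tracing turns a module into a graph of operations that quantization passes can rewrite:

```python
import torchvision
from torch.fx import symbolic_trace

# Trace a model into a GraphModule; fake-quantize nodes can then be inserted
# by graph rewriting (this is the mechanism MQBench builds on).
model = torchvision.models.resnet18()
graph_module = symbolic_trace(model)
print(graph_module.graph)  # inspect the traced computation graph
```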
MQBench uses PyTorch distributed data-parallel training with the `nccl` backend (see details here); please make sure your machine can initialize that distributed training environment.
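If you need to bring the distributed environment up by hand, a minimal single-node NCCL initialization looks like the sketch below (standard PyTorch APIs; the environment-variable launch method is an assumption, not MQBench-specific):

```python
import torch
import torch.distributed as dist

# Assumes MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE are set,
# e.g. by torch.distributed.launch or your cluster scheduler.
dist.init_process_group(backend="nccl", init_method="env://")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
```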
How to Reproduce MQBench
We provide the running script `run.sh` and configuration file `config.yaml` for every experiment in MQBench.
To reproduce LSQ on ResNet-18:

enter the directory
```shell
$ cd PATH-TO-PROJECT/qbench_zoo
$ cd lsq_experiments/resnet18_4bit_academic
```

run the script
```shell
$ sh run.sh
```
Note that `run.sh` contains some commands that may not be found on your machine; the core running command is:
```shell
PYTHONPATH=$PYTHONPATH:../../.. python -u -m prototype.solver.cls_quant_solver --config config.yaml
```
How to self-implement a quantization algorithm
All our quantization algorithms are implemented in `prototype/quantization/`.
To implement a new algorithm, you need to add your quantizer to this directory.
All quantizers inherit from the `QuantizeBase` class. Each `QuantizeBase` has an observer class which is used to estimate/update the quantization range. The observer design is inspired by the PyTorch 1.8 repo. Initializing a `QuantizeBase` class will also initialize an `Observer` class.
The parameters contained in `QuantizeBase` and `Observer` include:
- `quant_min, quant_max`, which specify the $N_{min}, N_{max}$ rounding boundaries.
- `qscheme`, which can be `torch.per_tensor_symmetric`, `torch.per_channel_symmetric`, `torch.per_tensor_affine`, or `torch.per_channel_affine`. This is often determined by the hardware setup.
- `ch_axis`, which is the dimension for channel-wise quantization; -1 means per-tensor quantization. Typically, for `nn.Conv2d` and `nn.Linear` modules, `ch_axis` should be 0.
- `ada_sign`, which adaptively chooses the signedness. `ada_sign` should be enabled for the academic setting only.
- `pot_scale`, which is used to determine powers-of-two scale parameters.
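To illustrate how `quant_min`, `quant_max`, `scale`, and `zero_point` interact, here is a generic per-tensor fake-quantization sketch with a straight-through rounding estimator. This is a textbook illustration under those assumptions, not the `QuantizeBase` implementation itself:

```python
import torch

def _round_ste(t: torch.Tensor) -> torch.Tensor:
    # Round in the forward pass, identity gradient in the backward pass.
    return (torch.round(t) - t).detach() + t

def fake_quantize_per_tensor(x, scale, zero_point, quant_min, quant_max):
    # Quantize: divide by the step size, shift by the zero point,
    # round, then clamp onto the integer grid [quant_min, quant_max].
    q = torch.clamp(_round_ste(x / scale) + zero_point, quant_min, quant_max)
    # Dequantize back to the floating-point domain.
    return (q - zero_point) * scale
```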
Note: each specified quantizer may have its own unique parameters, see example of LSQ below.
Example Implementation of LSQ:

For initialization, we add new parameters to store the scale and zero point:
```python
self.use_grad_scaling = use_grad_scaling
self.scale = Parameter(torch.tensor([scale]))
self.zero_point = Parameter(torch.tensor([zero_point]))
```
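The `use_grad_scaling` flag corresponds to the step-size gradient scale from the LSQ paper, $g = 1/\sqrt{N \cdot Q_P}$. One common way to apply it inside `forward` is sketched below (how this repo applies the flag exactly is an assumption):

```python
import math

def grad_scale(t, factor):
    # Forward returns t unchanged; backward multiplies the gradient by `factor`.
    return (t - t * factor).detach() + t * factor

# Inside forward(): N is the number of quantized elements, Q_P is quant_max.
if self.use_grad_scaling:
    grad_factor = 1.0 / math.sqrt(X.numel() * self.quant_max)
    scale = grad_scale(self.scale, grad_factor)
else:
    scale = self.scale
```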

The major implementation is the `forward` function, which should handle several cases:

In case of `ada_sign=True`, the quantization range should be adjusted:
```python
if self.ada_sign and X.min() >= 0:
    self.quant_max = self.activation_post_process.quant_max = 2 ** self.bitwidth - 1
    self.quant_min = self.activation_post_process.quant_min = 0
    self.activation_post_process.adjust_sign = True
```

In case of symmetric quantization, the zero point should be set to 0:
```python
self.zero_point.data.zero_()
```

In case of powers-of-two scale, the scale should be quantized by:
```python
def pot_quantization(tensor: torch.Tensor):
    log2t = torch.log2(tensor)
    log2t = (torch.round(log2t) - log2t).detach() + log2t
    return 2 ** log2t

scale = pot_quantization(self.scale)
```

Implement both per-channel and per-tensor quantization.
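For the per-channel case, the usual approach is to reshape `scale` and `zero_point` so that they broadcast along `ch_axis`. The sketch below reuses `_round_ste` from the per-tensor sketch above and is, again, only an illustration:

```python
def fake_quantize_per_channel(x, scale, zero_point, ch_axis, quant_min, quant_max):
    # Reshape the per-channel scale/zero_point so they broadcast along ch_axis,
    # e.g. a Conv2d weight with ch_axis=0 uses parameters of shape (C_out, 1, 1, 1).
    shape = [1] * x.dim()
    shape[ch_axis] = -1
    scale = scale.reshape(shape)
    zero_point = zero_point.reshape(shape)
    q = torch.clamp(_round_ste(x / scale) + zero_point, quant_min, quant_max)
    return (q - zero_point) * scale
```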

After adding your quantizer...
The next step is to register the quantizer in `prototype/quantization/qconfig.py`.
Import your quantizer, then add it to the `get_qconfig` function and parse the necessary arguments.
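For orientation only, the registration is typically just another branch on the method name inside `get_qconfig`. The snippet below is a hypothetical sketch (the real signature and construction style in `qconfig.py` may differ), not repo code:

```python
# Hypothetical sketch of the extra branch inside get_qconfig (not repo code).
from .my_quantizer import MyQuantizer  # your new QuantizeBase subclass

...
if w_method == "my_method":
    # Parse the arguments your quantizer needs and build the weight quantizer.
    w_quantizer = MyQuantizer(quant_min=-2 ** (bit - 1),
                              quant_max=2 ** (bit - 1) - 1,
                              **backend_params)
...
```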
The final step is to override a `config.yaml` file:
```yaml
qparams:
    w_method: lsq
    a_method: lsq
    bit: 4
    backend: academic
    bnfold: 4
```
By replacing `w_method` and `a_method`, you can run your own implementation.
Note: the rest of the config file should not be modified in order to keep a unified training setting.
How to self-implement a hardware configuration
Adding a new hardware setting is much simpler than adding an algorithm. To do this, we can add another condition to the if-else selection. For example, to add a new hardware target, TFLite Micro:
```python
elif backend == "tflitemicro":
    backend_params = dict(ada_sign=False, symmetry=True, per_channel=False, pot_scale=True)
...
model_qconfig = get_qconfig(**self.qparams, **backend_params)
model = quantize_fx.prepare_qat_fx(model, {"": model_qconfig}, foldbn_config)
```
Submitting Your Results to MQBench
You can submit your implementation to MQBench by submitting a merge request to this repo. The implementation of the new algorithm, the running scripts, and the log files are needed for evaluation.