Masked Distillation with Receptive Tokens (MasKD)

Official implementation of the paper “Masked Distillation with Receptive Tokens” (MasKD).

By Tao Huang, Yuan Zhang, Shan You, Fei Wang, Chen Qian, Jian Cao, Chang Xu.

🔥 MasKD: a better and more general feature distillation method for dense prediction tasks (e.g., detection and segmentation).



May 30, 2022: Code for mask learning and KD is available in the mmdetection and mmrazor folders.

Reproducing our results

Train students with pretrained masks

We provide the pretrained mask tokens learned in our experiments in the release.

This repo uses MMRazor as the knowledge distillation toolkit. For environment setup, please see the mmrazor/ folder.

Train a student:

cd mmrazor
sh tools/mmdet/ ${CONFIG} 8 ${WORK_DIR}

Example for reproducing our cascade_mask_rcnn_x101-fpn_r50 result:

sh tools/mmdet/ configs/distill/maskd/ 8 work_dirs/maskd_cmr_x101-fpn_r50


  • Baseline settings:

    | Student                | Teacher                | MasKD | Config | Log |
    | ---------------------- | ---------------------- | ----- | ------ | --- |
    | Faster RCNN-R50 (38.4) | Faster RCNN-R101 (39.8) | 40.6  |        |     |
    | RetinaNet-R50 (37.4)   | RetinaNet-R101 (38.9)  | 39.9  |        |     |
    | FCOS-R50 (38.5)        | FCOS-R101 (40.8)       | 42.2  | config | log |

  • Stronger teachers:

    | Student                | Teacher                       | MasKD | Config | Log |
    | ---------------------- | ----------------------------- | ----- | ------ | --- |
    | Faster RCNN-R50 (38.4) | Cascade Mask RCNN-X101 (45.6) | 42.4  | config | log |
    | RetinaNet-R50 (37.4)   | RetinaNet-X101 (41.0)         | 40.6  |        |     |
    | RepPoints-R50 (38.6)   | RepPoints-R101 (44.2)         | 41.4  | config | log |

Learning masks

You can train your own mask tokens with the code provided in the mmdetection folder. Please check mmdetection/ for detailed instructions.
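To make the idea concrete, here is a minimal NumPy sketch of how receptive tokens could induce spatial masks and a masked feature-distillation loss. The function names, the softmax normalization over spatial positions, and the per-token averaging are illustrative assumptions, not the official implementation; see the mmdetection/ code for the actual method.

```python
import numpy as np

def receptive_masks(tokens, feat):
    """Turn receptive tokens into soft spatial masks over a feature map.

    tokens: (N, C) learned receptive tokens (illustrative)
    feat:   (C, H, W) teacher feature map
    returns: (N, H*W) masks, each a softmax over spatial positions
    """
    C, H, W = feat.shape
    f = feat.reshape(C, H * W)                      # flatten spatial dims
    logits = tokens @ f                             # (N, H*W) token-pixel affinity
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)         # normalize per token

def maskd_loss(student_feat, teacher_feat, tokens):
    """Masked feature-distillation loss: per-pixel L2 weighted by each mask."""
    C, H, W = teacher_feat.shape
    masks = receptive_masks(tokens, teacher_feat)   # (N, H*W)
    diff = ((student_feat - teacher_feat) ** 2).reshape(C, H * W).sum(0)
    return float((masks * diff).sum() / len(tokens))  # average over tokens
```

In this sketch, each token highlights a different region of the teacher feature map, so the distillation signal is concentrated on the regions the learned masks attend to rather than spread uniformly over all pixels.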

Semantic segmentation

For semantic segmentation, please see the segmentation folder.


This project is released under the Apache 2.0 license.


  title = {Masked Distillation with Receptive Tokens},
  author = {Huang, Tao and Zhang, Yuan and You, Shan and Wang, Fei and Qian, Chen and Cao, Jian and Xu, Chang},
  journal = {arXiv preprint arXiv:2205.14589},
  year = {2022}

