This repo is the implementation of “TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios”.
On the VisDrone Challenge 2021, TPH-YOLOv5 won 4th place and achieved results comparable to the 1st-place model.
See VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results for more information.


$ git clone
$ cd tph-yolov5
$ pip install -r requirements.txt

Convert labels: transfer VisDrone annotations to YOLO labels.
You should set the path of the VisDrone dataset first.

$ python
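The conversion above maps each VisDrone annotation line (`bbox_left,bbox_top,bbox_width,bbox_height,score,category,truncation,occlusion`, in pixels) to a YOLO label line (`class x_center y_center width height`, normalized to the image size). A minimal sketch of that per-line mapping, assuming the common convention of skipping category 0 ("ignored regions") and 11 ("others") and shifting categories 1..10 to class ids 0..9 (the function name `visdrone_to_yolo` is illustrative, not the repo's script):

```python
# Illustrative sketch: convert one VisDrone annotation line to a YOLO label line.
# VisDrone: bbox_left,bbox_top,bbox_width,bbox_height,score,category,truncation,occlusion
# YOLO:     class x_center y_center width height  (all normalized to [0, 1])

def visdrone_to_yolo(line, img_w, img_h):
    left, top, w, h, score, cat = (int(v) for v in line.split(",")[:6])
    if cat in (0, 11) or w <= 0 or h <= 0:
        return None  # skip "ignored regions" (0) and "others" (11)
    cx = (left + w / 2) / img_w   # box center, normalized
    cy = (top + h / 2) / img_h
    return f"{cat - 1} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

print(visdrone_to_yolo("100,200,50,40,1,4,0,0", 1000, 800))
# → 3 0.125000 0.275000 0.050000 0.050000
```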

Inference: run inference on VisDrone2019-DET-val using weights trained with TPH-YOLOv5.
(We provide two weights trained by two different models based on YOLOv5l.)

$ python --weights ./weights/ --img 1996 --data ./data/VisDrone.yaml \
    --augment --save-txt --save-conf --task val --batch-size 8 --verbose --name v5l-xs
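With `--save-txt --save-conf`, each saved prediction line has the form `class cx cy w h conf` with coordinates normalized to [0, 1]. A small sketch of reading one such line back into pixel corner coordinates (the helper name `yolo_txt_to_xyxy` is illustrative, not part of the repo):

```python
# Illustrative sketch: parse one line from a --save-txt --save-conf output file
# (format: class cx cy w h conf, normalized) into pixel (x1, y1, x2, y2).

def yolo_txt_to_xyxy(line, img_w, img_h):
    cls, cx, cy, w, h, conf = line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h   # center in pixels
    w, h = float(w) * img_w, float(h) * img_h       # size in pixels
    return int(cls), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), float(conf)

cls, box, conf = yolo_txt_to_xyxy("3 0.125 0.275 0.05 0.05 0.91", 1000, 800)
# box is approximately (100.0, 200.0, 150.0, 240.0)
```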



If you run inference on the dataset with different models, you can ensemble the results by weighted boxes fusion.
You should set the img path and txt path first.

$ python
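Weighted boxes fusion merges overlapping boxes from different models into a single box whose coordinates are the confidence-weighted average of its members, rather than discarding all but one box as NMS does. A minimal, illustrative pure-Python sketch of the idea (not the repo's ensemble script; thresholds and the score-averaging rule here are simplified):

```python
# Illustrative sketch of weighted boxes fusion (WBF).
# Boxes are [x1, y1, x2, y2] normalized to [0, 1]; boxes whose IoU with a
# cluster exceeds iou_thr are merged by confidence-weighted averaging.

def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def wbf(boxes, scores, iou_thr=0.55):
    clusters = []
    for i in sorted(range(len(boxes)), key=lambda i: -scores[i]):
        for c in clusters:
            if iou(c["box"], boxes[i]) > iou_thr:
                c["members"].append((boxes[i], scores[i]))
                s = sum(sc for _, sc in c["members"])
                # fused box = confidence-weighted average of member coordinates
                c["box"] = [sum(b[k] * sc for b, sc in c["members"]) / s for k in range(4)]
                break
        else:
            clusters.append({"box": list(boxes[i]), "members": [(boxes[i], scores[i])]})
    # fused score here is the plain mean of member scores (simplified)
    return [(c["box"], sum(sc for _, sc in c["members"]) / len(c["members"])) for c in clusters]

fused = wbf([[0.10, 0.10, 0.50, 0.50], [0.12, 0.12, 0.52, 0.52], [0.70, 0.70, 0.90, 0.90]],
            [0.9, 0.8, 0.6])
# The first two boxes overlap heavily and fuse into one; the third stays separate.
```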

Train allows you to train a new model from scratch.

$ python --img 1536 --batch 2 --epochs 80 --data ./data/VisDrone.yaml --weights --hyp data/hyps/hyp.VisDrone.yaml --cfg models/yolov5l-xs-tr-cbam-spp-bifpn.yaml --name v5l-xs


Description of TPH-YOLOv5 and citation

If you have any questions, please contact me by email at [email protected]
If you find this code useful, please cite:

@inproceedings{zhu2021tph,
  title={TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios},
  author={Zhu, Xingkui and Lyu, Shuchang and Wang, Xu and Zhao, Qi},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}

Thanks for their great work.

