YOLOv5-Lite: lighter, faster and easier to deploy


A series of ablation experiments on YOLOv5 makes it lighter (smaller FLOPs, lower memory use, fewer parameters), faster (a shuffle-channel backbone and a channel-reduced YOLOv5 head let it infer at 10+ FPS on a Raspberry Pi 4B with 320×320 input), and easier to deploy (the Focus layer and its four slice operations are removed, keeping the accuracy loss from model quantization within an acceptable range).
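For reference, a minimal sketch of the Focus slicing that gets removed, next to a slice-free stride-2 convolution in its place (assuming standard YOLOv5 definitions; channel counts are illustrative, not the repo's exact stem):

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """YOLOv5's Focus layer: four strided slices concatenated, then a conv.
    These slice ops are what YOLOv5-Lite removes for easier deployment."""
    def __init__(self, c1, c2, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c1 * 4, c2, k, 1, k // 2)

    def forward(self, x):
        # (b, c, h, w) -> (b, 4c, h/2, w/2) via four slice operations
        return self.conv(torch.cat([x[..., ::2, ::2],
                                    x[..., 1::2, ::2],
                                    x[..., ::2, 1::2],
                                    x[..., 1::2, 1::2]], 1))

# Slice-free alternative: an ordinary stride-2 convolution performs the
# same 2x spatial downsampling and is friendlier to quantized backends.
stem = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 3, 320, 320)
print(Focus(3, 32)(x).shape)  # torch.Size([1, 32, 160, 160])
print(stem(x).shape)          # torch.Size([1, 32, 160, 160])
```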

Comparison of ablation experiment results

| ID | Model | Input size | FLOPs | Params | Size (MB) | mAP@0.5 | mAP@0.5:0.95 |
|----|-------|------------|-------|--------|-----------|---------|--------------|
| 001 | yolo-fastest | 320×320 | 0.25G | 0.35M | 1.4 | 24.4 | – |
| 002 | nanodet-m | 320×320 | 0.72G | 0.95M | 1.8 | – | 20.6 |
| 003 | yolo-fastest-xl | 320×320 | 0.72G | 0.92M | 3.5 | 34.3 | – |
| 004 | YOLOv5-Lite-s (ours) | 320×320 | 0.97G | 1.54M | 3.3 | 36.1 | 20.9 |
| 005 | yolov3-tiny | 416×416 | 6.96G | 6.06M | 23.0 | 33.1 | 16.6 |
| 006 | yolov4-tiny | 416×416 | 5.62G | 8.86M | 33.7 | 40.2 | 21.7 |
| 007 | YOLOv5-Lite-s (ours) | 416×416 | 1.63G | 1.54M | 3.3 | 41.3 | 24.3 |
| 008 | YOLOv5-Lite-c (ours) | 640×640 | 8.6G | 4.37M | 9.2 | 52.5 | 33.0 |
| 009 | YOLOv5s | 640×640 | 17.0G | 7.3M | 14.2 | 55.8 | 35.9 |
| 010 | YOLOv5-Lite-g (ours) | 640×640 | 15.7G | 5.3M | 10.9 | 56.9 | 38.1 |

Comparison on different platforms

| Equipment | Computing backend | System | Input | Framework | v5Lite-s | v5Lite-c | v5Lite-g | YOLOv5s |
|-----------|-------------------|--------|-------|-----------|----------|----------|----------|---------|
| Intel | @i5-10210U | Windows(x86) | 640×640 | openvino | – | 46ms | – | 131ms |
| Nvidia | @RTX 2080Ti | Linux(x86) | 640×640 | torch | – | – | 15ms | 14ms |
| Redmi K30 | @Snapdragon 730G | Android(arm64) | 320×320 | ncnn | 28ms | – | – | 163ms |
| Raspberrypi 4B | @ARM Cortex-A72 | Linux(arm64) | 320×320 | ncnn | 84ms | – | – | 371ms |
| Raspberrypi 4B | @ARM Cortex-A72 | Linux(arm64) | 320×320 | mnn | 76ms | – | – | 356ms |

  • All of the above are 4-thread benchmarks.
  • The Raspberry Pi 4B runs the 64-bit Raspberry Pi OS with bf16s optimization enabled.

QQ discussion group: 993965802

·Model Zoo·

@YOLOv5-Lite-s:

| Model | Size | Backbone | Head | Framework | Design for |
|-------|------|----------|------|-----------|------------|
| v5Lite-s.pt | 3.3m | shufflenetv2 (Megvii) | v5Lites-head | Pytorch | Arm-cpu |
| v5Lite-s.bin / v5Lite-s.param | 3.3m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
| v5Lite-s-int8.bin / v5Lite-s-int8.param | 1.7m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
| v5Lite-s.mnn | 3.3m | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
| v5Lite-s-int4.mnn | 987k | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
| v5Lite-s-fp16.bin / v5Lite-s-fp16.xml | 3.4m | shufflenetv2 | v5Lites-head | openvino | x86-cpu |
| v5Lite-s-fp32.bin / v5Lite-s-fp32.xml | 6.8m | shufflenetv2 | v5Lites-head | openvino | x86-cpu |
| v5Lite-s-fp16.tflite | 3.3m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
| v5Lite-s-fp32.tflite | 6.7m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
| v5Lite-s-int8.tflite | 1.8m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |

@YOLOv5-Lite-c:

| Model | Size | Backbone | Head | Framework | Design for |
|-------|------|----------|------|-----------|------------|
| v5Lite-c.pt | 9m | PPLcnet (Baidu) | v5Litec-head | Pytorch | x86-cpu / x86-vpu |
| v5Lite-c.bin / v5Lite-c.xml | 8.7m | PPLcnet | v5Litec-head | openvino | x86-cpu / x86-vpu |

@YOLOv5-Lite-g:

| Model | Size | Backbone | Head | Framework | Design for |
|-------|------|----------|------|-----------|------------|
| v5Lite-g.pt | 10.9m | Repvgg (Tsinghua) | v5Liteg-head | Pytorch | x86-gpu / arm-gpu / arm-npu |
| v5Lite-g-int8.engine | 8.5m | Repvgg | v5Liteg-head | Tensorrt | x86-gpu / arm-gpu / arm-npu |
| v5lite-g-int8.tmfile | 8.7m | Repvgg | v5Liteg-head | Tengine | arm-npu |

Download links:

├──────ncnn-fp16: | Baidu Drive | Google Drive |
├──────ncnn-int8: | Baidu Drive | Google Drive |
├──────mnn-fp16: | Baidu Drive | Google Drive |
├──────mnn-int4: | Baidu Drive | Google Drive |
├──────tengine-fp32: | Baidu Drive | Google Drive |
└──────openvino-fp16: | Baidu Drive | Google Drive |

Baidu Drive Password: pogg

v5lite-s model: TFLite Float32, Float16, INT8, dynamic-range quantization, ONNX, TFJS, TensorRT, OpenVINO IR FP32/FP16, Myriad Inference Engine Blob, CoreML

Thanks to PINTO0309: https://github.com/PINTO0309/PINTO_model_zoo/tree/main/180_YOLOv5-Lite

How to use

Install

Python>=3.6.0 is required, with all requirements.txt dependencies installed, including PyTorch>=1.7:

$ git clone https://github.com/ppogg/YOLOv5-Lite
$ cd YOLOv5-Lite
$ pip install -r requirements.txt
Inference with detect.py

detect.py runs inference on a variety of sources, downloading models automatically from
the latest YOLOv5-Lite release and saving results to runs/detect.

$ python detect.py --source 0  # webcam
                            file.jpg  # image 
                            file.mp4  # video
                            path/  # directory
                            path/*.jpg  # glob
                            'https://youtu.be/NUsoVlDFqZg'  # YouTube
                            'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
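You can also run inference programmatically. A minimal sketch, assuming the repo keeps upstream YOLOv5's module layout (models/experimental.py providing attempt_load) and is run from the repo root:

```python
import torch
from models.experimental import attempt_load  # upstream YOLOv5 layout assumed

# Load the FP32 PyTorch weights (download v5lite-s.pt from the model zoo above)
model = attempt_load('v5lite-s.pt', map_location='cpu')
model.eval()

img = torch.zeros(1, 3, 320, 320)  # dummy input: (batch, channels, h, w) in [0, 1]
with torch.no_grad():
    pred = model(img)[0]  # raw predictions: (1, num_anchors, 5 + num_classes)
print(pred.shape)  # confidence filtering + NMS (utils.general.non_max_suppression) still needed
```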
Training

$ python train.py --data coco.yaml --cfg v5lite-s.yaml --weights v5lite-s.pt --batch-size 128
                                         v5lite-c.yaml           v5lite-c.pt               96
                                         v5lite-g.yaml           v5lite-g.pt               64

Multi-GPU training is several times faster:

$ python -m torch.distributed.launch --nproc_per_node 2 train.py --data coco.yaml --cfg v5lite-s.yaml --weights v5lite-s.pt --batch-size 128 --device 0,1
DataSet

Training and validation set layout (the paths containing xx.jpg):

train: ../coco/images/train2017/
val: ../coco/images/val2017/

├── images            # xx.jpg example
│   ├── train2017        
│   │   ├── 000001.jpg
│   │   ├── 000002.jpg
│   │   └── 000003.jpg
│   └── val2017         
│       ├── 100001.jpg
│       ├── 100002.jpg
│       └── 100003.jpg
└── labels             # xx.txt example      
    ├── train2017       
    │   ├── 000001.txt
    │   ├── 000002.txt
    │   └── 000003.txt
    └── val2017         
        ├── 100001.txt
        ├── 100002.txt
        └── 100003.txt
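Each xx.txt label file holds one object per line in the normalized YOLO format `class x_center y_center width height`. A small helper (hypothetical, for illustration only) that converts a pixel-space box into such a line:

```python
def to_yolo_line(cls_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space box (x1, y1, x2, y2) into a YOLO label line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A 100x200 box at (50, 80) in a 640x480 image, class 0:
print(to_yolo_line(0, 50, 80, 150, 280, 640, 480))
# -> 0 0.156250 0.375000 0.156250 0.416667
```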
model hub

The original components of YOLOv5 and the reproduced components of YOLOv5-Lite are organized and stored in the model hub.


Updating …

How to deploy

ncnn for arm-cpu

mnn for arm-cpu

openvino x86-cpu or x86-vpu

tensorrt for arm-gpu or arm-npu or x86-gpu

Android for arm-cpu
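The per-backend guides above cover the full pipelines. As a quick sanity check of an exported model, here is a minimal sketch using onnxruntime; the filename v5lite-s.onnx and the prior ONNX export step are assumptions, not outputs this section provides:

```python
import numpy as np
import onnxruntime as ort

# Assumes the weights were exported to ONNX beforehand (filename is illustrative)
sess = ort.InferenceSession("v5lite-s.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

# Preprocess: 320x320 RGB, CHW, float32 in [0, 1]; letterboxing omitted for brevity
img = np.zeros((1, 3, 320, 320), dtype=np.float32)

pred = sess.run(None, {input_name: img})[0]
print(pred.shape)  # raw predictions; confidence filtering + NMS still required
```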

Android_demo

This demo runs on a Redmi phone with a Snapdragon 730G processor, using YOLOv5-Lite for detection. Performance is as follows:

link: https://github.com/ppogg/YOLOv5-Lite/tree/master/ncnn_Android

Android_v5Lite-s: https://drive.google.com/file/d/1CtohY68N2B9XYuqFLiTp-Nd2kuFWgAUR/view?usp=sharing

Android_v5Lite-g: https://drive.google.com/file/d/1FnvkWxxP_aZwhi000xjIuhJ_OhqOUJcj/view?usp=sharing

More detailed explanation

Detailed model articles (in Chinese):

[1] https://zhuanlan.zhihu.com/p/400545131

[2] https://zhuanlan.zhihu.com/p/410874403

[3] https://blog.csdn.net/weixin_45829462/article/details/119787840

[4] https://zhuanlan.zhihu.com/p/420737659

Reference

https://github.com/ultralytics/yolov5

https://github.com/megvii-model/ShuffleNet-Series

https://github.com/Tencent/ncnn
