2207.02696_YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
#####################################################################################################


* https://arxiv.org/abs/2207.02696
* GitHub: https://github.com/WongKinYiu/yolov7


Abstract
========


* YOLOv7在从5 FPS到160 FPS的范围内，在速度和准确性上都超越了所有已知的对象检测器，并且在GPU V100上以30 FPS或更高的速度运行时，具有所有已知实时对象检测器中最高的准确率56.8% AP。YOLOv7-E6对象检测器（V100上56 FPS，55.9% AP）在速度上比基于transformer的检测器SWIN-L Cascade-Mask R-CNN（A100上9.2 FPS，53.9% AP）快509%，准确率高出2%；相比基于卷积的检测器ConvNeXt-XL Cascade-Mask R-CNN（A100上8.6 FPS，55.2% AP），速度提升551%，准确率提高0.7% AP。此外，YOLOv7在速度和准确性方面也优于YOLOv5、DETR、可变形DETR、DINO-5scale-R50、ViT-Adapter-B等众多其他对象检测器。更重要的是，我们仅使用MS COCO数据集从头开始训练YOLOv7，没有使用任何其他数据集或预训练权重。

* YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object detector (56 FPS V100, 55.9% AP) outperforms both transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS A100, 53.9% AP) by 509% in speed and 2% in accuracy, and convolutional-based detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS A100, 55.2% AP) by 551% in speed and 0.7% AP in accuracy, as well as YOLOv7 outperforms: YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B and many other object detectors in speed and accuracy. Moreover, we train YOLOv7 only on MS COCO dataset from scratch without using any other datasets or pre-trained weights.