2405.14458_YOLOv10: Real-Time End-to-End Object Detection
#########################################################

* https://arxiv.org/abs/2405.14458
* GitHub: https://github.com/THU-MIG/yolov10
* YOLOv10是清华大学的研究人员在Ultralytics的基础上，引入了一种新的实时目标检测方法，解决了YOLO 以前版本在后处理和模型架构方面的不足。通过消除非最大抑制（NMS）和优化各种模型组件，YOLOv10 在显著降低计算开销的同时实现了最先进的性能。大量实验证明，YOLOv10 在多个模型尺度上实现了卓越的精度-延迟权衡。


Abstract
========


* 过去几年中，由于在计算成本和检测性能之间实现了有效的平衡，YOLO系列已成为实时对象检测领域的主导范式。研究人员探索了YOLO的架构设计、优化目标、数据增强策略等，取得了显著进展。然而，对非极大值抑制（NMS）进行后处理的依赖阻碍了YOLO的端到端部署，并对推理延迟产生了负面影响。此外，YOLO中各种组件的设计缺乏全面彻底的审视，导致明显的计算冗余，限制了模型的能力，造成了效率欠佳以及性能提升的巨大潜力。

* 在这项工作中，我们旨在从后处理和模型架构两个方面进一步推进YOLO的性能-效率边界。为此，我们首先提出了无NMS训练YOLO的一致双重分配方案，这同时带来了竞争性的性能和低推理延迟。此外，我们引入了一种整体的效率-准确性驱动的YOLO模型设计策略。我们从效率和准确性的角度全面优化了YOLO的各个组件，大大降低了计算开销并增强了模型能力。我们的努力成果是新一代用于实时端到端对象检测的YOLO系列，命名为YOLOv10。广泛的实验表明，YOLOv10在各种模型规模上都实现了最先进的性能和效率。例如，在COCO上与RT-DETR-R18具有相似AP的情况下，我们的YOLOv10-S速度快1.8倍，同时参数数量和FLOPs减少了2.8倍。与YOLOv9-C相比，YOLOv10-B在相同性能下延迟降低了46%，参数减少了25%。


* Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on the non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components in YOLOs lacks the comprehensive and thorough inspection, resulting in noticeable computational redundancy and limiting the model's capability. It renders the suboptimal efficiency, along with considerable potential for performance improvements. In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and model architecture. To this end, we first present the consistent dual assignments for NMS-free training of YOLOs, which brings competitive performance and low inference latency simultaneously. Moreover, we introduce the holistic efficiency-accuracy driven model design strategy for YOLOs. We comprehensively optimize various components of YOLOs from both efficiency and accuracy perspectives, which greatly reduces the computational overhead and enhances the capability. The outcome of our effort is a new generation of YOLO series for real-time end-to-end object detection, dubbed YOLOv10. Extensive experiments show that YOLOv10 achieves state-of-the-art performance and efficiency across various model scales. For example, our YOLOv10-S is 1.8× faster than RT-DETR-R18 under the similar AP on COCO, meanwhile enjoying 2.8× smaller number of parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46\% less latency and 25\% fewer parameters for the same performance.