2402.13616_YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information¶
YOLOv9是由中国台湾的Academia Sinica和台北科技大学等机构联合开发的。这一版本引入了程序化梯度信息(PGI)和泛化高效层聚合网络(GELAN),解决了深层网络中信息丢失的问题,从而提高了目标检测的准确率
Abstract¶
当前的深度学习方法专注于如何设计最合适的客观函数,以使模型的预测结果尽可能接近真实值。同时,还需要设计一个能够有效获取足够预测信息的合适架构。现有方法忽略了一个事实,即当输入数据经过逐层特征提取和空间变换时,大量的信息将会丢失。本文将深入探讨数据通过深层网络传输过程中数据丢失的重要问题,即信息瓶颈和可逆函数。我们提出了可编程梯度信息(PGI)的概念,以应对深度网络为实现多个目标所需的各类变化。PGI可以为目标任务提供完整的输入信息来计算目标函数,从而获得可靠的梯度信息以更新网络权重。
此外,基于梯度路径规划,我们设计了一种新的轻量级网络架构——广义高效层聚合网络(GELAN)。GELAN的架构验证了PGI在轻量级模型上取得了优越的结果。我们在基于MS COCO数据集的目标检测中验证了所提出的GELAN和PGI。结果显示,GELAN仅使用传统的卷积算子就能实现比基于深度卷积开发的最先进方法更好的参数利用率。PGI可以应用于从轻量级到大型的各种模型。它可以用于获取完整的信息,从而使从头开始训练的模型相比使用大型数据集预训练的最先进模型取得更好的结果,比较结果如图1所示。
Today’s deep learning methods focus on how to design the most appropriate objective functions so that the prediction results of the model can be closest to the ground truth. Meanwhile, an appropriate architecture that can facilitate acquisition of enough information for prediction has to be designed. Existing methods ignore a fact that when input data undergoes layer-by-layer feature extraction and spatial transformation, large amount of information will be lost. This paper will delve into the important issues of data loss when data is transmitted through deep networks, namely information bottleneck and reversible functions. We proposed the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. PGI can provide complete input information for the target task to calculate objective function, so that reliable gradient information can be obtained to update network weights. In addition, a new lightweight network architecture – Generalized Efficient Layer Aggregation Network (GELAN), based on gradient path planning is designed. GELAN’s architecture confirms that PGI has gained superior results on lightweight models. We verified the proposed GELAN and PGI on MS COCO dataset based object detection. The results show that GELAN only uses conventional convolution operators to achieve better parameter utilization than the state-of-the-art methods developed based on depth-wise convolution. PGI can be used for variety of models from lightweight to large. It can be used to obtain complete information, so that train-from-scratch models can achieve better results than state-of-the-art models pre-trained using large datasets, the comparison results are shown in Figure 1.