2004.10934_YOLOv4: Optimal Speed and Accuracy of Object Detection¶
Abstract¶
有许多特征据说可以提升卷积神经网络(CNN)的准确性。需要在大型数据集上对这些特征的组合进行实际测试,并对结果进行理论上的论证。一些特征仅在特定模型和特定问题上,或仅在小规模数据集上起作用;而一些特征,如批量归一化和残差连接,适用于大多数模型、任务和数据集。我们假设这样的通用特性包括加权残差连接(WRC)、跨阶段部分连接(CSP)、跨小批量归一化(CmBN)、自对抗训练(SAT)以及Mish激活函数。我们使用了新的特性:WRC、CSP、CmBN、SAT、Mish激活、马赛克数据增强、CmBN、DropBlock正则化和CIoU损失,并结合其中的一些特性以达到最先进的结果:在MS COCO数据集上实现了43.5%的AP(65.7% AP50),在Tesla V100上以约65 FPS的实时速度运行。
There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100.