1506.02640_You Only Look Once: Unified, Real-Time Object Detection¶
Abstract¶
我们提出了YOLO,一种新的目标检测方法。先前的目标检测工作是将分类器重新用于执行检测任务。相反,我们将目标检测视为一个回归问题,以分离空间的边界框和相关的类别概率。单个神经网络可以直接从完整图像中预测边界框和类别概率,只需一次评估。由于整个检测流程是一个单一的网络,因此它可以针对检测性能直接端到端优化。
我们统一的架构非常快速。基础版YOLO模型能够以每秒45帧的速度实时处理图像。网络的一个较小版本,Fast YOLO,能以惊人的每秒155帧的速度处理图像,同时仍然达到其他实时检测器两倍的mAP(平均精度)。与最先进的检测系统相比,YOLO虽然会犯更多的定位错误,但它预测不存在物体位置的假阳性情况要少得多。最后,YOLO学习到了关于物体非常通用的表示形式。无论是在自然图像还是在艺术品上,当从Picasso数据集和People-Art数据集进行推广时,它以较大的优势超越了包括DPM和R-CNN在内的所有其他检测方法。
We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.
Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.