1612.08242_YOLO9000: Better, Faster, Stronger¶
Abstract¶
我们介绍了YOLO9000,这是一个最先进的实时对象检测系统,能够检测超过9000个对象类别。首先,我们对YOLO检测方法提出了多种改进,既包括新颖的改进也包括从前人工作中借鉴的改进。改进后的模型,即YOLOv2,在PASCAL VOC和COCO等标准检测任务中达到了最先进的水平。在67 FPS(每秒帧数)的速度下,YOLOv2在VOC 2007上达到了76.8的mAP(平均精度)。在40 FPS时,YOLOv2的mAP达到了78.6,超越了如带有ResNet的Faster RCNN和SSD等最先进的方法,同时仍然显著更快。
最后,我们提出了一种可以同时在对象检测和分类上进行训练的方法。利用这种方法,我们在COCO检测数据集和ImageNet分类数据集上同时训练YOLO9000。我们的联合训练使YOLO9000能够预测那些没有标记检测数据的对象类别的检测结果。我们在ImageNet检测任务上验证了我们的方法。尽管只有44个类别具有检测数据,YOLO9000在ImageNet检测验证集上仍达到了19.7的mAP。在COCO未包含的156个类别中,YOLO9000达到了16.0的mAP。但YOLO能检测的不仅仅是这200个类别;它可以预测超过9000个不同对象类别的检测结果。而且它依然能实现实时运行。
We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don’t have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real-time.