
SCIENTIA SINICA Informationis, Volume 49, Issue 10: 1369-1382 (2019). https://doi.org/10.1360/N112018-00133

Exploring the defects of the average precision and its influence

  • Received: May 25, 2018
  • Accepted: Dec 17, 2018
  • Published: Oct 15, 2019

Abstract

The average precision (AP) is an important evaluation metric for object detection algorithms in computer vision and has long been used for quantitative comparison of research results. In practice, we have found that the AP metric is flawed: it does not accurately measure the quantity (count) accuracy of object detection algorithms. We analyze two manifestations of these defects in particular. First, in the definition and calculation of AP, a false alarm lowers the precision without changing the recall, so a single recall value may correspond to multiple precision values; because AP takes only the maximum precision at each recall level, false alarms that reduce the precision are ignored, i.e., hidden by the AP. Second, in the high-recall region, AP is more sensitive to increases in recall than to increases in precision, meaning that AP may favor detection algorithms that produce more false alarms. These defects are harmful. First, researchers may artificially tune thresholds to obtain high AP values. Second, research that improves the quantity accuracy of object detection may be undervalued, hindering progress in the field. Finally, academic research pursuing high AP values may diverge seriously from practical applications that require quantitative accuracy, with the result that many academic studies cannot be applied in practice. Based on the above analysis, we call for a critical improvement of the AP metric to bridge the gap between academic research and practical applications and to promote positive interaction between technological innovation and industrial use.
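To make the two defects concrete, here is a minimal, self-contained sketch (not the authors' code) of the VOC-style 11-point interpolated AP applied to toy detection lists; the function names (`voc_ap_11point`, `pr_curve`) and the toy hit sequences are illustrative assumptions. The first pair of detectors shows trailing false alarms leaving AP untouched; the second shows a detector that floods its output with false alarms to reach full recall outscoring a precise one.

```python
import numpy as np

def voc_ap_11point(recall, precision):
    """VOC-style 11-point interpolated AP: at each recall level
    t in {0.0, 0.1, ..., 1.0}, take the maximum precision achieved
    at any recall >= t, then average the 11 values."""
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        candidates = precision[recall >= t]
        ap += (candidates.max() if candidates.size > 0 else 0.0) / 11.0
    return ap

def pr_curve(hits, n_gt):
    """Precision and recall at each rank of the confidence-sorted
    detection list; hits[i] is 1 for a true positive, 0 for a false alarm."""
    tp = np.cumsum(hits)
    recall = tp / n_gt
    precision = tp / np.arange(1, len(hits) + 1)
    return recall, precision

N_GT = 5  # five ground-truth objects in the toy scene

# Defect 1: trailing false alarms lower the precision but not the
# recall, so the maximum precision at every recall level -- and
# hence the interpolated AP -- is unchanged.
clean = np.array([1, 1, 1, 1, 1])
noisy = np.array([1, 1, 1, 1, 1, 0, 0, 0])  # same TPs plus 3 false alarms
for name, hits in [("clean", clean), ("noisy", noisy)]:
    r, p = pr_curve(hits, N_GT)
    print(name, round(voc_ap_11point(r, p), 3))  # both print 1.0

# Defect 2: in the high-recall region, one extra TP raises AP more
# than several false alarms lower it, so a detector that floods the
# output to reach full recall outscores a precise one.
precise = np.array([1, 1, 1, 1])              # 4 TPs, no false alarms
flooded = np.array([1, 1, 1, 1, 0, 0, 0, 1])  # 5 TPs, 3 false alarms
for name, hits in [("precise", precise), ("flooded", flooded)]:
    r, p = pr_curve(hits, N_GT)
    print(name, round(voc_ap_11point(r, p), 3))  # 0.818 vs 0.932
```

Under these toy inputs, the three false alarms of `noisy` are invisible to AP, and `flooded` beats `precise` by more than 0.11 AP despite emitting three false alarms, which is the behavior the abstract describes.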


Funded by

National Key R&D Program of China (2018YFB1003405)

National Natural Science Foundation of China (61732018)



Figures

  • Figure 2

    (Color online) Flow chart of object detection methods based on convolutional neural networks

  • Figure 3

(Color online) Detection result for sample 11-13-27000 in the Brainwash test set. Green stars indicate true positives (TPs); red circles indicate false positives (FPs). The blue curve is the precision curve; the black curve is the recall curve.

  • Figure 4

    (Color online) (a) Precision-recall curve (blue) and interpolated precision-recall curve (red); (b) interpolated precision-recall curve for 11 recall levels

  • Figure 5

(Color online) (a) Detection result for sample 11-13-27000 in the Brainwash test set using Faster R-CNN; (b) precision-recall curve of the detection result in (a)

  • Figure 6

(Color online) Precision-recall curves of the person class on the VOC test set for methods from recent years

  • Figure 7

(Color online) Diagram of how AP hides false alarms during its computation. (a) and (b) show the detection results of methods A (AP = 0.69) and B (AP = 0.67), respectively. Green circles indicate ground-truth instances; boxes indicate detection results. Black boxes indicate true positives, while red boxes indicate false positives (i.e., false alarms). (c) Precision-recall curves; the blue dotted line is the interpolated precision-recall curve used in the AP calculation.
