
SCIENCE CHINA Information Sciences, Volume 64, Issue 2: 120103 (2021) https://doi.org/10.1007/s11432-020-2969-8

PSC-Net: learning part spatial co-occurrence for occluded pedestrian detection

  • Received Mar 10, 2020
  • Accepted Jun 28, 2020
  • Published Nov 19, 2020

Abstract

Detecting pedestrians, especially under heavy occlusion, is a challenging computer vision problem with numerous real-world applications. This paper introduces a novel approach, termed PSC-Net, for occluded pedestrian detection. PSC-Net contains a dedicated module that uses a graph convolutional network (GCN) to explicitly capture both inter- and intra-part co-occurrence information among different pedestrian body parts. Both kinds of co-occurrence information improve the feature representation for handling varying levels of occlusion, from partial to severe. PSC-Net exploits the topological structure of the pedestrian body and requires neither part-based annotations nor additional visible bounding-box (VBB) information to learn part spatial co-occurrence. Comprehensive experiments are performed on three challenging datasets: CityPersons, Caltech, and CrowdHuman. In particular, with the same backbone and input scale as the state-of-the-art MGAN, PSC-Net achieves absolute improvements in log-average miss rate of 4.0% and 3.4% over MGAN on the heavy occlusion subsets of the CityPersons and Caltech test sets, respectively.
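The abstract says only that a GCN propagates co-occurrence information between body parts. As a rough illustration of that idea, and not PSC-Net's actual module, the sketch below applies one graph-convolution step using the renormalized propagation rule of Kipf and Welling, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), to a hypothetical five-node part graph. The part names, edges, feature sizes, and weights are assumptions chosen for the example. It covers only inter-part propagation and uses plain Python lists so it runs without extra dependencies.

```python
import math

# Hypothetical part graph: these part names and edges are illustrative
# assumptions, not the partition used by PSC-Net.
PARTS = ["head", "torso", "left_arm", "right_arm", "legs"]
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]  # every part connects to the torso


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def build_adjacency(num_nodes, edges):
    """Symmetric 0/1 adjacency matrix for an undirected graph."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return adj


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in propagated]


# Toy 2-channel features; the "legs" node is all zeros to mimic an occluded part.
adj = build_adjacency(len(PARTS), EDGES)
features = [[1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.5, 0.5], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(adj, features, identity)
for name, row in zip(PARTS, out):
    print(name, [round(v, 3) for v in row])
```

With identity weights, the zeroed `legs` node ends up with 1/sqrt(10), about 0.316, in each channel after one step. It inherits evidence from the visible `torso` node through the normalized edge weight 1/sqrt(2 · 5). This is the intuition behind using part co-occurrence to compensate for occluded parts.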


Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant No. 61632018) and the National Key R&D Program of China (Grant Nos. 2018AAA0102800, 2018AAA0102802).

