
SCIENCE CHINA Information Sciences, Volume 61, Issue 5: 051101 (2018), https://doi.org/10.1007/s11432-017-9189-6

Survey of recent progress in semantic image segmentation with CNNs

  • Received: Mar 18, 2017
  • Accepted: Jul 20, 2017
  • Published: Nov 17, 2017

Abstract

In recent years, convolutional neural networks (CNNs) have led the way in many computer vision tasks, such as image classification, object detection, and face recognition. To produce more refined semantic image segmentation, we survey powerful CNNs and novel, elaborate layers, structures, and strategies, in particular those that have achieved state-of-the-art results on the Pascal VOC 2012 semantic segmentation challenge. Moreover, we discuss their different working stages and the various mechanisms they use to exploit structural and contextual information in the image and feature spaces. Finally, drawing on popular reference methods for related problems, we propose several possible directions and approaches for incorporating existing effective methods as components to enhance CNNs for the segmentation of specific semantic objects.


Acknowledgment

This work was supported by the National High-tech R&D Program of China (863 Program) (Grant No. 2015AA016403) and the National Natural Science Foundation of China (Grant Nos. 61572061, 61472020).


