
SCIENCE CHINA Information Sciences, Volume 63 , Issue 2 : 120104(2020) https://doi.org/10.1007/s11432-019-2718-7

CGNet: cross-guidance network for semantic segmentation

  • Received: Jun 16, 2019
  • Accepted: Nov 29, 2019
  • Published: Jan 16, 2020

Abstract

Semantic segmentation is a fundamental task in image analysis. Its core challenge is to extract discriminative features that distinguish different objects and recognize hard examples, a problem that most existing methods address only partially. To tackle it, we identify the contributions of edge and saliency information to segmentation and present a novel end-to-end network, termed the cross-guidance network (CGNet), that leverages both cues. CGNet unifies an edge detection network and a saliency detection network, models the intrinsic relations among the three tasks, and uses these relations to guide the extraction of discriminative features. Specifically, CGNet extracts segmentation, edge, and salient features simultaneously, then passes them to the cross-guidance module (CGM), which generates pre-knowledge features from the modeled relations to optimize the context feature extraction process. The proposed approach is extensively evaluated on PASCAL VOC 2012, PASCAL-Person-Part, and Cityscapes, where it achieves state-of-the-art performance, demonstrating its superiority.
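The cross-guidance idea described above, in which auxiliary edge and saliency features are fused into "pre-knowledge" that steers the segmentation features, can be sketched in a few lines. Everything here (the shapes, the sigmoid gating, and the mixing matrix `w` standing in for a $1\times1$ convolution) is an illustrative assumption, not the paper's actual CGM:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_guide(seg_feat, edge_feat, sal_feat, w):
    """Toy cross-guidance step: fuse edge and saliency features into
    per-pixel gating weights that modulate the segmentation features.
    Feature shapes are (C, H, W); `w` is a hypothetical 1x1-conv weight
    of shape (C, 2C) mixing the concatenated auxiliary features."""
    aux = np.concatenate([edge_feat, sal_feat], axis=0)    # (2C, H, W)
    gate = sigmoid(np.tensordot(w, aux, axes=([1], [0])))  # (C, H, W), in (0, 1)
    return seg_feat * gate                                 # guided segmentation features

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
seg, edge, sal = (rng.standard_normal((C, H, W)) for _ in range(3))
out = cross_guide(seg, edge, sal, rng.standard_normal((C, 2 * C)))
print(out.shape)  # (4, 8, 8)
```

Because the gate lies in (0, 1), the guidance can only suppress or pass through segmentation responses, which matches the intuition of using edges and salient regions as a prior rather than as new evidence.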


References

[1] Geng Q C, Zhou Z, Cao X C. Survey of recent progress in semantic image segmentation with CNNs. Sci China Inf Sci, 2018, 61: 051101

[2] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 640-651

[3] He K, Zhang X, Ren S. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell, 2015, 37: 1904-1916

[4] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6230-6239

[5] Chen L C, Papandreou G, Kokkinos I. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 834-848

[6] Chen L-C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation. 2017. arXiv preprint

[7] Chen L-C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 833-851

[8] Joachims T, Finley T, Yu C N J. Cutting-plane training of structural SVMs. Mach Learn, 2009, 77: 27-59

[9] Lin T-Y, Goyal P, Girshick R, et al. Focal loss for dense object detection. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 2999-3007

[10] Wu Z, Shen C, Hengel A. High-performance semantic segmentation using very deep fully convolutional networks. 2016. arXiv preprint

[11] Kokkinos I. UberNet: training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5454-5463

[12] Sun H Q, Pang Y W. GlanceNets: efficient convolutional neural networks with adaptive hard example mining. Sci China Inf Sci, 2018, 61: 109101

[13] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations, San Diego, 2015

[14] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 770-778

[15] Huang G, Liu Z, Maaten L, et al. Densely connected convolutional networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 2261-2269

[16] Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 1800-1807

[17] Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 2481-2495

[18] Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1520-1528

[19] Yu F, Koltun V, Funkhouser T A. Dilated residual networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 636-644

[20] Lin G, Milan A, Shen C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5168-5177

[21] Zhang H, Dana K, Shi J, et al. Context encoding for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7151-7160

[22] Huang Z, Wang X, Huang L, et al. CCNet: criss-cross attention for semantic segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019

[23] Jégou S, Drozdzal M, Vázquez D, et al. The one hundred layers tiramisu: fully convolutional DenseNets for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, 2017. 1175-1183

[24] Yang M, Yu K, Zhang C, et al. DenseASPP for semantic segmentation in street scenes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 3684-3692

[25] Zhang Z, Zhang X, Peng C, et al. ExFuse: enhancing feature fusion for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 273-288

[26] Zhao H, Qi X, Shen X, et al. ICNet for real-time semantic segmentation on high-resolution images. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 418-434

[27] Li H, Xiong P, An J, et al. Pyramid attention network for semantic segmentation. In: Proceedings of British Machine Vision Conference, Newcastle, 2018. 285

[28] Peng C, Zhang X, Yu G, et al. Large kernel matters: improve semantic segmentation by global convolutional network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 1743-1751

[29] Wei Z, Sun Y, Wang J. Learning adaptive receptive fields for deep image parsing network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 3947-3955

[30] Pang Y, Wang T, Anwer R M, et al. Efficient featurized image pyramid network for single shot detector. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 7336-7344

[31] Deng R, Shen C, Liu S, et al. Learning to predict crisp boundaries. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 570-586

[32] Xie S, Tu Z. Holistically-nested edge detection. Int J Comput Vis, 2017, 125: 3-18

[33] Liu Y, Cheng M-M, Hu X, et al. Richer convolutional features for edge detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5872-5881

[34] Liu Y, Lew M S. Learning relaxed deep supervision for better edge detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 231-240

[35] Shen W, Wang X, Wang Y, et al. DeepContour: a deep convolutional feature learned by positive-sharing loss for contour detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3982-3991

[36] Wang T-C, Liu M-Y, Zhu J-Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 8798-8807

[37] Wang W, Lai Q, Fu H, et al. Salient object detection in the deep learning era: an in-depth survey. 2019. arXiv preprint

[38] Liu N, Han J. DHSNet: deep hierarchical saliency network for salient object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 678-686

[39] Wang W, Shen J, Dong X, et al. Salient object detection driven by fixation prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 1711-1720

[40] Wang W, Shen J, Yang R. Saliency-aware video object segmentation. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 20-33

[41] Wang W, Shen J, Dong X. Inferring salient objects from human fixations. IEEE Trans Pattern Anal Mach Intell, 2019

[42] Liu N, Han J, Yang M-H. PiCANet: learning pixel-wise contextual attention for saliency detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 3089-3098

[43] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7132-7141

[44] Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 3146-3154

[45] Wang X, Girshick R, Gupta A, et al. Non-local neural networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7794-7803

[46] Zhang X, Wang T, Qi J, et al. Progressive attention guided recurrent network for salient object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 714-722

[47] Zhang X, Xiong H, Zhou W, et al. Picking deep filter responses for fine-grained image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 1134-1142

[48] Everingham M, Van Gool L, Williams C K I. The PASCAL Visual Object Classes (VOC) challenge. Int J Comput Vis, 2010, 88: 303-338

[49] Xia F, Wang P, Chen X, et al. Joint multi-person pose estimation and semantic part segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6080-6089

[50] Cordts M, Omran M, Ramos S, et al. The Cityscapes dataset for semantic urban scene understanding. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3213-3223

[51] Hariharan B, Arbelaez P, Bourdev L D, et al. Semantic contours from inverse detectors. In: Proceedings of IEEE International Conference on Computer Vision, Barcelona, 2011. 991-998

[52] Zheng S, Jayasumana S, Romera-Paredes B. Conditional random fields as recurrent neural networks. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1529-1537

[53] Liu Z, Li X, Luo P, et al. Semantic image segmentation via deep parsing network. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1377-1385

[54] Lin G, Shen C, Hengel A, et al. Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3194-3203

[55] Ke T-W, Hwang J-J, Liu Z, et al. Adaptive affinity fields for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 605-621

[56] Wu Z, Shen C, van den Hengel A. Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recognition, 2019, 90: 119-133

[57] Xia F, Wang P, Chen L-C, et al. Zoom better to see clearer: human and object parsing with hierarchical auto-zoom net. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 648-663

[58] Chen L-C, Yang Y, Wang J, et al. Attention to scale: scale-aware semantic image segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3640-3649

[59] Liang X, Shen X, Xiang D, et al. Semantic object parsing with local-global long short-term memory. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3185-3193

[60] Gong K, Liang X, Zhang D, et al. Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6757-6765

[61] Luo Y, Zheng Z, Zheng L, et al. Macro-micro adversarial network for human parsing. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 424-440

[62] Liang X, Shen X, Feng J, et al. Semantic object parsing with graph LSTM. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 125-143

[63] Zhao J, Li J, Nie X, et al. Self-supervised neural aggregation networks for human parsing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, 2017. 1595-1603

[64] Liang X, Lin L, Shen X, et al. Interpretable structure-evolving LSTM. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 2175-2184

[65] Nie X, Feng J, Yan S. Mutual learning to adapt for joint human parsing and pose estimation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 519-534

[66] Zhu B, Chen Y, Tang M, et al. Progressive cognitive human parsing. In: Proceedings of AAAI Conference on Artificial Intelligence, New Orleans, 2018. 7607-7614

[67] Li Q Z, Arnab A, Torr P H S. Holistic, instance-level human parsing. In: Proceedings of British Machine Vision Conference, London, 2017

[68] Fang H, Lu G, Fang X, et al. Weakly and semi-supervised human body part parsing via pose-guided knowledge transfer. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 70-78

[69] Gong K, Liang X, Li Y, et al. Instance-level human parsing via part grouping network. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 805-822

[70] Liang X, Zhou H, Xing E. Dynamic-structure semantic propagation network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 752-761

[71] Wang P, Chen P, Yuan Y, et al. Understanding convolution for semantic segmentation. In: Proceedings of IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, 2018. 1451-1460

[72] Zhang R, Tang S, Zhang Y, et al. Scale-adaptive convolutions for scene parsing. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 2050-2058

[73] Yu C, Wang J, Peng C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 334-349

[74] Yu C, Wang J, Peng C, et al. Learning a discriminative feature network for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 1857-1866

[75] Zhao H, Zhang Y, Liu S, et al. PSANet: point-wise spatial attention network for scene parsing. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 270-286

[76] Zhu Z, Xu M, Bai S, et al. Asymmetric non-local neural networks for semantic segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019

  • Figure 1

    (Color online) Examples of segmentation results with and without using edge (a) or salient object (b) information.

  • Figure 2

    (Color online) Illustration of the proposed CGNet, which includes the main backbone network with a pyramid attentive module, a cross-guidance module (CGM), an edge detection head, and a saliency detection head. `ResBlock' denotes the residual convolutional block in ResNet [14], while `$1\times1$', `$3\times3$', `$d$', `Up', and `Down' denote the convolutional layer with kernel size 1, the convolutional layer with kernel size 3, the dilation (atrous) rate of the convolutional kernel, upsampling via non-parameterized bilinear interpolation, and downsampling, respectively. `CAM' and `SAM' refer to the channel attentive module and the spatial attentive module, respectively.

  • Figure 3

    (Color online) Illustration of the proposed modules. `$1\times1$', `$3\times3$', `D-$3\times3$', and `DW-$1\times1$' denote the convolutional layer with kernel size 1, the convolutional layer with kernel size 3, the dilated convolutional layer [19] with kernel size 3, and the depth-wise convolutional layer [16] with kernel size 1, respectively. (a) Channel attentive module; (b) spatial attentive module; (c) cross-guidance module.
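The channel and spatial attentive modules in Figure 3 follow the familiar squeeze-then-gate pattern (cf. squeeze-and-excitation [43]). A rough sketch under stated assumptions, with plain NumPy matrices standing in for the $1\times1$ convolutions (this is an illustration of the general pattern, not the authors' implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w):
    """SE-style channel attentive module: global-average-pool the
    spatial dimensions, produce one gating weight per channel, and
    rescale. `w` is a hypothetical (C, C) mixing matrix."""
    squeeze = feat.mean(axis=(1, 2))              # (C,)
    weights = sigmoid(w @ squeeze)                # (C,), each in (0, 1)
    return feat * weights[:, None, None]

def spatial_attention(feat, w):
    """Spatial attentive module: collapse the channels into a single
    (H, W) attention map and rescale every spatial position.
    `w` is a hypothetical (C,) vector standing in for a 1x1 conv."""
    amap = sigmoid(np.tensordot(w, feat, axes=([0], [0])))  # (H, W)
    return feat * amap[None, :, :]

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 8, 8))                # (C, H, W) feature map
y = spatial_attention(channel_attention(x, rng.standard_normal((4, 4))),
                      rng.standard_normal(4))
print(y.shape)  # (4, 8, 8)
```

Both gates lie in (0, 1), so the modules reweight existing responses channel-wise and position-wise without changing the feature map's shape.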

  • Table 1   Segmentation results on the PASCAL VOC 2012 validation set$^{\rm~a)}$
    Method | OS (training) | OS (evaluating) | pixAcc (%) | mIoU (%)
    DeepLab-v2 [5] | 16 | 16 | 94.21 | 75.60
    PSPNet [4] | 16 | 16 | 94.62 | 76.82
    PAN [27] | 16 | 16 | 95.03 | 78.37
    DeepLab-v3 [6] | 16 | 16 | | 77.21
    DeepLab-v3$^{\rm~b)}$ [6] | 16 | 8 | | 79.77
    DeepLab-v3+ [7] | 16 | 16 | | 78.85
    DeepLab-v3+$^{\rm~b)}$ [7] | 16 | 16 | | 80.22
    DeepLab-v3+$^{\rm~b)}$ [7] | 16 | 8 | | 80.57
    CGNet (ours) | 16 | 16 | 95.32 | 79.89
    CGNet$^{\rm~b)}$ (ours) | 16 | 16 | 95.67 | 81.04

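The pixAcc and mIoU columns reported throughout these tables are the standard segmentation metrics: the overall fraction of correctly labeled pixels, and intersection-over-union averaged across classes. A minimal reference implementation:

```python
import numpy as np

def pix_acc_miou(pred, gt, num_classes):
    """Pixel accuracy and mean IoU over integer label maps of equal shape."""
    pix_acc = (pred == gt).mean()
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                    # skip classes absent from both maps
            ious.append(inter / union)
    return pix_acc, float(np.mean(ious))

gt   = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
acc, miou = pix_acc_miou(pred, gt, 3)
print(round(acc, 3), round(miou, 3))  # → 0.833 0.722
```

Note that mIoU weights every class equally, so small or rare classes count as much as background; this is why it is the headline metric even when pixAcc differences look negligible.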

  • Table 2   Segmentation results on the PASCAL VOC 2012 test set w/o COCO pre-training$^{\rm~a)}$
    Method | aero (%) | bike (%) | bird (%) | boat (%) | bottle (%) | bus (%) | car (%) | cat (%) | chair (%) | cow (%)
    FCN [2] | 76.8 | 34.2 | 68.9 | 49.4 | 60.3 | 75.3 | 74.7 | 77.6 | 21.4 | 62.5
    DeepLab-v2 [5] | 84.4 | 54.5 | 81.5 | 63.6 | 65.9 | 85.1 | 79.1 | 83.4 | 30.7 | 74.1
    CRF-RNN [52] | 87.5 | 39.0 | 79.7 | 64.2 | 68.3 | 87.6 | 80.8 | 84.4 | 30.4 | 78.2
    DeconvNet [18] | 89.9 | 39.3 | 79.7 | 63.9 | 68.2 | 87.4 | 81.2 | 86.1 | 28.5 | 77.0
    DPN [53] | 87.7 | 59.4 | 78.4 | 64.9 | 70.3 | 89.3 | 83.5 | 86.1 | 31.7 | 79.9
    Piecewise [54] | 90.6 | 37.6 | 80.0 | 67.8 | 74.4 | 92.0 | 85.2 | 86.2 | 39.1 | 81.2
    AAF [55] | 91.3 | 72.9 | 90.7 | 68.2 | 77.7 | 95.6 | 90.7 | 94.7 | 40.9 | 89.5
    ResNet38 [56] | 94.4 | 72.9 | 94.9 | 68.8 | 78.4 | 90.6 | 90.0 | 92.1 | 40.1 | 90.4
    PSPNet [4] | 91.8 | 71.9 | 94.7 | 71.2 | 75.8 | 95.2 | 89.9 | 95.9 | 39.3 | 90.7
    EncNet [21] | 94.1 | 69.2 | 96.3 | 76.7 | 86.2 | 96.3 | 90.7 | 94.2 | 38.8 | 90.7
    PAN [27] | 95.7 | 75.2 | 94.0 | 73.8 | 79.6 | 96.5 | 93.7 | 94.1 | 40.5 | 93.3
    CGNet (ours) | 95.3 | 72.6 | 94.6 | 71.8 | 82.0 | 95.7 | 91.9 | 95.8 | 41.8 | 91.5
    Method | table (%) | dog (%) | horse (%) | mbike (%) | person (%) | plant (%) | sheep (%) | sofa (%) | train (%) | tv (%) | mIoU (%)
    FCN [2] | 46.8 | 71.8 | 63.9 | 76.5 | 73.9 | 45.2 | 72.4 | 37.4 | 70.9 | 55.1 | 62.2
    DeepLab-v2 [5] | 59.8 | 79.0 | 76.1 | 83.2 | 80.8 | 59.7 | 82.2 | 50.4 | 73.1 | 63.7 | 71.6
    CRF-RNN [52] | 60.4 | 80.5 | 77.8 | 83.1 | 80.6 | 59.5 | 82.8 | 47.8 | 78.3 | 67.1 | 72.0
    DeconvNet [18] | 62.0 | 79.0 | 80.3 | 83.6 | 80.2 | 58.8 | 83.4 | 54.3 | 80.7 | 65.0 | 72.5
    DPN [53] | 62.6 | 81.9 | 80.0 | 83.5 | 82.3 | 60.5 | 83.2 | 53.4 | 77.9 | 65.0 | 74.1
    Piecewise [54] | 58.9 | 83.8 | 83.9 | 84.3 | 84.8 | 62.1 | 83.2 | 58.2 | 80.8 | 72.3 | 75.3
    AAF [55] | 72.6 | 91.6 | 94.1 | 88.3 | 88.8 | 67.3 | 92.9 | 62.6 | 85.2 | 74.0 | 82.2
    ResNet38 [56] | 71.7 | 89.9 | 93.7 | 91.0 | 89.1 | 71.3 | 90.7 | 61.3 | 87.7 | 78.1 | 82.5
    PSPNet [4] | 71.7 | 90.5 | 94.5 | 88.8 | 89.6 | 72.8 | 89.6 | 64.0 | 85.1 | 76.3 | 82.6
    EncNet [21] | 73.3 | 90.0 | 92.5 | 88.8 | 87.9 | 68.7 | 92.6 | 59.0 | 86.4 | 73.4 | 82.9
    PAN [27] | 72.4 | 89.1 | 94.1 | 91.6 | 89.5 | 73.6 | 93.2 | 62.8 | 87.3 | 78.6 | 84.0
    CGNet (ours) | 74.4 | 91.0 | 92.1 | 90.3 | 89.3 | 71.5 | 94.1 | 67.2 | 88.6 | 81.4 | 84.2


  • Table 3   Segmentation results on the PASCAL-Person-Part test set$^{\rm~a)}$
    Method | Head (%) | Torso (%) | U-Arm (%) | L-Arm (%) | U-Leg (%) | L-Leg (%) | B.G. (%) | mIoU (%)
    HAZN [57] | 80.79 | 59.11 | 43.05 | 42.76 | 38.99 | 34.46 | 93.59 | 56.11
    Attention [58] | 81.47 | 59.06 | 44.15 | 42.50 | 38.28 | 35.62 | 93.65 | 56.39
    LG-LSTM [59] | 82.72 | 60.99 | 45.40 | 47.76 | 42.33 | 37.96 | 88.63 | 57.97
    Attention+SSL [60] | 83.26 | 62.40 | 47.80 | 45.58 | 42.32 | 39.48 | 94.68 | 59.36
    Attention+MMAN [61] | 82.58 | 62.83 | 48.49 | 47.37 | 42.80 | 40.40 | 94.92 | 59.91
    Graph LSTM [62] | 82.69 | 62.68 | 46.88 | 47.71 | 45.66 | 40.93 | 94.59 | 60.16
    SS-NAN [63] | 86.43 | 67.28 | 51.09 | 48.07 | 44.82 | 42.15 | 97.23 | 62.44
    Structure LSTM [64] | 82.89 | 67.15 | 51.42 | 48.72 | 51.72 | 45.91 | 97.18 | 63.57
    Joint [49] | 85.50 | 67.87 | 54.72 | 54.30 | 48.25 | 44.76 | 95.32 | 64.39
    DeepLab-v2 [5] | | | | | | | | 64.94
    MuLA [65] | | | | | | | | 65.10
    PCNet [66] | 86.81 | 69.06 | 55.35 | 55.27 | 50.21 | 48.54 | 96.07 | 65.90
    Holistic [67] | | | | | | | | 66.30
    WSHP [68] | 87.15 | 72.28 | 57.07 | 56.21 | 52.43 | 50.36 | 97.72 | 67.60
    DeepLab-v3+ [7] | | | | | | | | 67.84
    PGN [69] | 90.89 | 75.12 | 55.83 | 64.61 | 55.42 | 41.57 | 95.33 | 68.40
    CGNet (ours) | 87.69 | 72.32 | 63.02 | 63.62 | 55.34 | 52.99 | 95.98 | 70.14


  • Table 4   Segmentation results on the Cityscapes test set$^{\rm~a)}$
    Method | IoU cla. (%) | iIoU cla. (%) | IoU cat. (%) | iIoU cat. (%)
    FCN [2] | 65.3 | 41.7 | 85.7 | 70.1
    DeepLab-v2 [5] | 70.4 | 42.6 | 86.4 | 67.7
    RefineNet [20] | 73.6 | | |
    DSSPN [70] | 76.6 | 56.2 | 89.6 | 77.8
    GCN [28] | 76.9 | | |
    DUC [71] | 77.6 | 53.6 | 90.1 | 75.2
    SAC [72] | 78.1 | 55.2 | 90.6 | 78.3
    PSPNet [4] | 78.4 | 56.7 | 90.6 | 78.6
    BiSeNet [73] | 78.9 | | |
    AAF [55] | 79.1 | 56.1 | 90.8 | 78.5
    DFN [74] | 79.3 | | |
    PSANet [75] | 80.1 | | |
    ANN [76] | 81.3 | | |
    DANet [44] | 81.5 | | |
    CGNet (ours) | 81.3 | 62.5 | 91.4 | 79.7


  • Table 5   Ablation study on the PASCAL-Person-Part test set$^{\rm~a)}$
    Method | pixAcc (%) | mIoU (%)
    DeepLab-v2 [5] | 93.55 | 64.94
    DeepLab-v3+ [7] | 94.23 | 67.84
    Base | 93.02 | 62.62
    Base + Pyramid Attention | 94.02 | 66.95
    Base + Pyramid Attention + Edge | 94.21 | 67.78
    Base + Pyramid Attention + Salient | 94.17 | 67.63
    Base + Pyramid Attention + Edge + Salient | 94.33 | 68.17
    Base + Pyramid Attention + Concat (edge & salient) | 94.44 | 68.46
    Base + Pyramid Attention + CGM | 94.78 | 70.14


Copyright 2020 China Science Publishing & Media Ltd. All rights reserved.