SCIENCE CHINA Information Sciences, Volume 60, Issue 12: 123101 (2017) https://doi.org/10.1007/s11432-017-9252-5

## Semantic segmentation of high-resolution images

• Accepted: Oct 13, 2017
• Published: Nov 7, 2017

### Abstract

Image semantic segmentation is a recently emerged research topic. Although existing approaches achieve satisfactory accuracy, their large memory consumption limits them to low-resolution images. In this paper, we present a semantic segmentation method for high-resolution images. First, we downsample the input image and obtain a low-resolution semantic segmentation using a state-of-the-art method. Next, we apply joint bilateral upsampling to this low-resolution result to obtain a high-resolution semantic segmentation. Because semantic labels are discrete, we modify joint bilateral upsampling to use voting instead of interpolation in the filtering computation. Compared with state-of-the-art methods, our method significantly reduces memory cost without reducing result quality.
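The voting idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `upsample_labels_by_voting`, the parameters `radius`, `sigma_s`, and `sigma_r`, the Gaussian kernels, and the grayscale guide image are all assumptions made for illustration. Each high-resolution pixel collects weighted votes from nearby low-resolution labels, with weights given by spatial distance and by similarity in the high-resolution guide image, and takes the label with the largest total weight instead of a weighted average.

```python
import math


def upsample_labels_by_voting(labels_lo, guide_hi, radius=2,
                              sigma_s=1.0, sigma_r=0.1):
    """Upsample a low-resolution label map guided by a high-resolution image.

    labels_lo: h x w nested list of integer class labels.
    guide_hi:  H x W nested list of grayscale intensities in [0, 1].
    Returns an H x W nested list of labels.
    All parameter names and default values are hypothetical.
    """
    H, W = len(guide_hi), len(guide_hi[0])
    h, w = len(labels_lo), len(labels_lo[0])
    sy, sx = h / H, w / W  # high-res to low-res scale factors

    out = [[0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            # Centre of the high-res pixel expressed in low-res coordinates.
            ly = (y + 0.5) * sy - 0.5
            lx = (x + 0.5) * sx - 0.5
            cy, cx = round(ly), round(lx)
            votes = {}  # label -> accumulated weight
            for qy in range(cy - radius, cy + radius + 1):
                for qx in range(cx - radius, cx + radius + 1):
                    if not (0 <= qy < h and 0 <= qx < w):
                        continue  # neighbour falls outside the low-res map
                    # Spatial weight, measured on the low-res grid.
                    d2 = (qy - ly) ** 2 + (qx - lx) ** 2
                    w_s = math.exp(-d2 / (2 * sigma_s ** 2))
                    # Range weight: compare the guide at this pixel with the
                    # guide at the high-res pixel under the low-res sample.
                    gy = min(H - 1, int((qy + 0.5) / sy))
                    gx = min(W - 1, int((qx + 0.5) / sx))
                    diff = guide_hi[y][x] - guide_hi[gy][gx]
                    w_r = math.exp(-diff ** 2 / (2 * sigma_r ** 2))
                    label = labels_lo[qy][qx]
                    votes[label] = votes.get(label, 0.0) + w_s * w_r
            # Voting: keep the label with the largest total weight, so the
            # output is always a valid class index.
            out[y][x] = max(votes, key=votes.get)
    return out


# Toy example: a vertical edge in the guide and a 4 x 4 label map.
guide = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
labels_lo = [[0, 0, 1, 1] for _ in range(4)]
result = upsample_labels_by_voting(labels_lo, guide)
print(result[0])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Averaging label indices, as ordinary interpolation would, could yield values such as 0.5 that correspond to no class; accumulating per-label weights and taking the maximum avoids this.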

### Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant No. 61521002), a research grant from the Beijing Higher Institution Engineering Research Center, and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.

### References

[1] Carneiro G, Chan A B, Moreno P J, et al. Supervised learning of semantic classes for image annotation and retrieval. IEEE Trans Pattern Anal Mach Intell, 2007, 29: 394--410.

[2] Gould S, Fulton R, Koller D. Decomposing a scene into geometric and semantically consistent regions. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), Kyoto, 2009. 1--8.

[3] Ren X, Bo L, Fox D. RGB-(D) scene labeling: features and algorithms. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, 2012. 2759--2766.

[4] Farabet C, Couprie C, Najman L. Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell, 2013, 35: 1915--1929.

[5] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, 2015. 3431--3440.

[6] Kopf J, Cohen M F, Lischinski D, et al. Joint bilateral upsampling. ACM Trans Graph, 2007, 26: 96.

[7] Tomasi C, Manduchi R. Bilateral filtering for gray and color images. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), Bombay, 1998. 839--846.

[8] Zhou B, Zhao H, Puig X, et al. Scene parsing through ADE20K dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017.

[9] Li X, Liu K, Dong Y. Superpixel-based foreground extraction with fast adaptive trimaps. IEEE Trans Cybern, 2017, doi: 10.1109/TCYB.2017.2747143.

[10] Huang H, Fang X, Ye Y, et al. Practical automatic background substitution for live video. Comput Vis Media, 2017, 3: 273--284.

[11] Li X, Liu K, Dong Y, et al. Patch alignment manifold matting. IEEE Trans Neural Netw Learn Syst, 2017, doi: 10.1109/TNNLS.2017.2727140.

[12] Zheng Z H, Zhang H T, Zhang F L, et al. Image-based clothes changing system. Comput Vis Media, 2017, in press.

[13] Maerki N, Perazzi F, Wang O, et al. Bilateral space video segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 743--751.

• Figure 1

(Color online) Examples of an indoor video: (a) two frames of the input video; (b) high-resolution semantic segmentation results by [5,13]; (c) low-resolution semantic segmentation results by [5,13]; (d) our high-resolution semantic segmentation results.

• Figure 2

(Color online) Examples of street view panoramas: (a) input images; (b) low-resolution semantic segmentation results by [5]; (c) our high-resolution semantic segmentation results.

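• Table 1   Statistics

| Example | Our method: resolution | Our method: memory (GB) | CNN method, high-resolution: resolution | CNN method, high-resolution: memory (GB) | CNN method, low-resolution: resolution | CNN method, low-resolution: memory (GB) |
| --- | --- | --- | --- | --- | --- | --- |
| Indoor, Figure 1 | $1600 \times 900$ | $1.6$ | $1600 \times 900$ | $4.9$ | $640 \times 480$ | $1.6$ |
| Panorama, Figure 2 | $8192 \times 4096$ | $1.9$ | N/A | N/A | $800 \times 400$ | $1.9$ |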
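To make the figures in Table 1 easier to compare, the short sketch below derives two ratios from them: how much more memory the high-resolution CNN run needs than our method, and how many more pixels the output has than the low-resolution segmentation it is upsampled from. The dictionary layout and variable names are illustrative assumptions; only the numbers come from the table.

```python
# Values copied from Table 1; the dictionary keys are hypothetical names.
table = {
    "indoor": {"hi": (1600, 900), "lo": (640, 480),
               "mem_ours": 1.6, "mem_cnn_hi": 4.9},
    "panorama": {"hi": (8192, 4096), "lo": (800, 400),
                 "mem_ours": 1.9, "mem_cnn_hi": None},  # N/A in the table
}


def pixel_ratio(hi, lo):
    """Number of high-resolution pixels per low-resolution pixel."""
    return (hi[0] * hi[1]) / (lo[0] * lo[1])


indoor = table["indoor"]
panorama = table["panorama"]

# Memory of the high-resolution CNN run relative to our method (indoor only,
# because the table lists N/A for the panorama).
memory_ratio = indoor["mem_cnn_hi"] / indoor["mem_ours"]

indoor_pixels = pixel_ratio(indoor["hi"], indoor["lo"])
panorama_pixels = pixel_ratio(panorama["hi"], panorama["lo"])

print(f"indoor memory ratio:  {memory_ratio:.2f}x")
print(f"indoor pixel ratio:   {indoor_pixels:.2f}x")
print(f"panorama pixel ratio: {panorama_pixels:.2f}x")
```

On these figures the high-resolution CNN run uses roughly three times the memory of our method on the indoor example, while the panorama output has roughly a hundred times as many pixels as its low-resolution segmentation.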

Copyright 2020 Science China Press Co., Ltd. All rights reserved.