SCIENCE CHINA Information Sciences, Volume 60, Issue 10: 103101(2017) https://doi.org/10.1007/s11432-017-9178-8

A robust three-stage approach to large-scale urban scene recognition

  • Received Mar 27, 2017
  • Accepted Jul 18, 2017
  • Published Sep 6, 2017

Abstract

To obtain a high-level description of urban scenes, we propose a three-stage approach to recognizing 3D reconstructed scenes with efficient representations. First, we develop a joint semantic labeling method that labels the triangular mesh-based representation by exploiting both image features and geometric features. The labeling is formulated over a conditional random field (CRF) that incorporates local spatial smoothness and multi-view consistency. Second, based on the labeled reconstructed meshes, we refine the segmentation of man-made objects in the recomposed global orthographic map with a graph partition algorithm, and propagate the coherent segmentation to the entire 3D mesh. Finally, we generate a compact, abstracted geometric representation for each man-made object, which is more visually appealing than the original cluttered model. This abstraction algorithm also uses a CRF formulation to partition building footprints into minimal sets of structural linear features, which are then used to construct profiles for large-scale scenes. The proposed approach robustly handles reconstructions with poor geometry and connectivity, thanks to the higher-order CRF formulations that impose the regularity priors ubiquitous in urban scenes. Each stage performs an independent, decoupled task. Extensive experiments demonstrate the superior performance of our approach in robustness, accuracy and applicability.
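As a rough illustration of what labeling over a CRF with a local smoothness term means, the sketch below minimizes a pairwise Potts energy, E(x) = sum_i unary[i][x_i] + smoothness * sum_(i,j) [x_i != x_j], over a small facet adjacency graph. Everything in it is an assumption made for illustration: the function name `icm_potts_labeling`, the toy costs, and the use of iterated conditional modes (ICM) as a simple stand-in optimizer. It does not reproduce the higher-order potentials, the multi-view consistency term, or the inference method of the paper.

```python
from typing import Dict, List, Sequence


def icm_potts_labeling(
    unary: Sequence[Sequence[float]],
    neighbors: Dict[int, List[int]],
    smoothness: float = 1.0,
    max_sweeps: int = 20,
) -> List[int]:
    """Approximately minimize a pairwise Potts energy with ICM.

    unary[i][l] is the (hypothetical) cost of giving facet i label l;
    neighbors[i] lists the facets adjacent to facet i.
    """
    num_labels = len(unary[0])
    # Start from the label with the lowest unary cost at each facet.
    labels = [min(range(num_labels), key=lambda l: costs[l]) for costs in unary]

    def local_energy(i: int, label: int) -> float:
        # Unary cost plus a Potts penalty for each disagreeing neighbor.
        disagree = sum(1 for j in neighbors.get(i, []) if labels[j] != label)
        return unary[i][label] + smoothness * disagree

    for _ in range(max_sweeps):
        changed = False
        for i in range(len(unary)):
            best = min(range(num_labels), key=lambda l: local_energy(i, l))
            # Switch only on a strict decrease so the total energy never rises.
            if local_energy(i, best) < local_energy(i, labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Toy chain of five facets; label 0 = "building", label 1 = "ground".
# Facet 2 has a noisy unary cost that weakly prefers the wrong label.
toy_unary = [[0.1, 2.0], [0.2, 2.0], [1.0, 0.8], [0.2, 2.0], [0.1, 2.0]]
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
smoothed = icm_potts_labeling(toy_unary, toy_neighbors, smoothness=0.5)
unsmoothed = icm_potts_labeling(toy_unary, toy_neighbors, smoothness=0.0)
```

With the smoothness weight set to zero each facet keeps its own cheapest label, so the noisy facet stays mislabeled; with a positive weight its neighbors pull it to the consistent label. ICM only finds a local minimum, which is one reason graph-cut style solvers are commonly preferred for energies of this kind.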

