
SCIENTIA SINICA Informationis, Volume 49, Issue 4: 436-449 (2019) https://doi.org/10.1360/N112018-00254

3D shape classification based on convolutional neural networks fusing multi-view information

  • Received: Sep 14, 2018
  • Accepted: Jan 28, 2019
  • Published: Apr 11, 2019

Abstract

In recent years, convolutional neural network (CNN) architectures have achieved excellent results in 2D image recognition, detection, and semantic segmentation. However, owing to the complexity and irregularity of 3D shape structures, CNNs cannot be applied directly to 3D data. View-based methods exploit the strength of deep learning frameworks in 2D image analysis by classifying a 3D shape from its rendered views. However, existing multi-view 3D shape classification methods mostly adopt fixed viewpoints, so the rendered images contain considerable redundancy, which can interfere with the results. Herein, we propose a novel multi-view CNN framework that automatically discriminates the contribution of each viewpoint during network training and discards redundant information. In addition, we introduce optimal viewpoint selection based on viewpoint entropy into 3D shape classification. Compared with the fixed-viewpoint approach, this procedure retains more detail of the shapes and requires no orientation alignment of the model. Experiments on the ModelNet10 and ModelNet40 datasets verify the rationality and superiority of both the viewpoint-entropy-based optimal viewpoint selection and the proposed multi-view information fusion method. The experimental results show that our method achieves higher classification accuracy than existing 3D model classification methods.
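The viewpoint-entropy criterion [9] referenced above scores a candidate viewpoint by the Shannon entropy of the relative projected areas of the visible faces: H(v) = -Σ_i (A_i/A_t) log(A_i/A_t), where A_i is the projected area of face i and A_t the total projected area. The sketch below is a minimal Python illustration of this score and of greedy top-k viewpoint selection; render_visible_face_areas is a hypothetical helper, not part of the paper.

```python
import numpy as np

def viewpoint_entropy(face_areas, background_area=0.0, eps=1e-12):
    """Viewpoint entropy [9]: Shannon entropy of the relative projected
    areas of the visible faces (optionally counting the background as
    one extra face, as in the original formulation)."""
    areas = np.append(np.asarray(face_areas, dtype=float), background_area)
    areas = areas[areas > eps]             # keep only visible faces
    p = areas / areas.sum()                # relative areas A_i / A_t
    return float(-(p * np.log2(p)).sum())  # H = -sum_i p_i log2 p_i

def select_viewpoints(mesh, candidate_views, k, render_visible_face_areas):
    """Rank candidate viewpoints by entropy and keep the k best.
    render_visible_face_areas(mesh, v) is an assumed helper that
    rasterizes the mesh from viewpoint v and returns one projected
    area per visible face."""
    scored = [(viewpoint_entropy(render_visible_face_areas(mesh, v)), v)
              for v in candidate_views]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [v for _, v in scored[:k]]
```

Because viewpoints are chosen per model from its own entropy landscape rather than from a fixed grid, no canonical orientation of the model is required, consistent with the abstract's claim.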
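The abstract's other ingredient, letting the network learn each view's contribution and suppress redundant views, can be pictured as a weighted view-pooling layer. The PyTorch sketch below illustrates the idea only; the scoring head, the 512-dimensional features, and the linear classifier are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ViewWeightedFusion(nn.Module):
    """Minimal sketch of learned multi-view fusion: a small scoring head
    assigns each view a contribution weight, the weights are softmax-
    normalized across views, and the per-view CNN features are pooled
    with those weights before classification."""

    def __init__(self, feat_dim=512, num_classes=40):
        super().__init__()
        self.score = nn.Sequential(          # per-view contribution score
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, view_feats):
        # view_feats: (batch, num_views, feat_dim) from a shared 2D CNN
        w = torch.softmax(self.score(view_feats), dim=1)  # (B, V, 1)
        fused = (w * view_feats).sum(dim=1)               # weighted pooling
        return self.classifier(fused), w.squeeze(-1)
```

Given per-view features from a shared 2D CNN, e.g. feats of shape (batch, 12, 512), calling the module returns class logits plus the learned per-view weights; views the network finds redundant are driven toward near-zero weight during training.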


Funded by

National Natural Science Foundation of China (Grant No. 61321491)

National Natural Science Foundation of China (Grant No. 61100110)

National Natural Science Foundation of China (Grant No. 61272219)

Science and Technology Support Program of Jiangsu Province (Grant No. BY2012190)

Science and Technology Support Program of Jiangsu Province (Grant No. BY2013072-04)


References

[1] Bronstein A M, Bronstein M M, Guibas L J. Shape Google: geometric words and expressions for invariant shape retrieval. ACM Trans Graph, 2011, 30: 1-20

[2] Funkhouser T, Kazhdan M, Min P. Shape-based retrieval and analysis of 3D models. Commun ACM, 2005, 48: 58

[3] Huang Q X, Su H, Guibas L. Fine-grained semi-supervised labeling of large shape collections. ACM Trans Graph, 2013, 32: 1-10

[4] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3431--3440

[5] Chen L C, Papandreou G, Kokkinos I. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 834-848

[6] Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 945--953

[7] Qi C R, Su H, Nießner M, et al. Volumetric and multi-view CNNs for object classification on 3D data. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 5648--5656

[8] Wang C, Pelillo M, Siddiqi K. Dominant set clustering and pooling for multi-view 3D object recognition. In: Proceedings of British Machine Vision Conference, 2017

[9] Vazquez P P, Feixas M, Sbert M, et al. Viewpoint selection using viewpoint entropy. In: Proceedings of Vision Modeling and Visualization Conference, 2001. 273--280

[10] Sbert M, Plemenos D, Feixas M. Viewpoint quality: measures and applications. In: Proceedings of the 1st Computational Aesthetics in Graphics, Visualization and Imaging, 2005. 185--192

[11] Lee C H, Varshney A, Jacobs D W. Mesh saliency. ACM Trans Graph, 2005, 24: 659

[12] Qin F, Li L, Gao S. A deep learning approach to the classification of 3D CAD models. J Zhejiang Univ-Sci C, 2014, 15: 91-106

[13] Socher R, Huval B, Bhat B, et al. Convolutional-recursive deep learning for 3D object classification. In: Proceedings of the 25th Advances in Neural Information Processing Systems, Lake Tahoe, 2012. 665--673

[14] Bruna J, Zaremba W, Szlam A, et al. Spectral networks and locally connected networks on graphs. 2013. arXiv preprint

[15] Masci J, Boscaini D, Bronstein M M, et al. Geodesic convolutional neural networks on Riemannian manifolds. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 832--840

[16] Wu Z R, Song S R, Khosla A, et al. 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015. 1912--1920

[17] Li Y Y, Pirk S, Su H, et al. FPNN: field probing neural networks for 3D data. In: Proceedings of the 30th Advances in Neural Information Processing Systems, Barcelona, 2016

[18] Xu X, Todorovic S. Beam search for learning a deep convolutional neural network of 3D shapes. In: Proceedings of International Conference on Pattern Recognition, 2016. 3506--3511

[19] Sedaghat N, Zolfaghari M, Brox T. Orientation-boosted voxel nets for 3D object recognition. In: Proceedings of British Machine Vision Conference, London, 2017

[20] Ren M W, Niu L, Fang Y. 3D-A-Nets: 3D deep dense descriptor for volumetric shapes with adversarial networks. 2017. arXiv preprint

[21] Qi C R, Su H, Mo K, et al. PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, 2017

[22] Qi C R, Yi L, Su H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Proceedings of the 31st Advances in Neural Information Processing Systems, 2017

[23] Simonovsky M, Komodakis N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, 2017

[24] Klokov R, Lempitsky V. Escape from cells: deep Kd-networks for the recognition of 3D point cloud models. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 863--872

[25] Li J X, Chen B M, Lee G H. SO-Net: self-organizing network for point cloud analysis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018

[26] Shi B, Bai S, Zhou Z. DeepPano: deep panoramic representation for 3-D shape recognition. IEEE Signal Process Lett, 2015, 22: 2339-2343

[27] Johns E, Leutenegger S, Davison A J. Pairwise decomposition of image sequences for active multi-view recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3813--3822

[28] Sfikas K, Pratikakis I, Theoharis T. Ensemble of PANORAMA-based convolutional neural networks for 3D model classification and retrieval. Comput Graphics, 2018, 71: 208-218

[29] Kanezaki A, Matsushita Y, Nishida Y. RotationNet: joint object categorization and pose estimation using multiviews from unsupervised viewpoints. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018

[30] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th Advances in Neural Information Processing Systems, Lake Tahoe, 2012

[31] van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res, 2008, 9: 2579-2605

  • Figure 1

    (Color online) Viewpoint selection based on viewpoint entropy. (a) The viewpoint selection process; (b) a projection image at one viewpoint

  • Figure 2

    (Color online) Comparison of viewpoints selected by the viewpoint-entropy method with fixed viewpoints. (a) Fixed viewpoints; (b) viewpoints selected based on viewpoint entropy; (c) projection images at the fixed viewpoints; (d) projection images at the viewpoints selected based on viewpoint entropy

  • Figure 3

    (Color online) Multi-view information fusion network structure

  • Figure 4

    (Color online) Relationship between viewpoint entropy and visible-face coverage for different numbers of viewpoints

  • Figure 5

    (Color online) Visualization of partial classification results on ModelNet40. (a) Before classification; (b) after classification

  • Figure 6

    (Color online) Visualization of clustered features for part of ModelNet40

  • Table 1   Influence of viewpoint selection on classification accuracy
    Method                      #Views   Accuracy (ModelNet40) (%)
    MVCNN [6]                   12       89.9
    MVCNN [6]                   80       90.1
    MVCNN-MultiRes [7]          20       91.4
    MVCNN (viewpoint entropy)   7        89.7
    MVCNN (viewpoint entropy)   9        90.3
    MVCNN (viewpoint entropy)   12       91.6
    MVCNN (viewpoint entropy)   20       91.7
  • Table 2   Comparison of classification accuracy with existing methods ("-" = not reported)
    Method                     #Views   Accuracy (ModelNet10) (%)   Accuracy (ModelNet40) (%)
    Ours (fixed viewpoints)    12       93.8                        90.9
    Ours (viewpoint entropy)   12       95.1                        92.2
    MVCNN [6]                  12       -                           89.9
    MVCNN [6]                  80       -                           90.1
    PANORAMA-NN [28]           -        91.1                        90.7
    Pairwise [27]              -        92.8                        90.7
    MVCNN-MultiRes [7]         20       -                           91.4
    KD-Networks [24]           -        94.0                        91.8
    PointNet [21]              -        -                           86.2
    3D-GAN [20]                -        91.0                        83.3
    3DShapeNets [16]           -        83.5                        77.0
  • Table 3   Top-1 to Top-5 error rates of the proposed method (the Top-k computation is sketched after this list)
    Measure   Error rate (ModelNet10) (%)   Error rate (ModelNet40) (%)
    Top-1     4.84                          7.82
    Top-2     3.87                          6.36
    Top-3     3.26                          5.19
    Top-4     2.69                          4.23
    Top-5     2.18                          3.07
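For completeness, the Top-k error rates in Table 3 treat a sample as correct when its true class is among the k highest-scoring predictions. A minimal Python sketch, assuming logits of shape (N, num_classes) and integer class labels:

```python
import torch

def topk_error(logits, labels, k):
    """Fraction of samples whose true class is NOT among the k
    highest-scoring predictions (the Top-k error rate of Table 3)."""
    topk = logits.topk(k, dim=1).indices            # (N, k) predicted classes
    hit = (topk == labels.unsqueeze(1)).any(dim=1)  # true class in top k?
    return 1.0 - hit.float().mean().item()
```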

Copyright 2019 Science China Press Co., Ltd.