
SCIENTIA SINICA Informationis, Volume 48, Issue 5: 501-510(2018) https://doi.org/10.1360/N112017-00216

Resource-constrained deep learning: challenges and practices

More info
  • Received: Feb 5, 2018
  • Accepted: Mar 12, 2018
  • Published: May 11, 2018

Abstract

Deep learning has made significant progress in recent years. However, deep learning models consume substantial computational resources, and their training requires large numbers of data points and labels. Hence, reducing the resource consumption of deep learning, i.e., resource-constrained deep learning, is a current research focus. In this study, we first analyze deep learning's thirst for various types of resources and the challenges this creates, and then briefly introduce research progress from three aspects: data, label, and computation resources. Further, we provide detailed introductions to these areas using our own research results as examples.


Funded by

National Natural Science Foundation of China (61772256)

National Natural Science Foundation of China (61422203)


References

[1] Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th Advances in Neural Information Processing Systems, Lake Tahoe, 2012. 1097--1105.

[2] LeCun Y, Bottou L, Bengio Y. Gradient-based learning applied to document recognition. Proc IEEE, 1998, 86: 2278-2324.

[3] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 770--778.

[4] Sun C, Shrivastava A, Singh S, et al. Revisiting unreasonable effectiveness of data in deep learning era. In: Proceedings of the International Conference on Computer Vision, Venice, 2017. 843--852.

[5] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. ArXiv:1409.1556.

[6] Kokkinos I. UberNet: training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Hawaii, 2017. 6129--6138.

[7] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of the 27th Advances in Neural Information Processing Systems, Montreal, 2014. 2672--2680.

[8] Shrivastava A, Pfister T, Tuzel O, et al. Learning from simulated and unsupervised images through adversarial training. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Hawaii, 2017. 2107--2116.

[9] Russakovsky O, Deng J, Su H. ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis, 2015, 115: 211-252.

[10] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of the 28th Advances in Neural Information Processing Systems, Montreal, 2015. 91--99.

[11] Shelhamer E, Long J, Darrell T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 640-651.

[12] Wei X S, Luo J H, Wu J. Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval. IEEE Trans Image Process, 2017, 26: 2868-2881.

[13] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In: Proceedings of the European Conference on Computer Vision, Zurich, 2014. 8689: 818--833.

[14] Krause J, Sapp B, Howard A, et al. The unreasonable effectiveness of noisy data for fine-grained recognition. In: Proceedings of the European Conference on Computer Vision, Amsterdam, 2016. 9907: 301--320.

[15] Zhu X, Goldberg A B. Introduction to Semi-Supervised Learning. San Rafael: Morgan & Claypool Publishers LLC, 2009.

[16] Rasmus A, Valpola H, Honkala M, et al. Semi-supervised learning with ladder networks. In: Proceedings of the 28th Advances in Neural Information Processing Systems, Montreal, 2015. 3546--3554.

[17] Laine S, Aila T. Temporal ensembling for semi-supervised learning. In: Proceedings of the International Conference on Learning Representations, Toulon, 2017.

[18] Wei X-S, Zhang C-L, Li Y, et al. Deep descriptor transforming for image co-localization. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, 2017. 3048--3054.

[19] Gao B B, Xing C, Xie C W. Deep Label Distribution Learning With Label Ambiguity. IEEE Trans Image Process, 2017, 26: 2825-2838.

[20] Geng X. Label Distribution Learning. IEEE Trans Knowl Data Eng, 2016, 28: 1734-1748.

[21] Rastegari M, Ordonez V, Redmon J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Proceedings of the European Conference on Computer Vision, Amsterdam, 2016. 9908: 525--542.

[22] Wu J X, Leng C, Wang Y H, et al. Quantized convolutional neural networks for mobile devices. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 4820--4828.

[23] Han S, Pool J, Tran J, et al. Learning both weights and connections for efficient neural network. In: Proceedings of the 28th Advances in Neural Information Processing Systems, Montreal, 2015. 1135--1143.

[24] Xie S, Girshick R, Dollar P, et al. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Hawaii, 2017. 1492--1500.

[25] Luo J-H, Wu J X, Lin W. ThiNet: a filter level pruning method for deep neural network compression. In: Proceedings of the International Conference on Computer Vision, Venice, 2017. 5058--5066.

[26] He Y H, Zhang X Y, Sun J. Channel pruning for accelerating very deep neural networks. In: Proceedings of the International Conference on Computer Vision, Venice, 2017. 1389--1397.

[27] Liu Z, Li J G, Shen Z Q, et al. Learning efficient convolutional networks through network slimming. In: Proceedings of the International Conference on Computer Vision, Venice, 2017. 2736--2744.

[28] Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient ConvNets. In: Proceedings of the International Conference on Learning Representations, Toulon, 2017.

  • Figure 1

(Color online) The SCDA method for fine-grained image retrieval [12]. (a) illustrates fine-grained images, with the left and right images being a golden retriever and a Labrador retriever, respectively; (b) shows samples of retrieval results, in which red boxes denote wrong results; (c) shows examples of localization results; and (d) illustrates the new dimensions after SVD

  • Figure 2

(Color online) DDT, a weakly supervised method [18]. (a) illustrates the DDT task, with the two rows being input images and their respective desired outputs; the red cross in the third picture marks a noisy input image. (b) contains examples of DDT localization results, with the red solid and yellow dashed boxes being the true and DDT-predicted localization results, respectively. The prediction is exactly the same as the groundtruth localization when the yellow box is invisible

  • Figure 3

(Color online) DLDL, a method that utilizes label uncertainty [19]. (a) shows a sample face image and the groundtruth label distribution generated from labels by multiple annotators; (b) shows a test image and the prediction by DLDL

  • Figure 4

(Color online) Ideas behind the ThiNet method [25]: using the next layer's output to guide the compression of the current layer, and then fine-tuning the network
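The idea in Figure 4 can be illustrated with a minimal numpy sketch. It is not the authors' ThiNet implementation; it only mimics the criterion on a toy linear "next layer" (which corresponds to a 1x1 convolution with spatial locations flattened out): greedily drop the channels whose removal least perturbs the next layer's output, then "fine-tune" by refitting the remaining weights with least squares. All names (`greedy_prune`, the toy sizes) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: the current layer produces C channel activations x,
# and the next layer computes y = W x (a 1x1 convolution, flattened).
C, out_dim, n_samples = 8, 4, 100
W = rng.normal(size=(out_dim, C))
X = rng.normal(size=(n_samples, C))
Y = X @ W.T  # the next layer's output before pruning

def greedy_prune(W, X, Y, keep):
    """Greedily drop channels whose removal least changes the next
    layer's output -- the ThiNet-style selection criterion."""
    kept = list(range(W.shape[1]))
    while len(kept) > keep:
        errs = []
        for c in kept:
            trial = [k for k in kept if k != c]
            err = np.sum((X[:, trial] @ W[:, trial].T - Y) ** 2)
            errs.append((err, c))
        _, least_useful = min(errs)  # channel with the smallest impact
        kept.remove(least_useful)
    return kept

kept = greedy_prune(W, X, Y, keep=5)

# "Fine-tuning" stand-in: refit the next layer's weights on the kept
# channels by least squares to recover the original output.
W_new, *_ = np.linalg.lstsq(X[:, kept], Y, rcond=None)
err = np.mean((X[:, kept] @ W_new - Y) ** 2)
print(sorted(kept), err)
```

In a real network the refit step is replaced by back-propagation fine-tuning, and the reconstruction error is measured on sampled activation patches rather than on a dense matrix.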

Copyright 2019 Science China Press Co., Ltd.