
SCIENTIA SINICA Informationis, Volume 49, Issue 3: 314-333(2019) https://doi.org/10.1360/N112018-00282

Research on low-power neural network computing accelerator

  • Received: Oct 18, 2018
  • Accepted: Feb 21, 2019
  • Published: Mar 20, 2019

Abstract

Artificial intelligence has aroused a global upsurge, covering image recognition, video retrieval, speech recognition, autonomous driving, and several other significant applications. Among artificial intelligence algorithms, neural network algorithms play a crucial role and have attracted considerable attention from researchers. Neural networks are characterized by high flexibility, complex computation, and large data volumes, which translates into requirements of high performance, low power consumption, and flexibility for hardware computing platforms. This study proposes a reconfigurable hardware architecture to meet the flexibility requirements of neural networks. Based on the proposed architecture, corresponding data-access optimization schemes are explored to reduce power consumption. For the storage system, neural network acceleration schemes based on eDRAM and on ReRAM, the latter integrating computing and storage, satisfy the memory requirements of neural network computing. For high-performance computing, we propose convolution optimization schemes based on integral images and on filter splitting with feature reconstruction, enabling low-bit neural network operations to meet high-performance requirements.
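To make the integral-based convolution idea concrete: after a single 2-D prefix-sum (integral image) pass, the sum of any k × k window can be read off with three adds/subtracts, and a convolution whose weights take only a few low-bit values (e.g., binary ±1) decomposes into sums and differences of such box sums. The Python sketch below is a minimal illustration of this general technique, not the paper's actual scheme; the function names (integral_image, box_sum, conv_all_ones) are hypothetical.

```python
import numpy as np

def integral_image(x):
    """2-D inclusive prefix sum with a zero border row/column."""
    ii = np.zeros((x.shape[0] + 1, x.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(x, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, k):
    """Sum of the k x k window of x whose top-left corner is (r0, c0)."""
    return ii[r0 + k, c0 + k] - ii[r0, c0 + k] - ii[r0 + k, c0] + ii[r0, c0]

def conv_all_ones(x, k):
    """Valid convolution of x with an all-ones k x k filter via the
    integral image: a constant number of adds per output pixel
    instead of k*k multiply-accumulates."""
    ii = integral_image(x)
    h, w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((h, w), dtype=np.int64)
    for r in range(h):
        for c in range(w):
            out[r, c] = box_sum(ii, r, c, k)
    return out

# Sanity check against a direct sliding-window sum.
x = np.random.randint(0, 8, size=(6, 6))
k = 3
ref = np.array([[x[r:r + k, c:c + k].sum() for c in range(4)] for r in range(4)])
assert np.array_equal(conv_all_ones(x, k), ref)
```

Because the prefix-sum pass is shared by every output position and every filter built from box regions, the per-output arithmetic no longer grows with the filter size, which is what makes this attractive for low-bit networks where the weights carry little information per multiply.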


Funded by

National Natural Science Foundation of China (61774094)

National Science and Technology Major Project of China (2018ZX01031101-002)


