
SCIENTIA SINICA Informationis, Volume 49, Issue 3: 247-255 (2019) https://doi.org/10.1360/N112018-00283

Research on homegrown manycore architecture for intelligent computing

  • Received Oct 18, 2018
  • Accepted Mar 7, 2019
  • Published Mar 15, 2019

Abstract

In recent years, the computing demands of artificial intelligence (AI) applications have grown rapidly. Because AI algorithms are highly parallel and reuse data heavily, they open up a large design space for processor architecture. With strong on-chip computing power, a flexible on-chip architecture, efficient on-chip communication, and flexibly optimizable storage, manycore processors have substantial room to develop as AI platforms. Drawing on the history of manycore processor development, this paper summarizes the main technical routes and focuses on the architectural requirements that AI applications impose on domestic manycore processors and on the key features such processors need.
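The abstract's point that data reuse widens the architectural design space can be made concrete with blocked (tiled) matrix multiplication. This is a standard textbook technique, not a method taken from the paper. The sketch below assumes a hypothetical on-chip scratchpad large enough to hold one t×t output tile plus one t×t tile each of A and B, and it counts how many matrix elements must be fetched from off-chip memory. For n×n matrices that count is 2n³/t (ignoring the n² writes of C), so every fetched word is reused t times, and a larger tile raises the number of operations performed per word moved. The function names `tiled_matmul` and `naive_matmul` are illustrative.

```python
"""Illustrative sketch: how tiling exploits data reuse in matrix multiplication.

Assumption: a hypothetical on-chip scratchpad holds one t x t tile each of
A, B and C; "words loaded" counts elements copied from off-chip memory.
"""


def naive_matmul(a, b):
    """Reference product of square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def tiled_matmul(a, b, tile):
    """Multiply square matrices a and b using tile x tile blocks.

    Returns (c, words_loaded), where words_loaded is the number of input
    elements copied into the (hypothetical) scratchpad.
    """
    n = len(a)
    assert n % tile == 0, "for simplicity, n must be a multiple of the tile size"
    c = [[0.0] * n for _ in range(n)]
    words_loaded = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            # The output tile stays resident on chip while k0 advances.
            acc = [[0.0] * tile for _ in range(tile)]
            for k0 in range(0, n, tile):
                # Copy one tile of a and one tile of b into local buffers.
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                words_loaded += 2 * tile * tile
                # Each a_tile element is reused across all j, and each
                # b_tile element across all i: `tile` uses per loaded word.
                for i in range(tile):
                    for k in range(tile):
                        aik = a_tile[i][k]
                        for j in range(tile):
                            acc[i][j] += aik * b_tile[k][j]
            # Write the finished output tile back to off-chip memory.
            for i in range(tile):
                c[i0 + i][j0:j0 + tile] = acc[i]
    return c, words_loaded


if __name__ == "__main__":
    n = 8
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    flops = 2 * n ** 3  # one multiply and one add per inner-loop step
    for t in (1, 2, 4, 8):
        c, loads = tiled_matmul(a, b, t)
        assert c == naive_matmul(a, b)
        print(f"tile={t}: {loads} words loaded, {flops / loads:.1f} flops per word")
```

Tile size 1 corresponds to no reuse at all (two loads per multiply-add); in a real manycore design, the largest usable tile is bounded by the capacity of each core's local memory, which is one reason on-chip storage organization matters for AI workloads.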


Funded by

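National Science and Technology Major Project on Core Electronic Devices, High-End Generic Chips and Basic Software: Intelligent Computing Unit for Data Centers (Cloud Platforms) and Cluster Computing (2018ZX01028-102)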



Copyright 2019 Science China Press Co., Ltd. (《中国科学》杂志社有限责任公司). All rights reserved.
