
SCIENTIA SINICA Informationis, Volume 50, Issue 9: 1407 (2020) https://doi.org/10.1360/SSI-2020-0130

Reconfigurable computing: toward software defined chips

  • Received May 11, 2020
  • Accepted Aug 4, 2020
  • Published Sep 23, 2020

Abstract

Software is evolving rapidly alongside emerging applications. Chips that cannot adapt to software (e.g., application-specific integrated circuits, ASICs) suffer from short lifecycles and high nonrecurring engineering (NRE) costs. Meanwhile, as Moore's law and Dennard scaling slow, new process technologies yield diminishing returns in energy efficiency, and the computing capacity of general-purpose processors is limited by power budgets. Consequently, future chips must jointly optimize flexibility, power efficiency, and ease of programming. Reconfigurable chips combine the high flexibility of general-purpose processors with the high energy efficiency of ASICs by customizing their architectures on demand. This article reviews the development and architecture of reconfigurable chips, analyzes the challenges they face, and discusses future directions based on these challenges.


Funded by

National Natural Science Foundation of China, Key Program (61834002)

National Key Research and Development Program of China (2018YFB2202100)


