SCIENTIA SINICA Informationis, Volume 51, Issue 2: 173 (2021). https://doi.org/10.1360/SSI-2020-0037

Development of processing-in-memory

Article history
  • Received: Feb 29, 2020
  • Accepted: Apr 30, 2020
  • Published: Jan 21, 2021

Abstract


Funded by

National Key Research and Development Program of China (2018YFB1003301)

National Natural Science Foundation of China (61832011)


References

[1] Mutlu O, Ghose S, Gómez-Luna J. Processing data where it makes sense: Enabling in-memory computation. Microprocessors MicroSyst, 2019, 67: 28-41 CrossRef Google Scholar

[2] Mutlu O, Ghose S, Gómez-Luna J, et al. Enabling practical processing in and near memory for data-intensive computing. In: Proceedings of the 56th Annual Design Automation Conference 2019, Las Vegas, 2019. 1--4. Google Scholar

[3] Singh G, Gómez-Luna J, Mariani G, et al. NAPEL: Near-memory computing application performance prediction via ensemble learning. In: Proceedings of the 56th ACM/IEEE Design Automation Conference (DAC), Las Vegas, 2019. 1--6. Google Scholar

[4] Boroumand A, Ghose S, Patel M, et al. CoNDA: efficient cache coherence support for near-data accelerators. In: Proceedings of the 46th International Symposium on Computer Architecture, Phoenix Arizona, 2019. 629--642. Google Scholar

[5] Ghose S, Boroumand A, Kim J S. Processing-in-memory: A workload-driven perspective. IBM J Res Dev, 2019, 63: 3:1-3:19 CrossRef Google Scholar

[6] Song L, Qian X, Li H, et al. Pipelayer: a pipelined reram-based accelerator for deep learning. In: Proceedings of 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). Austin: IEEE, 2017. 541--552. Google Scholar

[7] Farmahini-Farahani A, Ahn J H, Morrow K, et al. NDA: near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules. In: Proceedings of 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA). Burlingame: IEEE, 2015. 283--295. Google Scholar

[8] Springer R, Lowenthal K D, Rountree B, et al. Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster. In: Proceedings of the 11th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York: ACM, 2006. 230--238. Google Scholar

[9] Xiao P, Han N. A novel power-conscious scheduling algorithm for data-intensive precedence-constrained applications in cloud environments. IJHPCN, 2014, 7: 299-306 CrossRef Google Scholar

[10] Patki T, Lowenthal K D, Rountree B, et al. Exploring hardware overprovisioning in power-constrained, high performance computing. In: Proceedings of the 27th international ACM conference on International conference on supercomputing. Eugene Oregon: ACM, 2013. 173--182. Google Scholar

[11] Pugsley S H, Jestes J, Balasubramonian R. Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads. IEEE Micro, 2014, 34: 44-52 CrossRef Google Scholar

[12] Chi P, Li S, Xu C. PRIME. SIGARCH Comput Archit News, 2016, 44: 27-39 CrossRef Google Scholar

[13] Ahn J, Yoo S, Mutlu O, et al. PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture. In: Proceedings of 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA). Portland: IEEE, 2015. 336--348. Google Scholar

[14] Gao M, Kozyrakis C. HRL: efficient and flexible reconfigurable logic for near-data processing. In: Proceedings of 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA). Barcelona: IEEE, 2016. 126--137. Google Scholar

[15] Zhang D, Jayasena N, Lyashevsky A, et al. TOP-PIM: throughput-oriented programmable processing in memory. In: Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing. Vancouver: ACM, 2014. 85--98. Google Scholar

[16] Hsieh K, Ebrahimi E, Kim G, et al. Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems. In: Proceedings of ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016. 44: 204--216. Google Scholar

[17] Xu C, Niu D, Muralimanohar N, et al. Understanding the trade-offs in multi-level cell ReRAM memory design. In: Proceedings of 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC). Ausin: IEEE, 2013. 1--6. Google Scholar

[18] Gokhale M, Holmes B, Iobst K. Processing in memory: the Terasys massively parallel PIM array. Computer, 1995, 28: 23-31 CrossRef Google Scholar

[19] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444. Google Scholar

[20] Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge: MIT Press, 2016. Google Scholar

[21] Soomro T R. Google Translation service issues: Religious text perspective. Journal of Global Research in Computer Science, 2013, 4(8): 40-43. Google Scholar

[22] Vazquez-Calvo B, Zhang L T, Pascual M, et al. Fan translation of games, anime, and fanfiction. Language, Learning and Technology, 2019, 23(1):49-71 DOI: 10.125/446722019. Google Scholar

[23] Li C, Qouneh A, Li T. iSwitch. SIGARCH Comput Archit News, 2012, 40: 512-523 CrossRef Google Scholar

[24] Li C, Zhou R, Li T. Enabling distributed generation powered sustainable high-performance data center. In: Proceedings of 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA). Shenzhen: IEEE, 2013. 35--46. Google Scholar

[25] Li C, Zhang W, Cho C, et al. SolarCore: Solar energy driven multi-core architecture power management. In: Proceedings of 2011 IEEE 17th International Symposium on High Performance Computer Architecture. San Antonio: IEEE, 2011. 205--216. Google Scholar

[26] Hadidi R, Asgari B, Mudassar B A, et al. Demystifying the characteristics of 3D-stacked memories: a case study for Hybrid Memory Cube. In: Proceedings of 2017 IEEE International Symposium on Workload Characterization (IISWC). Seattle: IEEE, 2017. 66--75. Google Scholar

[27] Pei J, Deng L, Song S. Towards artificial general intelligence with hybrid Tianjic chip architecture. Nature, 2019, 572: 106-111 CrossRef ADS Google Scholar

[28] Farmahini-Farahani A, Ho Ahn J, Morrow K. DRAMA: An Architecture for Accelerated Processing Near Memory. IEEE Comput Arch Lett, 2015, 14: 26-29 CrossRef Google Scholar

[29] Nair R, Antao S F, Bertolli C. Active Memory Cube: A processing-in-memory architecture for exascale systems. IBM J Res Dev, 2015, 59: 17:1-17:14 CrossRef Google Scholar

[30] Vermij E, Hagleitner C, Fiorin L, et al. An architecture for near-data processing systems. In: Proceedings of the ACM International Conference on Computing Frontiers. Como: ACM, 2016. 357--360. Google Scholar

[31] Liu Z, Calciu I, Herlihy M, et al. Concurrent data structures for near-memory computing. In: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures. Washington: ACM, 2017. 235--245. Google Scholar

[32] Yazdanbakhsh A, Song C, Sacks J, et al. In-DRAM near-data approximate acceleration for GPUs. In: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques. Limassol Cyprus: ACM, 2018. 34. Google Scholar

[33] Wang Y, Han Y, Zhang L, et al. ProPRAM: exploiting the transparent logic resources in non-volatile memory for near data computing. In: Proceedings of the 52nd Annual Design Automation Conference. San Francisco: ACM, 2015. 47. Google Scholar

[34] Lee J H, Sim J, Kim H. SSync: Processing near memory for machine learning workloads with bounded staleness consistency models. In: Proceedings of 2015 International Conference on Parallel Architecture and Compilation (PACT), San Francisco: IEEE, 2015. 241--252. Google Scholar

[35] Kim D, Kung J, Chai S, et al. Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory. In: Proceedings of 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul: IEEE, 2016. 380--392. Google Scholar

[36] Aga S, Jayasena N, Ignatowski M. Co-ML: a case for collaborative ML acceleration using near-data processing. In: Proceedings of the International Symposium on Memory Systems. Alexandria: ACM, 2019. 506--517. Google Scholar

[37] Ahn J, Hong S, Yoo S. A scalable processing-in-memory accelerator for parallel graph processing. SIGARCH Comput Archit News, 2016, 43: 105-117 CrossRef Google Scholar

[38] Nai L, Hadidi R, Sim J, et al. Graphpim: enabling instruction-level pim offloading in graph computing frameworks. In: Proceedings of 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). Austin: IEEE, 2017. 457--468. Google Scholar

[39] Jang J, Heo J, Lee Y, et al. Charon: specialized near-memory processing architecture for clearing dead objects in memory. In: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. Columbus: ACM, 2019. 726--739. Google Scholar

[40] Shafiee A, Nag A, Muralimanohar N. ISAAC. SIGARCH Comput Archit News, 2016, 44: 14-26 CrossRef Google Scholar

[41] Chen Y H, Krishna T, Emer J S. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE J Solid-State Circuits, 2017, 52: 127-138 CrossRef ADS Google Scholar

[42] Hu M, Strachan J P, Li Z, et al. Dot-product engine for neuromorphic computing: programming 1T1M crossbar to accelerate matrix-vector multiplication. In: Proceedings of the 53rd annual design automation conference. Austin: ACM, 2016. 19. Google Scholar

[43] Cheng M, Xia L, Zhu Z, et al. Time: a training-in-memory architecture for memristor-based deep neural networks. In: Proceedings of the 54th Annual Design Automation Conference. Austin: ACM, 2017. 26. Google Scholar

[44] Mao H, Song M, Li T, et al. LerGAN: a zero-free, low data movement and PIM-based GAN architecture. In: Proceedings of 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). Fukuoka: IEEE, 2018. 669--681. Google Scholar

[45] Ambrogio S, Narayanan P, Tsai H. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature, 2018, 558: 60-67 CrossRef ADS Google Scholar

[46] Feinberg B, Vengalam U K R, Whitehair N, et al. Enabling scientific computing on memristive accelerators. In: Proceedings of 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). Los Angeles: IEEE, 2018. 367--382. Google Scholar

[47] Song L, Zhuo Y, Qian X, et al. GraphR: accelerating graph processing using ReRAM. In: Proceedings of 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). Vienna: IEEE, 2018. 531--543. Google Scholar

[48] Li S, Xu C, Zou Q, et al. Pinatubo: a processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories. In: Proceedings of the 53rd Annual Design Automation Conference. Austin: ACM, 2016. 173. Google Scholar

[49] Xie L, Nguyen H A D, Yu J, et al. Scouting logic: a novel memristor-based logic design for resistive computing. In: Proceedings of 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). Bochum: IEEE, 2017. 176--181. Google Scholar

[50] Abu Lebdeh M, Abunahla H, Mohammad B. An Efficient Heterogeneous Memristive xnor for In-Memory Computing. IEEE Trans Circuits Syst I, 2017, 64: 2427-2437 CrossRef Google Scholar

[51] Imani M, Kim Y, Rosing T. Mpim: multi-purpose in-memory processing using configurable resistive memory. In: Proceedings of 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC). Makuhari Messe: IEEE, 2017. 757--763. Google Scholar

[52] Sim J, Kim M, Kim Y, et al. MAPIM: mat parallelism for high performance processing in non-volatile memory architecture. In: Proceedings of the 20th International Symposium on Quality Electronic Design (ISQED). Santa Clara: IEEE, 2019. 145--150. Google Scholar

[53] Imani M, Peroni D, Rosing T. Nvalt: Nonvolatile Approximate Lookup Table for GPU Acceleration. IEEE Embedded Syst Lett, 2018, 10: 14-17 CrossRef Google Scholar

[54] Imani M, Gupta S, Arredondo A, et al. Efficient query processing in crossbar memory. In: Proceedings of 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). Taipei: IEEE, 2017. 1--6. Google Scholar

[55] Yantir H E, Eltawil A M, Kurdahi F J. Approximate Memristive In-memory Computing. ACM Trans Embed Comput Syst, 2017, 16: 1-18 CrossRef Google Scholar

[56] Sun Y, Wang Y, Yang H. Energy-efficient SQL query exploiting RRAM-based process-in-memory structure. In: Proceedings of 2017 IEEE 6th Non-Volatile Memory Systems and Applications Symposium (NVMSA). Hsinchu: IEEE, 2017. 1--6. Google Scholar

[57] Kim H, Kim H, Yalamanchili S, et al. Understanding energy aspects of processing-near-memory for HPC workloads. In: Proceedings of the 2015 International Symposium on Memory Systems. Washington: ACM, 2015. 276--282. Google Scholar

[58] Mao H, Zhang X, Sun G, et al. Protect non-volatile memory from wear-out attack based on timing difference of row buffer hit/miss. In: Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE). Lausanne: IEEE, 2017. 1623--1626. Google Scholar

[59] Qureshi M K, Karidis J, Franceschini M, et al. Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling. In: Proceedings of 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). New York: IEEE, 2009. 14--23. Google Scholar

[60] Cai Y, Lin Y, Xia L, et al. Long live time: improving lifetime for training-in-memory engines by structured gradient sparsification. In: Proceedings of 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC). San Francisco: IEEE, 2018. 1--6. Google Scholar

[61] Xia L, Huangfu W, Tang T. Stuck-at Fault Tolerance in RRAM Computing Systems. IEEE J Emerg Sel Top Circuits Syst, 2018, 8: 102-115 CrossRef ADS Google Scholar

[62] Xia L, Liu M, Ning X, et al. Fault-tolerant training with on-line fault detection for RRAM-based neural computing systems. In: Proceedings of the 54th Annual Design Automation Conference 2017. Austin: ACM, 2017. 1--6. Google Scholar

[63] Liu C, Hu M, Strachan J P, et al. Rescuing memristor-based neuromorphic design with high defects. In: Proceedings of 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC). Austin: IEEE, 2017. 1--6. Google Scholar

[64] Ni L, Wang Y, Yu H, et al. An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar. In: Proceedings of 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC). Macao: IEEE, 2016. 280--285. Google Scholar

[65] Wang Y, Tang T, Xia L, et al. Energy efficient RRAM spiking neural network for real time classification. In: Proceedings of the 25th edition on Great Lakes Symposium on VLSI. Pittsburgh: IEEE, 2015. 189--194. Google Scholar

[66] Narayanan S, Shafiee A, Balasubramonian R. INXS: bridging the throughput and energy gap for spiking neural networks. In: Proceedings of 2017 International Joint Conference on Neural Networks (IJCNN). Anchorage: IEEE, 2017. 2451--2459. Google Scholar

[67] Ankit A, Sengupta A, Panda P, et al. Resparc: a reconfigurable and energy-efficient architecture with memristive crossbars for deep spiking neural networks. In: Proceedings of the 54th Annual Design Automation Conference 2017. Austin: ACM, 2017. 1--6. Google Scholar

[68] Xia L, Tang T, Huangfu W, et al. Switched by input: power efficient structure for RRAM-based convolutional neural network. In: Proceedings of the 53rd Annual Design Automation Conference. Austin: ACM, 2016. 1--6. Google Scholar

[69] Mao H, Shu J. 3D Memristor Array Based Neural Network Processing in Memory Architecture. Journal of Computer Research and Development, 2019, 56(6): 1149-1160 doi: 10.7544/issn1000-1239.2019.20190099. Google Scholar

[70] Ji Y, Zhang Y, Xie X, et al. Fpsa: a full system stack solution for reconfigurable reram-based nn accelerator architecture. In: Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems, Providence, 2019. 733--747. Google Scholar

[71] Witten I H, Frank E. Data mining. SIGMOD Rec, 2002, 31: 76-77 CrossRef Google Scholar

[72] Sharif Razavian A, Azizpour H, Sullivan J, et al. CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Columbus: IEEE, 2014. 806--813. Google Scholar

[73] Manning C D., Surdeanu M, Bauer J, et al. The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Baltimore, 2014. 55--66. Google Scholar

[74] Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks, 2015, 61: 85-117 CrossRef Google Scholar

[75] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 2672--2680. Google Scholar

[76] Kaelbling L P, Littman M L, Moore A W. Reinforcement Learning: A Survey. jair, 1996, 4: 237-285 CrossRef Google Scholar

[77] Low Y, Gonzalez J E, et al. Graphlab: A new framework for parallel machine learning. 2014,. arXiv Google Scholar

[78] Low Y, Gonzalez J, Kyrola A, et al. Distributed graphlab: A framework for machine learning in the cloud. 2012,. arXiv Google Scholar

[79] LeBeane M, Song S, Panda R, et al. Data partitioning strategies for graph workloads on heterogeneous clusters. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Austin: IEEE, 2015. 1--12. Google Scholar

[80] Gonzalez J E, Low Y, Gu H, et al. Powergraph: distributed graph-parallel computation on natural graphs. In: Presented as Part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), Hollywood, 2012. 17--30. Google Scholar

[81] Chen R, Shi J, Chen Y, et al. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing (TOPC), 2019, 5(3): 1-39. Google Scholar

[82] Vigna G, Kemmerer R A. NetSTAT: a network-based intrusion detection approach. In: Proceedings of the 14th Annual Computer Security Applications Conference (Cat. No. 98EX217). Scottsdale: IEEE, 1998. 25--34. Google Scholar

[83] Agichtein E, Castillo C, Donato D, et al. Finding high-quality content in social media. In: Proceedings of the 2008 International Conference on Web Search and Data Mining. Palo Alto: ACM, 2008. 183--194. Google Scholar

[84] Page L, Brin S, Motwani R, et al. The Pagerank Citation Ranking: Bringing Order to the Web. Stanford InfoLab, 1999. Google Scholar

[85] Biemann C. Chinese whispers: an efficient graph clustering algorithm and its application to natural language processing problems. In: Proceedings of the 1st Workshop on Graph Based Methods for Natural Language Processing, 2006. 73--80. Google Scholar

[86] Chesler E J, Lu L, Shou S. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat Genet, 2005, 37: 233-242 CrossRef Google Scholar

[87] Linden G, Smith B, York J. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput, 2003, 7: 76-80 CrossRef Google Scholar

[88] Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol, 2008, 26: 1135-1145 CrossRef Google Scholar

[89] Shendure J, Balasubramanian S, Church G M. DNA sequencing at 40: past, present and future. Nature, 2017, 550: 345-353 CrossRef ADS Google Scholar

[90] Altschul S. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 1997, 25: 3389-3402 CrossRef Google Scholar

[91] Jha M, Malhotra R, Acharya R. A Generalized Lattice Based Probabilistic Approach for Metagenomic Clustering. IEEE/ACM Trans Comput Biol Bioinf, 2017, 14: 749-761 CrossRef Google Scholar

[92] Altschul S F, Gish W, Miller W. Basic local alignment search tool. J Mol Biol, 1990, 215: 403-410 CrossRef Google Scholar

[93] Ning Z. SSAHA: A Fast Search Method for Large DNA Databases. Genome Res, 2001, 11: 1725-1729 CrossRef Google Scholar

[94] Lancaster J, Buhler J, Chamberlain R D. Acceleration of ungapped extension in Mercury BLAST. Microprocessors MicroSyst, 2009, 33: 281-289 CrossRef Google Scholar

[95] Ling C, Benkrid K. Design and implementation of a CUDA-compatible GPU-based core for gapped BLAST algorithm. Procedia Comput Sci, 2010, 1: 495-504 CrossRef Google Scholar