
SCIENTIA SINICA Informationis, Volume 49, Issue 3: 295-313(2019) https://doi.org/10.1360/N112018-00281

Memory system optimization for graph processing: a survey

More info
  • Received Oct 18, 2018
  • Accepted Dec 12, 2018
  • Published Mar 18, 2019

Abstract

A graph is a complex data structure of vertices and edges that can encode a wide variety of information. Graph processing (or graph computing) abstracts the relations and properties of real-world entities into a graph structure and performs complex computations over it. Because the performance of general-purpose processors has become a bottleneck, many coprocessors and domain-specific accelerators have been designed to improve execution speed and save energy. Given the strong data dependences in graph workloads, improving the efficiency of memory access is critical to overall system performance. In particular, as graph data continue to grow in scale and graph processing is accelerated on diverse hardware, memory management and optimization have become extremely important. This survey reviews memory architectures and management methods for graph processing on heterogeneous architectures. We describe graph data layouts that improve the efficiency of memory access; analyze recent work on memory optimization and the characteristics of GPU, FPGA, ASIC, and PIM platforms, among others; summarize relevant research progress in China in recent years; and conclude with the opportunities and challenges of in-memory graph processing.
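As a concrete illustration of the vertex-and-edge abstraction described above, the following is a minimal vertex-centric breadth-first search in Python. It is a generic sketch of the Pregel-style superstep pattern (the function and variable names are illustrative, not taken from any system surveyed here).

```python
# Minimal vertex-centric BFS: in each superstep, every frontier vertex
# propagates (its level + 1) to unvisited out-neighbors -- the iterative
# propagation pattern that graph processing accelerators optimize.
from collections import defaultdict

def bfs_levels(edges, source):
    """edges: iterable of directed (u, v) pairs; returns {vertex: level}."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    level = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:                 # process the current superstep
            for v in adj[u]:
                if v not in level:         # first visit fixes the level
                    level[v] = level[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return level

if __name__ == "__main__":
    print(bfs_levels([(0, 1), (0, 2), (1, 3), (2, 3)], 0))
    # -> {0: 0, 1: 1, 2: 1, 3: 2}
```

The frontier here is exactly the "active vertex set" that traversal-centric systems track; its irregular, data-dependent growth is what makes memory access the dominant cost.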


Funded by

National Key R&D Program of China (2018YFB1003500)


References

[1] Malewicz G, Austern M H, Bik A J, et al. Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 International Conference on Management of Data, Indianapolis, 2010. 135--146.

[2] Ham T J, Wu L, Sundaram N, et al. Graphicionado: a high-performance and energy-efficient accelerator for graph analytics. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, 2016. 1--13.

[3] Ozdal M M, Yesil S, Kim T, et al. Energy efficient architecture for graph analytics accelerators. In: Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, Seoul, 2016. 166--177.

[4] Dai G H, Chi Y Z, Wang Y, et al. FPGP: graph processing framework on FPGA: a case study of breadth-first search. In: Proceedings of the 24th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2016. 105--110.

[5] Nurvitadhi E, Weisz G, Wang Y, et al. GraphGen: an FPGA framework for vertex-centric graph computation. In: Proceedings of the 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, Boston, 2014. 25--28.

[6] Roy A, Mihailovic I, Zwaenepoel W. X-Stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles, Farmington, 2013. 472--488.

[7] Dai G, Huang T, Chi Y, et al. ForeGraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2017. 217--226.

[8] Zhou S J, Prasanna V K. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In: Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, Campinas, 2017. 137--144.

[9] Song L, Zhuo Y, Qian X, et al. GraphR: accelerating graph processing using ReRAM. In: Proceedings of the 24th IEEE International Symposium on High-Performance Computer Architecture, Vienna, 2018. 531--543.

[10] Zhou S, Chelmis C, Prasanna V K. High-throughput and energy-efficient graph processing on FPGA. In: Proceedings of the 24th International Symposium on Field-Programmable Custom Computing Machines, Washington, 2016. 103--110.

[11] Lakhotia K, Kannan R, Prasanna V. Accelerating PageRank using partition-centric processing. In: Proceedings of the 2018 USENIX Annual Technical Conference, Santa Clara, 2018. 427--440.

[12] Gonzalez J E, Low Y, Gu H, et al. PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, Hollywood, 2012. 17--30.

[13] Chen R, Shi J, Chen Y, et al. PowerLyra: differentiated graph computation and partitioning on skewed graphs. In: Proceedings of the 10th European Conference on Computer Systems, Bordeaux, 2015. 1--15.

[14] Maass S, Min C, Kashyap S, et al. Mosaic: processing a trillion-edge graph on a single machine. In: Proceedings of the 12th European Conference on Computer Systems, Belgrade, 2017. 527--543.

[15] Zhang J, Khoram S, Li J. Boosting the performance of FPGA-based graph processor using hybrid memory cube: a case for breadth-first search. In: Proceedings of the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2017. 207--216.

[16] Zhu X W, Chen W G, Zheng W M, et al. Gemini: a computation-centric distributed graph processing system. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, 2016. 301--316.

[17] Oguntebi T, Olukotun K. GraphOps: a dataflow library for graph analytics acceleration. In: Proceedings of the 24th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2016. 111--117.

[18] Attia O G, Johnson T, Townsend K, et al. CyGraph: a reconfigurable architecture for parallel breadth-first search. In: Proceedings of the 28th International Parallel and Distributed Processing Symposium Workshops, Phoenix, 2014. 228--235.

[19] Dai G H, Huang T H, Chi Y Z, et al. GraphH: a processing-in-memory architecture for large-scale graph processing. IEEE Trans Comput-Aided Des Integr Circuits Syst, 2018. doi: 10.1109/TCAD.2018.2821565.

[20] Kyrola A, Blelloch G, Guestrin C. GraphChi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, Hollywood, 2012. 31--46.

[21] Zhang M X, Zhuo Y W, Wang C, et al. GraphP: reducing communication for PIM-based graph processing with efficient data partition. In: Proceedings of the 24th IEEE International Symposium on High-Performance Computer Architecture, Vienna, 2018. 544--557.

[22] Umuroglu Y, Morrison D, Jahre M. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform. In: Proceedings of the 25th International Conference on Field Programmable Logic and Applications, London, 2015. 1--8.

[23] Zhu X W, Han W T, Chen W G. GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of the USENIX Annual Technical Conference, Santa Clara, 2015. 375--386.

[24] Nodehi S A H, Qiu J Q, Zhao Z J. Tigr: transforming irregular graphs for GPU-friendly graph processing. In: Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, Virginia, 2018. 622--636.

[25] Zhong J L, He B S. Medusa: simplified graph processing on GPUs. IEEE Trans Parallel Distrib Syst, 2014, 25: 1543--1552.

[26] Fu Z S, Personick M, Thompson B. MapGraph: a high level API for fast development of high performance graph analytics on GPUs. In: Proceedings of the Workshop on GRAph Data management Experiences and Systems, Snowbird, 2014. 1--6.

[27] Shi X H, Liang J L, Di S, et al. Optimization of asynchronous graph processing on GPU with hybrid coloring model. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francisco, 2015. 271--272.

[28] Khorasani F, Vora K, Gupta R, et al. CuSha: vertex-centric graph processing on GPUs. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, Vancouver, 2014. 239--252.

[29] Kim M S, An K, Park H, et al. GTS: a fast and scalable graph processing method based on streaming topology to GPUs. In: Proceedings of the 2016 International Conference on Management of Data, San Francisco, 2016. 447--461.

[30] Seo H, Kim J, Kim M S. GStream: a graph streaming processing method for large-scale graphs on GPUs. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francisco, 2015. 253--254.

[31] Sengupta D, Song S L, Agarwal K, et al. GraphReduce: processing large-scale graphs on accelerator-based systems. In: Proceedings of the 27th International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, 2015. 1--12.

[32] Wang Y Z, Davidson A, Pan Y C, et al. Gunrock: a high-performance graph processing library on the GPU. In: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Barcelona, 2016. 265--266.

[33] Nai L F, Hadidi R, Sim J, et al. GraphPIM: enabling instruction-level PIM offloading in graph computing frameworks. In: Proceedings of the 23rd IEEE Symposium on High Performance Computer Architecture, Austin, 2017. 457--468.

[34] Gao M Y, Ayers G, Kozyrakis C. Practical near-data processing for in-memory analytics frameworks. In: Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques, San Francisco, 2015. 113--124.

[35] Nai L, Xia Y, Tanase I G, et al. GraphBIG: understanding graph computing in the context of industrial solutions. In: Proceedings of the 27th International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, 2015. 1--12.

[36] Han L, Shen Z, Liu D. A novel ReRAM-based processing-in-memory architecture for graph traversal. ACM Trans Storage, 2018, 14: 1--26.

[37] Ahn J, Hong S, Yoo S, et al. A scalable processing-in-memory accelerator for parallel graph processing. In: Proceedings of the 42nd International Symposium on Computer Architecture, Portland, 2015. 105--117.

[38] Zhang J L, Li J. Degree-aware hybrid graph traversal on FPGA-HMC platform. In: Proceedings of the 26th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2018. 229--238.

[39] Betkaoui B, Thomas D B, Luk W, et al. A framework for FPGA acceleration of large graph problems: graphlet counting case study. In: Proceedings of the International Conference on Field-Programmable Technology, Delhi, 2011. 1--8.

[40] Khoram S, Zhang J, Strange M, et al. Accelerating graph analytics by co-optimizing storage and access on an FPGA-HMC platform. In: Proceedings of the 26th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2018. 239--248.

[41] Wang Q B, Jiang W R, Xia Y L, et al. A message-passing multi-softcore architecture on FPGA for breadth-first search. In: Proceedings of the International Conference on Field-Programmable Technology, Beijing, 2010. 70--77.

[42] Windh S, Budhkar P, Najjar W A. CAMs as synchronizing caches for multithreaded irregular applications on FPGAs. In: Proceedings of the 34th IEEE/ACM International Conference on Computer-Aided Design, Austin, 2015. 331--336.

[43] Wang L, Yang X J, Dai H D. Scratchpad memory allocation for arrays in permutation graphs. Sci China Inf Sci, 2013, 56: 1--13.

[44] Zhou J, Liu S, Guo Q, et al. Tunao: a high-performance and energy-efficient reconfigurable accelerator for graph processing. In: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Madrid, 2017. 731--734.

[45] Huang T H, Dai G H, Wang Y, et al. HyVE: hybrid vertex-edge memory hierarchy for energy-efficient graph processing. In: Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, Dresden, 2018. 973--978.

[46] Shao Y, Cui B, Ma L. PAGE: a partition aware engine for parallel graph computation. IEEE Trans Knowl Data Eng, 2015, 27: 518--530.

[47] Ben-Nun T, Sutton M, Pai S, et al. Groute: an asynchronous multi-GPU programming model for irregular computations. In: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Austin, 2017. 235--248.

[48] Shi X, Cui B, Shao Y, et al. Tornado: a system for real-time iterative analysis over evolving data. In: Proceedings of the 2016 International Conference on Management of Data, San Francisco, 2016. 417--430.

[49] Pan P, Li C. Congra: towards efficient processing of concurrent graph queries on shared-memory machines. In: Proceedings of the 35th IEEE International Conference on Computer Design, Boston, 2017. 217--224.

[50] Jin H, Yao P C, Liao X F, et al. Towards dataflow-based graph accelerator. In: Proceedings of the 37th International Conference on Distributed Computing Systems, Atlanta, 2017. 1981--1992.

[51] Yao P C, Zheng L, Liao X F, et al. An efficient graph accelerator with parallel data conflict management. In: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, Limassol, 2018. 1--12.

[52] Xue J, Yang Z, Qu Z, et al. Seraph: an efficient, low-cost system for concurrent graph processing. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, Vancouver, 2014. 227--238.

[53] Xie C N, Chen R, Guan H B, et al. Sync or async: time to fuse for distributed graph-parallel computation. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francisco, 2015. 194--204.

[54] Zhang K Y, Chen R, Chen H B. NUMA-aware graph-structured analytics. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francisco, 2015. 183--193.

[55] Zhang M X, Wu Y W, Chen K, et al. Exploring the hidden dimension in graph processing. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, 2016. 285--300.

[56] Sha M, Li Y, He B. Accelerating dynamic graph analytics on GPUs. Proc VLDB Endow, 2017, 11: 107--120.

[57] Chen H Y, Sun Z G, Yi F. BufferBank storage: an economic, scalable and universally usable in-network storage model for streaming data applications. Sci China Inf Sci, 2016, 59: 012103.

  • Figure 1

    Traversal-centric and computation-centric graph algorithms

  • Figure 2

    Graph topology and edge array representation. (a) Graph topology; (b) edge array

  • Figure 3

Adjacency list representation. (a) Adjacency list (incoming edges); (b) compressed sparse column; (c) adjacency list (outgoing edges); (d) compressed sparse row
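The compressed sparse row layout of Figure 3 can be sketched in a few lines of Python. This is a generic illustration (the names `build_csr`, `row_ptr`, and `col_idx` are my own, not from the surveyed systems): one flat array holds all destination vertices, and a per-vertex offset array delimits each vertex's neighbor range, so a vertex's neighbors sit contiguously in memory.

```python
# Build a compressed sparse row (CSR) layout for the outgoing edges of a
# small directed graph. col_idx concatenates every vertex's out-neighbors;
# row_ptr[v]..row_ptr[v+1] is vertex v's slice of col_idx. This contiguous
# layout is the access pattern that data-layout optimizations target.

def build_csr(num_vertices, edges):
    """edges: list of (src, dst) pairs; returns (row_ptr, col_idx)."""
    row_ptr = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1          # count the out-degree of each vertex
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]   # prefix sum -> start offsets
    col_idx = [0] * len(edges)
    cursor = row_ptr[:]                # per-vertex fill position
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx

def neighbors(row_ptr, col_idx, v):
    """Return the out-neighbors of vertex v as a contiguous slice."""
    return col_idx[row_ptr[v]:row_ptr[v + 1]]

if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]
    row_ptr, col_idx = build_csr(4, edges)
    print(row_ptr)                          # -> [0, 2, 3, 5, 5]
    print(neighbors(row_ptr, col_idx, 2))   # -> [0, 3]
```

Compressed sparse column (Figure 3(b)) is the same construction applied to incoming edges, i.e. with the roles of `src` and `dst` swapped.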

  • Table 1   Summary of graph processing systems on memory optimization
    Optimization scheme: graph processing systems [system, architecture, year]
    Data layout: [GraphOps [17], FPGA, 2016], [CyGraph [18], FPGA, 2014], [GraphH [19], PIM, 2018], [Graphicionado [2], ASIC, 2016], [ForeGraph [7], FPGA, 2017], [FPGP [4], FPGA, 2016], [Tigr [24], GPU, 2018], [X-Stream [6], CPU, 2013], [Mosaic [14], CPU, 2017], etc.
    Graph partition: [GraphChi [20], CPU, 2012], [CuSha [28], GPU, 2014], [Graphicionado [2], ASIC, 2016], [GraphH [19], PIM, 2018], [FPGP [4], FPGA, 2016], [GraphP [21], PIM, 2018], [GStream [30], GPU, 2015], [GTS [29], GPU, 2016], [GridGraph [23], CPU, 2015], [ForeGraph [7], FPGA, 2017], [Gemini [16], CPU, 2016], [Frog [27], GPU, 2015], [PAGE [46], CPU, 2015], [GraphReduce [31], GPU, 2015], etc.
    Storage media: [GraphPIM [33], PIM, 2017], [Graphicionado [2], ASIC, 2016], [FPGP [4], FPGA, 2016], [ForeGraph [7], FPGA, 2017], [FPGA-HMC [38], FPGA, 2018], [Ozdal et al. [3], ASIC, 2016], [GraphH [19], PIM, 2018], [GraphR [9], PIM, 2018], [GraphP [21], PIM, 2018], [Tesseract [37], PIM, 2015], etc.
    Memory access: [REBFS [36], PIM, 2018], [Tesseract [37], PIM, 2015], [GraphH [19], PIM, 2018], [Graphicionado [2], ASIC, 2016], [GraphGen [5], FPGA, 2014], [FPGA-HMC [40], FPGA, 2018], [GTS [29], GPU, 2016], [Frog [27], GPU, 2015], etc.
    Communication: [Tesseract [37], PIM, 2015], [GraphP [21], PIM, 2018], [GraphH [19], PIM, 2018], [Groute [47], GPU, 2017], [FPGP [4], FPGA, 2016], [ForeGraph [7], FPGA, 2017], [Frog [27], GPU, 2015], [Tornado [48], CPU, 2016], [GTS [29], GPU, 2016], etc.
    Power: [GraphBIG [35], PIM, 2015], [GraphH [19], PIM, 2018], [Graphicionado [2], ASIC, 2016], [GraphP [21], PIM, 2018], [GraphR [9], PIM, 2018], [GraphPIM [33], PIM, 2017], [HyVE [45], CPU, 2018], [Tesseract [37], PIM, 2015], [Ozdal et al. [3], ASIC, 2016], [Congra [49], CPU, 2017], [Tunao [44], ASIC, 2017], etc.

Copyright 2019 Science China Press Co., Ltd.