Abstract
Graph processing has been widely used in many scenarios, from scientific computing to artificial intelligence. Graph processing exhibits irregular computational parallelism and random memory accesses, unlike traditional workloads. Therefore, running graph processing workloads on conventional architectures (e.g., CPUs and GPUs) often shows a significantly low compute-memory ratio with few performance benefits, which can be, in many cases, even slower than a specialized single-thread graph algorithm. While domain-specific hardware designs are essential for graph processing, it is still challenging to transform the hardware capability to performance boost without coupled software codesigns. This article presents a graph processing ecosystem from hardware to software. We start by introducing a series of hardware accelerators as the foundation of this ecosystem. Subsequently, the codesigned parallel graph systems and their distributed techniques are presented to support graph applications. Finally, we introduce our efforts on novel graph applications and hardware architectures. Extensive results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Wu S W, Sun F, Zhang W T, Xie X, Cui B. Graph neural networks in recommender systems: A survey. ACM Computing Surveys, 2023, 55(5): Article No. 97. DOI: https://doi.org/10.1145/3535101.
Bullmore E, Sporns O. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 2009, 10(3): 186–198. DOI: https://doi.org/10.1038/NRN2575.
Wang B Y, Dabbaghjamanesh M, Kavousi-Fard A, Mehraeen S. Cybersecurity enhancement of power trading within the networked microgrids based on blockchain and directed acyclic graph approach. IEEE Trans. Industry Applications, 2019, 55(6): 7300–7309. DOI: https://doi.org/10.1109/TIA.2019.2919820.
Yin J, Tang M J, Cao J L, You M S, Wang H, Alazab M. Knowledge-driven cybersecurity intelligence: Software vulnerability coexploitation behavior discovery. IEEE Trans. Industrial Informatics, 2023, 19(4): 5593–5601. DOI: https://doi.org/10.1109/TII.2022.3192027.
Luo J W, He M K, Pan W K, Ming Z. BGNN: Behavior-aware graph neural network for heterogeneous session-based recommendation. Frontiers of Computer Science, 2023, 17(5): 175336. DOI: https://doi.org/10.1007/s11704-022-2100-y.
He D L, Yuan P P, Jin H. Answering reachability queries with ordered label constraints over labeled graphs. Frontiers of Computer Science, 2024, 18(1): 181601. DOI: https://doi.org/10.1007/s11704-022-2368-y.
Gui C Y, Zheng L, He B S, Liu C, Chen X Y, Liao X F, Jin H. A survey on graph processing accelerators: Challenges and opportunities. Journal of Computer Science and Technology, 2019, 34(2): 339–371. DOI: https://doi.org/10.1007/S11390-019-1914-Z.
Chen D, Jin H, Zheng L, Huang Y, Yao P C, Gui C Y, Wang Q G, Liu H F, He H H, Liao X F, Zheng R. A general offloading approach for near-DRAM processing-in-memory architectures. In Proc. the 2022 IEEE International Parallel and Distributed Processing Symposium, May 2022, pp.246–257. DOI: https://doi.org/10.1109/IPDPS53621.2022.00032.
Yao P C, Zheng L, Liao X F, Jin H, He B S. An efficient graph accelerator with parallel data conflict management. In Proc. the 27th International Conference on Parallel Architectures and Compilation Techniques, Nov. 2018, Article No. 8. DOI: https://doi.org/10.1145/3243176.3243201.
Jin H, Chen D, Zheng L, Huang Y, Yao P C, Zhao J, Liao X F, Jiang W B. Accelerating graph convolutional networks through a PIM-accelerated approach. IEEE Trans. Computers, 2023, 72(9): 2628–2640. DOI: https://doi.org/10.1109/TC.2023.3257514.
Wang D W, Cui W Q. An efficient graph data compression model based on the germ quotient set structure. Frontiers of Computer Science, 2022, 16(6): 166617. DOI: https://doi.org/10.1007/s11704-022-1489-7.
Fang P, Wang F, Shi Z, Feng D, Yi Q X, Xu X H, Zhang Y X. An efficient memory data organization strategy for application-characteristic graph processing. Frontiers of Computer Science, 2022, 16 (1): Article No. 161607. DOI: https://doi.org/10.1007/s11704-020-0255-y.
Yao P C, Zheng L, Huang Y, Wang Q G, Gui C Y, Zeng Z, Liao X F, Jin H, Xue J L. ScalaGraph: A scalable accelerator for massively parallel graph processing. In Proc. the 2022 IEEE International Symposium on High-Performance Computer Architecture, Apr. 2022, pp.199–212. DOI: https://doi.org/10.1109/HPCA53966.2022.00023.
Yao P C, Zheng L, Zeng Z, Huang Y, Gui C Y, Liao X F, Jin H, Xue J L. A locality-aware energy-efficient accelerator for graph mining applications. In Proc. the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2020, pp.895–907. DOI: https://doi.org/10.1109/MICRO50266.2020.00077.
Rahman S, Abu-Ghazaleh N, Gupta R. GraphPulse: An event-driven hardware accelerator for asynchronous graph processing. In Proc. the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2020, pp.908–921. DOI: https://doi.org/10.1109/MICRO50266.2020.00078.
Jin H, Yao P C, Liao X F. Towards dataflow based graph processing. Science China Information Sciences, 2017, 60 (12): Article No. 126102. DOI: https://doi.org/10.1007/s11432-017-9226-8.
Li K X, Xu S X, Shao Z Y, Zheng R, Liao X F, Jin H. ScalaBFS2: A high performance BFS accelerator on an HBM-enhanced FPGA chip. ACM Trans. Reconfigurable Technology and Systems. DOI: https://doi.org/10.1145/3650037. (accepted)
Zhang Y, Liao X F, Jin H, Gu L, Tan G, Zhou B B. Hot-Graph: Efficient asynchronous processing for real-world graphs. IEEE Trans. Computers, 2017, 66(5): 799–809. DOI: https://doi.org/10.1109/TC.2016.2624289.
Chen D, Gui C Y, Zhang Y, Jin H, Zheng L, Huang Y, Liao X F. GraphFly: Efficient asynchronous streaming graphs processing via dependency-flow. In Proc. the 2022 International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2022. DOI: https://doi.org/10.1109/SC41404.2022.00050.
Zhang Y, Liao X F, Jin H, Gu L, He L G, He B S, Liu H K. CGraph: A correlations-aware approach for efficient concurrent iterative graph processing. In Proc. the 2018 USENIX Annual Technical Conference, Jul. 2018, pp.441–452. https://www.usenix.org/system/files/conference/atc18/atc18-zhang-yu.pdf, Oct. 2023.
Zhang Y, Liao X F, Jin H, He B S, Liu H K, Gu L. Di-Graph: An efficient path-based iterative directed graph processing system on multiple GPUs. In Proc. the 24th International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 2019, pp.601–614. DOI: https://doi.org/10.1145/3297858.3304029.
Wang Q G, Zheng L, Huang Y, Yao P C, Gui C Y, Liao X F, Jin H, Jiang W B, Mao F B. GraSU: A fast graph update library for FPGA-based dynamic graph processing. In Proc. the 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2021, pp.149–159. DOI: https://doi.org/10.1145/3431920.3439288.
Zhang Y, Liao X F, Jin H, Gu L, Zhou B B. FBSGraph: Accelerating asynchronous graph processing via forward and backward sweeping. IEEE Trans. Knowledge and Data Engineering, 2018, 30(5): 895–907. DOI: https://doi.org/10.1109/TKDE.2017.2781241.
Liu C Q, Liu H F, Zheng L, Huang Y, Ye X Y, Liao X F, Jin H. FNNG: A high-performance FPGA-based accelerator for K-nearest neighbor graph construction. In Proc. the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Feb. 2023, pp.67–77. DOI: https://doi.org/10.1145/3543622.3573189.
Wang Q G, Zheng L, Hu A, Huang Y, Yao P C, Gui C Y, Liao X F, Jin H, Xue J L. A data-centric accelerator for high-performance hypergraph processing. In Proc. the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2022, pp.1326–1341. DOI: https://doi.org/10.1109/MICRO56248.2022.00088.
Chen D, He H H, Jin H, Zheng L, Huang Y, Shen X Y, Liao X F. MetaNMP: Leveraging Cartesian-like product to accelerate HGNNs with near-memory processing. In Proc. the 50th Annual International Symposium on Computer Architecture, Jun. 2023, Article No. 56. DOI: https://doi.org/10.1145/3579371.3589091.
Zheng L, Zhao J S, Huang Y, Wang Q G, Zeng Z, Xue J L, Liao X F, Jin H. Spara: An energy-efficient ReRAM-based accelerator for sparse graph analytics applications. In Proc. the 2020 IEEE International Parallel and Distributed Processing Symposium, May 2020, pp.696–707. DOI: https://doi.org/10.1109/IPDPS47924.2020.00077.
Huang Y, Zheng L, Yao P C, Wang Q G, Liao X F, Jin H, Xue J L. Accelerating graph convolutional networks using crossbar-based processing-in-memory architectures. In Proc. the 2022 IEEE International Symposium on High-Performance Computer Architecture, Apr. 2022, pp.1029–1042. DOI: https://doi.org/10.1109/HPCA53966.2022.00079.
Huang Y, Zheng L, Yao P C, Zhao J S, Liao X F, Jin H, Xue J L. A heterogeneous PIM hardware-software co-design for energy-efficient graph processing. In Proc. the 2020 IEEE International Parallel and Distributed Processing Symposium, May 2020, pp.684–695. DOI: https://doi.org/10.1109/IPDPS47924.2020.00076.
Ham T J, Wu L S, Sundaram N, Satish N, Martonosi M. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proc. the 49th Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2016. DOI: https://doi.org/10.1109/MICRO.2016.7783759.
Dai G H, Huang T H, Chi Y Z, Xu N Y, Wang Y, Yang H Z. ForeGraph: Exploring large-scale graph processing on multi-FPGA architecture. In Proc. the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2017, pp.217–226. DOI: https://doi.org/10.1145/3020078.3021739.
Chen X Y, Chen Y, Cheng F, Tan H S, He B S, Wong W F. ReGraph: Scaling graph processing on HBM-enabled FPGAs with heterogeneous pipelines. In Proc. the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2022, pp.1342–1358. DOI: https://doi.org/10.1109/MICRO56248.2022.00092.
Yan M Y, Hu X, Li S C, Basak A, Li H, Ma X, Akgun I, Feng Y J, Gu P, Deng L, Ye X C, Zhang Z M, Fan D R, Xie Y. Alleviating irregularity in graph analytics acceleration: A hardware/software co-design approach. In Proc. the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2019, pp.615–628. DOI: https://doi.org/10.1145/3352460.3358318.
Zhang Y F, Gao Q X, Gao L X, Wang C R. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Trans. Parallel and Distributed Systems, 2014, 25(8): 2091–2100. DOI: https://doi.org/10.1109/TPDS.2013.235.
Gonzalez J E, Low Y, Gu H J, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2012, pp.17–30. https://www.usenix.org/system/files/conference/osdi12/osdi12-final-167.pdf, Oct. 2023.
Vora K, Gupta R, Xu G Q. KickStarter: Fast and accurate computations on streaming graphs via trimmed approximations. In Proc. the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 2017, pp.237–251. DOI: https://doi.org/10.1145/3037697.3037748.
Mariappan M, Vora K. GraphBolt: Dependency-driven synchronous processing of streaming graphs. In Proc. the 14th EuroSys Conference, Mar. 2019, Article No. 25. DOI: https://doi.org/10.1145/3302424.3303974.
Wang Y Z H, Davidson A, Pan Y C, Wu Y D, Riffel A, Owens J D. Gunrock: A high-performance graph processing library on the GPU. In Proc. the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb. 2016, Article No. 11. DOI: https://doi.org/10.1145/2851141.2851145.
Ben-Nun T, Sutton M, Pai S, Pingali K. Groute: An asynchronous multi-GPU programming model for irregular computations. ACM SIGPLAN Notices, 2017, 52(8): 235–248. DOI: https://doi.org/10.1145/3155284.3018756.
Dong W, Moses C, Li K. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proc. the 20th International Conference on World Wide Web, Mar. 2011, pp.577–586. DOI: https://doi.org/10.1145/1963405.1963487.
Wang Q G, Zheng L, Yuan J R, Huang Y, Yao P C, Gui C Y, Hu A, Liao X F, Jin H. Hardware-accelerated hypergraph processing with chain-driven scheduling. In Proc. the 2022 IEEE International Symposium on High-Performance Computer Architecture, Apr. 2022, pp.184–198. DOI: https://doi.org/10.1109/HPCA53966.2022.00022.
Hu M, Strachan J P, Li Z Y, Grafals E M, Davila N, Graves C, Lam S, Ge N, Yang J J, Williams R S. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication. In Proc. the 53rd Annual Design Automation Conference, Jun. 2016, Article No. 19. DOI: https://doi.org/10.1145/2897937.2898010.
Chi P, Li S C, Xu C, Zhang T, Zhao J S, Liu Y P, Wang Y, Xie Y. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In Proc. the 43rd Annual International Symposium on Computer Architecture, Jun. 2016, pp.27–39. DOI: https://doi.org/10.1109/ISCA.2016.13.
Song L H, Zhuo Y W, Qian X H, Li H, Chen Y R. GraphR: Accelerating graph processing using ReRAM. In Proc. the 2018 IEEE International Symposium on High Performance Computer Architecture, Feb. 2018, pp.531–543. DOI: https://doi.org/10.1109/HPCA.2018.00052.
Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. arXiv: 1609.02907, 2016. https://arxiv.org/abs/1609.02907, Mar. 2024.
Jin T S, Dai H Q, Cao L J, Zhang B C, Huang F Y, Gao Y, Ji R R. Deepwalk-aware graph convolutional networks. Science China Information Sciences, 2022, 65(5): 152104. DOI: https://doi.org/10.1007/s11432-020-3318-5.
Bai J Y, Guo J, Wang C C, Chen Z Y, He Z, Yang S, Yu P P, Zhang Y, Guo Y W. Deep graph learning for spatially-varying indoor lighting prediction. Science China Information Sciences, 2023, 66 (3): Article No. 132106. DOI: https://doi.org/10.1007/s11432-022-3576-9.
Fey M, Lenssen J E. Fast graph representation learning with PyTorch geometric. arXiv: 1903.02428, 2019. https://arxiv.org/abs/1903.02428, Mar. 2024.
Acknowledgments
We thank the following students for their contributions, where Ke-Xin Li and Bing Zhu participated in the work on the graph accelerators design, Dan Chen, Zhao-Zeng An, Yu-Jian Liao, Qian-Ge Shen, Dong-Hao He, Shi-Jun Li, Xin Ning, and Zhi-Ying Huang participated in the work on the graph software environment, and Dan Chen, Chao-Qiang Liu, Hai-Feng Liu, Hai-Heng He, and Zhao-Zeng An participated in the work on the emerging hardware/software co-design.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest The authors declare that they have no conflict of interest.
Additional information
This work was supported by the National Key Research and Development Program of China under Grant No. 2023YFB-4502300.
Xiao-Fei Liao is a professor in the School of Computer Science and Technology at Huazhong University of Science and Technology (HUST), Wuhan. He received his Ph.D. degree in computer science and engineering from HUST, Wuhan, in 2005. He was awarded Excellent Youth Award from the National Science Foundation of China in 2018, and CCF-IEEE CS Young Computer Scientist Award in 2017. His research interests are in the areas of computer architecture, system software, and big data processing.
Wen-Ju Zhao is a Ph.D. candidate in the School of Computer Science and Technology at Huazhong University of Science and Technology (HUST), Wuhan. His research interests include computer architecture and graph neural networks.
Hai Jin is a chair professor of computer science and engineering at Huazhong University of Science and Technology (HUST), Wuhan. Jin received his Ph.D. degree in computer engineering from HUST, Wuhan, in 1994. In 1996, he was awarded a German Academic Exchange Service Fellowship to visit the Technical University of Chemnitz, Straße der Nationen. Jin worked at The University of Hong Kong, Hong Kong, between 1998 and 2000, and as a visiting scholar at the University of Southern California, Los Angeles, between 1999 and 2000. He was awarded Excellent Youth Award from the National Natural Science Foundation of China in 2001. He has co-authored 22 books and published over 900 research papers. His research interests include computer architecture, virtualization technology, distributed computing, big data processing, network storage, and network security.
Peng-Cheng Yao received his Ph.D. degree in computer science and technology from the Huazhong University of Science and Technology, Wuhan, in 2022. He is now a postdoctoral fellow at Zhejiang Lab, Hangzhou. His research interests include graph processing and domain specific accelerator.
Yu Huang received his Ph.D. degree in computer science and technology from the Huazhong University of Science and Technology (HUST), Wuhan, in 2022. He is now a postdoctoral fellow at Zhejiang Lab, Hangzhou. His research focuses on computer architecture, graph processing, and processing in memory.
Qing-Gang Wang received his Ph.D. degree in computer science and technology from the Huazhong University of Science and Technology (HUST), Wuhan, in 2023. He is now a postdoctoral fellow at Zhejiang Lab, Hangzhou. His current research interests include graph processing and reconfigurable computing.
Jin Zhao received his Ph.D. degree in computer science from the Huazhong University of Science and Technology (HUST), Wuhan, in 2022. He is now a postdoctoral fellow at Zhejiang Lab, Hangzhou. His research focuses on computer architecture, system software, runtime optimization, programming model, and big data processing.
Long Zheng received his Ph.D. degree in computer science and technology from the Huazhong University of Science and Technology (HUST), Wuhan, in 2016. He is currently an associate professor with the School of Computer Science and Technology, HUST, Wuhan. His current research interests include program analysis, runtime systems, and heterogeneous computing with a particular focus on graph processing.
Yu Zhang received his Ph.D. degree in computer science from the Huazhong University of Science and Technology (HUST), Wuhan, in 2016. His research focuses on computer architecture and system, runtime optimization, programming model, and big data processing.
Zhi-Yuan Shao received his Ph.D. degree in computer science and technology from Huazhong University of Science and Technology (HUST), Wuhan, in 2005. He is now a professor of computer science and engineering at HUST in Wuhan. His research interests are in the areas of graph computing, computing system, and big-data processing.
Electronic supplementary material
Rights and permissions
About this article
Cite this article
Liao, XF., Zhao, WJ., Jin, H. et al. Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective. J. Comput. Sci. Technol. 39, 245–266 (2024). https://doi.org/10.1007/s11390-024-4150-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11390-024-4150-0