Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective | Journal of Computer Science and Technology Skip to main content
Log in

Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective

  • Cover Article
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Graph processing has been widely used in many scenarios, from scientific computing to artificial intelligence. Graph processing exhibits irregular computational parallelism and random memory accesses, unlike traditional workloads. Therefore, running graph processing workloads on conventional architectures (e.g., CPUs and GPUs) often shows a significantly low compute-memory ratio with few performance benefits, which can be, in many cases, even slower than a specialized single-thread graph algorithm. While domain-specific hardware designs are essential for graph processing, it is still challenging to transform the hardware capability to performance boost without coupled software codesigns. This article presents a graph processing ecosystem from hardware to software. We start by introducing a series of hardware accelerators as the foundation of this ecosystem. Subsequently, the codesigned parallel graph systems and their distributed techniques are presented to support graph applications. Finally, we introduce our efforts on novel graph applications and hardware architectures. Extensive results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  1. Wu S W, Sun F, Zhang W T, Xie X, Cui B. Graph neural networks in recommender systems: A survey. ACM Computing Surveys, 2023, 55(5): Article No. 97. DOI: https://doi.org/10.1145/3535101.

  2. Bullmore E, Sporns O. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 2009, 10(3): 186–198. DOI: https://doi.org/10.1038/NRN2575.

    Article  Google Scholar 

  3. Wang B Y, Dabbaghjamanesh M, Kavousi-Fard A, Mehraeen S. Cybersecurity enhancement of power trading within the networked microgrids based on blockchain and directed acyclic graph approach. IEEE Trans. Industry Applications, 2019, 55(6): 7300–7309. DOI: https://doi.org/10.1109/TIA.2019.2919820.

    Article  Google Scholar 

  4. Yin J, Tang M J, Cao J L, You M S, Wang H, Alazab M. Knowledge-driven cybersecurity intelligence: Software vulnerability coexploitation behavior discovery. IEEE Trans. Industrial Informatics, 2023, 19(4): 5593–5601. DOI: https://doi.org/10.1109/TII.2022.3192027.

    Article  Google Scholar 

  5. Luo J W, He M K, Pan W K, Ming Z. BGNN: Behavior-aware graph neural network for heterogeneous session-based recommendation. Frontiers of Computer Science, 2023, 17(5): 175336. DOI: https://doi.org/10.1007/s11704-022-2100-y.

    Article  Google Scholar 

  6. He D L, Yuan P P, Jin H. Answering reachability queries with ordered label constraints over labeled graphs. Frontiers of Computer Science, 2024, 18(1): 181601. DOI: https://doi.org/10.1007/s11704-022-2368-y.

    Article  Google Scholar 

  7. Gui C Y, Zheng L, He B S, Liu C, Chen X Y, Liao X F, Jin H. A survey on graph processing accelerators: Challenges and opportunities. Journal of Computer Science and Technology, 2019, 34(2): 339–371. DOI: https://doi.org/10.1007/S11390-019-1914-Z.

    Article  Google Scholar 

  8. Chen D, Jin H, Zheng L, Huang Y, Yao P C, Gui C Y, Wang Q G, Liu H F, He H H, Liao X F, Zheng R. A general offloading approach for near-DRAM processing-in-memory architectures. In Proc. the 2022 IEEE International Parallel and Distributed Processing Symposium, May 2022, pp.246–257. DOI: https://doi.org/10.1109/IPDPS53621.2022.00032.

  9. Yao P C, Zheng L, Liao X F, Jin H, He B S. An efficient graph accelerator with parallel data conflict management. In Proc. the 27th International Conference on Parallel Architectures and Compilation Techniques, Nov. 2018, Article No. 8. DOI: https://doi.org/10.1145/3243176.3243201.

  10. Jin H, Chen D, Zheng L, Huang Y, Yao P C, Zhao J, Liao X F, Jiang W B. Accelerating graph convolutional networks through a PIM-accelerated approach. IEEE Trans. Computers, 2023, 72(9): 2628–2640. DOI: https://doi.org/10.1109/TC.2023.3257514.

    Article  Google Scholar 

  11. Wang D W, Cui W Q. An efficient graph data compression model based on the germ quotient set structure. Frontiers of Computer Science, 2022, 16(6): 166617. DOI: https://doi.org/10.1007/s11704-022-1489-7.

    Article  Google Scholar 

  12. Fang P, Wang F, Shi Z, Feng D, Yi Q X, Xu X H, Zhang Y X. An efficient memory data organization strategy for application-characteristic graph processing. Frontiers of Computer Science, 2022, 16 (1): Article No. 161607. DOI: https://doi.org/10.1007/s11704-020-0255-y.

  13. Yao P C, Zheng L, Huang Y, Wang Q G, Gui C Y, Zeng Z, Liao X F, Jin H, Xue J L. ScalaGraph: A scalable accelerator for massively parallel graph processing. In Proc. the 2022 IEEE International Symposium on High-Performance Computer Architecture, Apr. 2022, pp.199–212. DOI: https://doi.org/10.1109/HPCA53966.2022.00023.

  14. Yao P C, Zheng L, Zeng Z, Huang Y, Gui C Y, Liao X F, Jin H, Xue J L. A locality-aware energy-efficient accelerator for graph mining applications. In Proc. the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2020, pp.895–907. DOI: https://doi.org/10.1109/MICRO50266.2020.00077.

  15. Rahman S, Abu-Ghazaleh N, Gupta R. GraphPulse: An event-driven hardware accelerator for asynchronous graph processing. In Proc. the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2020, pp.908–921. DOI: https://doi.org/10.1109/MICRO50266.2020.00078.

  16. Jin H, Yao P C, Liao X F. Towards dataflow based graph processing. Science China Information Sciences, 2017, 60 (12): Article No. 126102. DOI: https://doi.org/10.1007/s11432-017-9226-8.

  17. Li K X, Xu S X, Shao Z Y, Zheng R, Liao X F, Jin H. ScalaBFS2: A high performance BFS accelerator on an HBM-enhanced FPGA chip. ACM Trans. Reconfigurable Technology and Systems. DOI: https://doi.org/10.1145/3650037. (accepted)

  18. Zhang Y, Liao X F, Jin H, Gu L, Tan G, Zhou B B. Hot-Graph: Efficient asynchronous processing for real-world graphs. IEEE Trans. Computers, 2017, 66(5): 799–809. DOI: https://doi.org/10.1109/TC.2016.2624289.

    Article  MathSciNet  Google Scholar 

  19. Chen D, Gui C Y, Zhang Y, Jin H, Zheng L, Huang Y, Liao X F. GraphFly: Efficient asynchronous streaming graphs processing via dependency-flow. In Proc. the 2022 International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2022. DOI: https://doi.org/10.1109/SC41404.2022.00050.

  20. Zhang Y, Liao X F, Jin H, Gu L, He L G, He B S, Liu H K. CGraph: A correlations-aware approach for efficient concurrent iterative graph processing. In Proc. the 2018 USENIX Annual Technical Conference, Jul. 2018, pp.441–452. https://www.usenix.org/system/files/conference/atc18/atc18-zhang-yu.pdf, Oct. 2023.

  21. Zhang Y, Liao X F, Jin H, He B S, Liu H K, Gu L. Di-Graph: An efficient path-based iterative directed graph processing system on multiple GPUs. In Proc. the 24th International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 2019, pp.601–614. DOI: https://doi.org/10.1145/3297858.3304029.

  22. Wang Q G, Zheng L, Huang Y, Yao P C, Gui C Y, Liao X F, Jin H, Jiang W B, Mao F B. GraSU: A fast graph update library for FPGA-based dynamic graph processing. In Proc. the 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2021, pp.149–159. DOI: https://doi.org/10.1145/3431920.3439288.

  23. Zhang Y, Liao X F, Jin H, Gu L, Zhou B B. FBSGraph: Accelerating asynchronous graph processing via forward and backward sweeping. IEEE Trans. Knowledge and Data Engineering, 2018, 30(5): 895–907. DOI: https://doi.org/10.1109/TKDE.2017.2781241.

    Article  Google Scholar 

  24. Liu C Q, Liu H F, Zheng L, Huang Y, Ye X Y, Liao X F, Jin H. FNNG: A high-performance FPGA-based accelerator for K-nearest neighbor graph construction. In Proc. the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Feb. 2023, pp.67–77. DOI: https://doi.org/10.1145/3543622.3573189.

  25. Wang Q G, Zheng L, Hu A, Huang Y, Yao P C, Gui C Y, Liao X F, Jin H, Xue J L. A data-centric accelerator for high-performance hypergraph processing. In Proc. the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2022, pp.1326–1341. DOI: https://doi.org/10.1109/MICRO56248.2022.00088.

  26. Chen D, He H H, Jin H, Zheng L, Huang Y, Shen X Y, Liao X F. MetaNMP: Leveraging Cartesian-like product to accelerate HGNNs with near-memory processing. In Proc. the 50th Annual International Symposium on Computer Architecture, Jun. 2023, Article No. 56. DOI: https://doi.org/10.1145/3579371.3589091.

  27. Zheng L, Zhao J S, Huang Y, Wang Q G, Zeng Z, Xue J L, Liao X F, Jin H. Spara: An energy-efficient ReRAM-based accelerator for sparse graph analytics applications. In Proc. the 2020 IEEE International Parallel and Distributed Processing Symposium, May 2020, pp.696–707. DOI: https://doi.org/10.1109/IPDPS47924.2020.00077.

  28. Huang Y, Zheng L, Yao P C, Wang Q G, Liao X F, Jin H, Xue J L. Accelerating graph convolutional networks using crossbar-based processing-in-memory architectures. In Proc. the 2022 IEEE International Symposium on High-Performance Computer Architecture, Apr. 2022, pp.1029–1042. DOI: https://doi.org/10.1109/HPCA53966.2022.00079.

  29. Huang Y, Zheng L, Yao P C, Zhao J S, Liao X F, Jin H, Xue J L. A heterogeneous PIM hardware-software co-design for energy-efficient graph processing. In Proc. the 2020 IEEE International Parallel and Distributed Processing Symposium, May 2020, pp.684–695. DOI: https://doi.org/10.1109/IPDPS47924.2020.00076.

  30. Ham T J, Wu L S, Sundaram N, Satish N, Martonosi M. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proc. the 49th Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2016. DOI: https://doi.org/10.1109/MICRO.2016.7783759.

  31. Dai G H, Huang T H, Chi Y Z, Xu N Y, Wang Y, Yang H Z. ForeGraph: Exploring large-scale graph processing on multi-FPGA architecture. In Proc. the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2017, pp.217–226. DOI: https://doi.org/10.1145/3020078.3021739.

  32. Chen X Y, Chen Y, Cheng F, Tan H S, He B S, Wong W F. ReGraph: Scaling graph processing on HBM-enabled FPGAs with heterogeneous pipelines. In Proc. the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2022, pp.1342–1358. DOI: https://doi.org/10.1109/MICRO56248.2022.00092.

  33. Yan M Y, Hu X, Li S C, Basak A, Li H, Ma X, Akgun I, Feng Y J, Gu P, Deng L, Ye X C, Zhang Z M, Fan D R, Xie Y. Alleviating irregularity in graph analytics acceleration: A hardware/software co-design approach. In Proc. the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Oct. 2019, pp.615–628. DOI: https://doi.org/10.1145/3352460.3358318.

  34. Zhang Y F, Gao Q X, Gao L X, Wang C R. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Trans. Parallel and Distributed Systems, 2014, 25(8): 2091–2100. DOI: https://doi.org/10.1109/TPDS.2013.235.

    Article  Google Scholar 

  35. Gonzalez J E, Low Y, Gu H J, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2012, pp.17–30. https://www.usenix.org/system/files/conference/osdi12/osdi12-final-167.pdf, Oct. 2023.

  36. Vora K, Gupta R, Xu G Q. KickStarter: Fast and accurate computations on streaming graphs via trimmed approximations. In Proc. the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 2017, pp.237–251. DOI: https://doi.org/10.1145/3037697.3037748.

  37. Mariappan M, Vora K. GraphBolt: Dependency-driven synchronous processing of streaming graphs. In Proc. the 14th EuroSys Conference, Mar. 2019, Article No. 25. DOI: https://doi.org/10.1145/3302424.3303974.

  38. Wang Y Z H, Davidson A, Pan Y C, Wu Y D, Riffel A, Owens J D. Gunrock: A high-performance graph processing library on the GPU. In Proc. the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb. 2016, Article No. 11. DOI: https://doi.org/10.1145/2851141.2851145.

  39. Ben-Nun T, Sutton M, Pai S, Pingali K. Groute: An asynchronous multi-GPU programming model for irregular computations. ACM SIGPLAN Notices, 2017, 52(8): 235–248. DOI: https://doi.org/10.1145/3155284.3018756.

    Article  Google Scholar 

  40. Dong W, Moses C, Li K. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proc. the 20th International Conference on World Wide Web, Mar. 2011, pp.577–586. DOI: https://doi.org/10.1145/1963405.1963487.

  41. Wang Q G, Zheng L, Yuan J R, Huang Y, Yao P C, Gui C Y, Hu A, Liao X F, Jin H. Hardware-accelerated hypergraph processing with chain-driven scheduling. In Proc. the 2022 IEEE International Symposium on High-Performance Computer Architecture, Apr. 2022, pp.184–198. DOI: https://doi.org/10.1109/HPCA53966.2022.00022.

  42. Hu M, Strachan J P, Li Z Y, Grafals E M, Davila N, Graves C, Lam S, Ge N, Yang J J, Williams R S. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication. In Proc. the 53rd Annual Design Automation Conference, Jun. 2016, Article No. 19. DOI: https://doi.org/10.1145/2897937.2898010.

  43. Chi P, Li S C, Xu C, Zhang T, Zhao J S, Liu Y P, Wang Y, Xie Y. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In Proc. the 43rd Annual International Symposium on Computer Architecture, Jun. 2016, pp.27–39. DOI: https://doi.org/10.1109/ISCA.2016.13.

  44. Song L H, Zhuo Y W, Qian X H, Li H, Chen Y R. GraphR: Accelerating graph processing using ReRAM. In Proc. the 2018 IEEE International Symposium on High Performance Computer Architecture, Feb. 2018, pp.531–543. DOI: https://doi.org/10.1109/HPCA.2018.00052.

  45. Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. arXiv: 1609.02907, 2016. https://arxiv.org/abs/1609.02907, Mar. 2024.

  46. Jin T S, Dai H Q, Cao L J, Zhang B C, Huang F Y, Gao Y, Ji R R. Deepwalk-aware graph convolutional networks. Science China Information Sciences, 2022, 65(5): 152104. DOI: https://doi.org/10.1007/s11432-020-3318-5.

    Article  MathSciNet  Google Scholar 

  47. Bai J Y, Guo J, Wang C C, Chen Z Y, He Z, Yang S, Yu P P, Zhang Y, Guo Y W. Deep graph learning for spatially-varying indoor lighting prediction. Science China Information Sciences, 2023, 66 (3): Article No. 132106. DOI: https://doi.org/10.1007/s11432-022-3576-9.

  48. Fey M, Lenssen J E. Fast graph representation learning with PyTorch geometric. arXiv: 1903.02428, 2019. https://arxiv.org/abs/1903.02428, Mar. 2024.

Download references

Acknowledgments

We thank the following students for their contributions, where Ke-Xin Li and Bing Zhu participated in the work on the graph accelerators design, Dan Chen, Zhao-Zeng An, Yu-Jian Liao, Qian-Ge Shen, Dong-Hao He, Shi-Jun Li, Xin Ning, and Zhi-Ying Huang participated in the work on the graph software environment, and Dan Chen, Chao-Qiang Liu, Hai-Feng Liu, Hai-Heng He, and Zhao-Zeng An participated in the work on the emerging hardware/software co-design.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hai Jin  (金 海).

Ethics declarations

Conflict of Interest The authors declare that they have no conflict of interest.

Additional information

This work was supported by the National Key Research and Development Program of China under Grant No. 2023YFB-4502300.

Xiao-Fei Liao is a professor in the School of Computer Science and Technology at Huazhong University of Science and Technology (HUST), Wuhan. He received his Ph.D. degree in computer science and engineering from HUST, Wuhan, in 2005. He was awarded Excellent Youth Award from the National Science Foundation of China in 2018, and CCF-IEEE CS Young Computer Scientist Award in 2017. His research interests are in the areas of computer architecture, system software, and big data processing.

Wen-Ju Zhao is a Ph.D. candidate in the School of Computer Science and Technology at Huazhong University of Science and Technology (HUST), Wuhan. His research interests include computer architecture and graph neural networks.

Hai Jin is a chair professor of computer science and engineering at Huazhong University of Science and Technology (HUST), Wuhan. Jin received his Ph.D. degree in computer engineering from HUST, Wuhan, in 1994. In 1996, he was awarded a German Academic Exchange Service Fellowship to visit the Technical University of Chemnitz, Straße der Nationen. Jin worked at The University of Hong Kong, Hong Kong, between 1998 and 2000, and as a visiting scholar at the University of Southern California, Los Angeles, between 1999 and 2000. He was awarded Excellent Youth Award from the National Natural Science Foundation of China in 2001. He has co-authored 22 books and published over 900 research papers. His research interests include computer architecture, virtualization technology, distributed computing, big data processing, network storage, and network security.

Peng-Cheng Yao received his Ph.D. degree in computer science and technology from the Huazhong University of Science and Technology, Wuhan, in 2022. He is now a postdoctoral fellow at Zhejiang Lab, Hangzhou. His research interests include graph processing and domain specific accelerator.

Yu Huang received his Ph.D. degree in computer science and technology from the Huazhong University of Science and Technology (HUST), Wuhan, in 2022. He is now a postdoctoral fellow at Zhejiang Lab, Hangzhou. His research focuses on computer architecture, graph processing, and processing in memory.

Qing-Gang Wang received his Ph.D. degree in computer science and technology from the Huazhong University of Science and Technology (HUST), Wuhan, in 2023. He is now a postdoctoral fellow at Zhejiang Lab, Hangzhou. His current research interests include graph processing and reconfigurable computing.

Jin Zhao received his Ph.D. degree in computer science from the Huazhong University of Science and Technology (HUST), Wuhan, in 2022. He is now a postdoctoral fellow at Zhejiang Lab, Hangzhou. His research focuses on computer architecture, system software, runtime optimization, programming model, and big data processing.

Long Zheng received his Ph.D. degree in computer science and technology from the Huazhong University of Science and Technology (HUST), Wuhan, in 2016. He is currently an associate professor with the School of Computer Science and Technology, HUST, Wuhan. His current research interests include program analysis, runtime systems, and heterogeneous computing with a particular focus on graph processing.

Yu Zhang received his Ph.D. degree in computer science from the Huazhong University of Science and Technology (HUST), Wuhan, in 2016. His research focuses on computer architecture and system, runtime optimization, programming model, and big data processing.

Zhi-Yuan Shao received his Ph.D. degree in computer science and technology from Huazhong University of Science and Technology (HUST), Wuhan, in 2005. He is now a professor of computer science and engineering at HUST in Wuhan. His research interests are in the areas of graph computing, computing system, and big-data processing.

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liao, XF., Zhao, WJ., Jin, H. et al. Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective. J. Comput. Sci. Technol. 39, 245–266 (2024). https://doi.org/10.1007/s11390-024-4150-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-024-4150-0

Keywords

Navigation