Abstract
Graph processing has evolved and expanded swiftly alongside artificial intelligence and big data technologies. High-Bandwidth Memory (HBM), which delivers terabyte-level memory bandwidth, opens new opportunities for FPGA-based graph accelerators. However, despite this tremendous growth in raw hardware capability, existing graph accelerators have benefited little from it. In this paper, we observe that the unified on-chip memory hierarchy is the key cause of the poor scalability of existing graph accelerators. We present GraphS, a novel graph accelerator with a distributed on-chip memory hierarchy. The on-chip memory of GraphS is divided into many small blocks, each assigned exclusively to a single Processing Element (PE), and the PEs are connected through a scalable network-on-chip (NoC). For real-world graphs with power-law degree distributions, a degree-aware preprocessing method balances the workload across PEs. Our results with various graph algorithms demonstrate that GraphS outperforms the state-of-the-art ForeGraph by up to 21.84×.
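To make the degree-aware preprocessing idea concrete, the following is a minimal, hypothetical sketch rather than the paper's actual algorithm: the function name, the greedy least-loaded strategy, and the use of out-degree as the load metric are all our own assumptions. The sketch sorts vertices by degree and assigns each one to the PE with the smallest accumulated edge load, so the heavy-tailed vertices of a power-law graph do not pile up on a few PEs.

# Hypothetical degree-aware vertex-to-PE assignment (illustrative only,
# not the GraphS preprocessing algorithm).
import heapq
from collections import defaultdict

def degree_aware_partition(edges, num_pes):
    """edges: iterable of (src, dst) pairs; returns {vertex: pe_id}."""
    # Count out-degree per source vertex; this serves as the load estimate.
    degree = defaultdict(int)
    for src, _dst in edges:
        degree[src] += 1

    # Min-heap of (accumulated edge load, pe_id), initially all empty.
    loads = [(0, pe) for pe in range(num_pes)]
    heapq.heapify(loads)

    assignment = {}
    # Place high-degree vertices first so their load can still be balanced.
    for v, deg in sorted(degree.items(), key=lambda kv: -kv[1]):
        load, pe = heapq.heappop(loads)
        assignment[v] = pe
        heapq.heappush(loads, (load + deg, pe))
    return assignment

if __name__ == "__main__":
    toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 0)]
    print(degree_aware_partition(toy_edges, num_pes=2))

With the toy edge list above, the hub vertex 0 (degree 3) is placed on one PE while the remaining vertices spread over the other, keeping per-PE edge counts roughly equal; the same greedy idea generalizes to any number of PEs.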
References
Katz, G., Kider, J.: All-pairs shortest-paths for large graphs on the GPU. In: Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pp. 47–55. ACM (2008)
Jamshidi, K., Mahadasa, R., Vora, K.: Peregrine: a pattern-aware graph mining system. In: Proceedings of the 15th European Conference on Computer Systems, pp. 1–16. ACM (2020)
Paden, B., Cap, M., Yong, S.Z., Yershov, D., Frazzoli, E.: A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Trans. Intell. Veh. 1(1), 33–55 (2016)
Chong, Y., Ding, Y., Yan, Q., Pan, S.: Graph-based semi-supervised learning: a review. Neurocomputing 408(1), 216–230 (2020)
Zhang, Y., Liao, X., Jin, H., He, B., Liu, H., Gu, L.: Digraph: an efficient path-based iterative directed graph processing system on multiple GPUs. In: Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 601–614. ACM (2019)
Dhulipala, L., Blelloch, G.E., Shun, J.: Theoretically efficient parallel graph algorithms can be fast and scalable. ACM Trans. Parallel Comput. 8(1), 1–70 (2021)
Shun, J.: Practical parallel hypergraph algorithms. In: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 232–249. ACM (2020)
Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, pp. 17–30. USENIX Association (2012)
Sabet, A.H.N., Zhao, Z., Gupta, R.: Subway: minimizing data transfer during out-of-GPU-memory graph processing. In: Proceedings of the 15th European Conference on Computer Systems, pp. 1–16. ACM (2020)
Dai, G., Huang, T., Chi, Y., Xu, N., Wang, Y., Yang, H.: Foregraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, pp. 217–226. ACM (2017)
Ma, X., Zhang, D., Chiou, D.: FPGA-accelerated transactional execution of graph workloads. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, pp. 227–236. ACM (2017)
Ham, T.J., Wu, L., Sundaram, N., Satish, N., Martonosi, M.: Graphicionado: a high-performance and energy-efficient accelerator for graph analytics. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1–13. IEEE (2016)
Rahman, S., Abu-Ghazaleh, N., Gupta, R.: Graphpulse: an event-driven hardware accelerator for asynchronous graph processing. In: Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 908–921. IEEE (2020)
Yao, P., Zheng, L., Liao, X., Jin, H., He, B.: An efficient graph accelerator with parallel data conflict management. In: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1–12. ACM (2018)
Yao, P., et al.: Scalagraph: a scalable accelerator for massively parallel graph processing. In: Proceedings of the 28th IEEE International Symposium on High-Performance Computer Architecture, pp. 1–13. IEEE (2022)
Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146. ACM (2010)
Abeydeera, M., Sanchez, D.: Chronos: efficient speculative parallelism for accelerators. In: Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1247–1262. ACM (2020)
Cybenko, G.: Dynamic load balancing for distributed memory multiprocessors. J. Parallel Distrib. Comput. 7(2), 279–301 (1989)
Li, S., Yang, Z., Reddy, D., Srivastava, A., Jacob, B.: DRAMsim3: a cycle-accurate, thermal-capable DRAM simulator. IEEE Comput. Archit. Lett. 19(2), 106–109 (2020)
Leskovec, J., Krevl, A.: SNAP datasets: Stanford large network dataset collection (2014). http://snap.stanford.edu/data
Acknowledgment
This work is supported by the National Natural Science Foundation of China (NSFC) under Grants No. 62072195, 61825202, 61832006, and 61929103.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Zheng, R. et al. (2023). An Efficient Graph Accelerator with Distributed On-Chip Memory Hierarchy. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_42
DOI: https://doi.org/10.1007/978-3-031-22677-9_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22676-2
Online ISBN: 978-3-031-22677-9
eBook Packages: Computer Science, Computer Science (R0)