Abstract
Graph processing has evolved and expanded swiftly alongside artificial intelligence and big data technologies. High-Bandwidth Memory (HBM), which delivers terabyte-level memory bandwidth, opens new opportunities for FPGA-based graph accelerators. However, despite this tremendous growth in raw hardware capability, existing graph accelerators have benefited little from it. In this paper, we observe that the unified on-chip memory hierarchy is the key cause of the poor scalability of existing graph accelerators. We present GraphS, a novel graph accelerator with a distributed on-chip memory hierarchy. The on-chip memory of GraphS is divided into many small blocks, each assigned exclusively to a single Processing Element (PE), and the PEs are connected through a scalable network-on-chip (NoC). For real-world graphs with power-law degree distributions, a degree-aware preprocessing method balances the workload across PEs. Our results with various graph algorithms demonstrate that GraphS outperforms the state-of-the-art ForeGraph by up to 21.84×.
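To make the degree-aware preprocessing idea concrete, the following is a minimal, hypothetical sketch rather than the paper's actual algorithm: the function name, the greedy least-loaded strategy, and the use of out-degree as the load metric are all our own assumptions. The sketch sorts vertices by degree and assigns each one to the PE with the smallest accumulated edge load, so the heavy-tailed vertices of a power-law graph do not pile up on a few PEs.

# Hypothetical degree-aware vertex-to-PE assignment (illustrative only,
# not the GraphS preprocessing algorithm).
import heapq
from collections import defaultdict

def degree_aware_partition(edges, num_pes):
    """edges: iterable of (src, dst) pairs; returns {vertex: pe_id}."""
    # Count out-degree per source vertex; this serves as the load estimate.
    degree = defaultdict(int)
    for src, _dst in edges:
        degree[src] += 1

    # Min-heap of (accumulated edge load, pe_id), initially all empty.
    loads = [(0, pe) for pe in range(num_pes)]
    heapq.heapify(loads)

    assignment = {}
    # Place high-degree vertices first so their load can still be balanced.
    for v, deg in sorted(degree.items(), key=lambda kv: -kv[1]):
        load, pe = heapq.heappop(loads)
        assignment[v] = pe
        heapq.heappush(loads, (load + deg, pe))
    return assignment

if __name__ == "__main__":
    toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 0)]
    print(degree_aware_partition(toy_edges, num_pes=2))

With the toy edge list above, the hub vertex 0 (degree 3) is placed on one PE while the remaining vertices spread over the other, keeping per-PE edge counts roughly equal; the same greedy idea generalizes to any number of PEs.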
References
Katz, G., Kider, J.: All-pairs shortest-paths for large graphs on the GPU. In: Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pp. 47–55. ACM (2008)
Jamshidi, K., Mahadasa, R., Vora, K.: Peregrine: a pattern-aware graph mining system. In: Proceedings of the 15th European Conference on Computer Systems, pp. 1–16. ACM (2020)
Paden, B., Cap, M., Yong, S.Z., Yershov, D., Frazzoli, E.: A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Trans. Intell. Veh. 1(1), 33–55 (2016)
Chong, Y., Ding, Y., Yan, Q., Pan, S.: Graph-based semi-supervised learning: a review. Neurocomputing 408(1), 216–230 (2020)
Zhang, Y., Liao, X., Jin, H., He, B., Liu, H., Gu, L.: Digraph: an efficient path-based iterative directed graph processing system on multiple GPUs. In: Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 601–614. ACM (2019)
Dhulipala, L., Blelloch, G.E., Shun, J.: Theoretically efficient parallel graph algorithms can be fast and scalable. ACM Trans. Parallel Comput. 8(1), 1–70 (2021)
Shun, J.: Practical parallel hypergraph algorithms. In: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 232–249. ACM (2020)
Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, pp. 17–30. USENIX Association (2012)
Sabet, A.H.N., Zhao, Z., Gupta, R.: Subway: minimizing data transfer during out-of-GPU-memory graph processing. In: Proceedings of the 15th European Conference on Computer Systems, pp. 1–16. ACM (2020)
Dai, G., Huang, T., Chi, Y., Xu, N., Wang, Y., Yang, H.: Foregraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, pp. 217–226. ACM (2017)
Ma, X., Zhang, D., Chiou, D.: FPGA-accelerated transactional execution of graph workloads. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, pp. 227–236. ACM (2017)
Ham, T.J., Wu, L., Sundaram, N., Satish, N., Martonosi, M.: Graphicionado: a high-performance and energy-efficient accelerator for graph analytics. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1–13. IEEE (2016)
Rahman, S., Abu-Ghazaleh, N., Gupta, R.: Graphpulse: an event-driven hardware accelerator for asynchronous graph processing. In: Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 908–921. IEEE (2020)
Yao, P., Zheng, L., Liao, X., Jin, H., He, B.: An efficient graph accelerator with parallel data conflict management. In: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1–12. ACM (2018)
Yao, P., et al.: Scalagraph: a scalable accelerator for massively parallel graph processing. In: Proceedings of the 28th IEEE International Symposium on High-Performance Computer Architecture, pp. 1–13. IEEE (2022)
Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146. ACM (2010)
Abeydeera, M., Sanchez, D.: Chronos: efficient speculative parallelism for accelerators. In: Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1247–1262. ACM (2020)
Cybenko, G.: Dynamic load balancing for distributed memory multiprocessors. J. Parallel Distrib. Comput. 7(2), 279–301 (1989)
Li, S., Yang, Z., Reddy, D., Srivastava, A., Jacob, B.: DRAMsim3: a cycle-accurate, thermal-capable DRAM simulator. IEEE Comput. Archit. Lett. 19(2), 106–109 (2020)
Leskovec, J., Krevl, A.: SNAP datasets: Stanford large network dataset collection (2014). http://snap.stanford.edu/data
Acknowledgment
This work is supported by the National Natural Science Foundation of China (NSFC) under Grants No. 62072195, 61825202, 61832006, and 61929103.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Zheng, R. et al. (2023). An Efficient Graph Accelerator with Distributed On-Chip Memory Hierarchy. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_42
DOI: https://doi.org/10.1007/978-3-031-22677-9_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22676-2
Online ISBN: 978-3-031-22677-9
eBook Packages: Computer Science, Computer Science (R0)