
An Efficient Graph Accelerator with Distributed On-Chip Memory Hierarchy

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13777)


Abstract

Graph processing has evolved and expanded rapidly alongside artificial intelligence and big data technologies. High-Bandwidth Memory (HBM), which delivers terabyte-per-second-level memory bandwidth, has opened up new development possibilities for FPGA-based graph accelerators. However, despite this tremendous expansion of the underlying hardware capability, existing graph accelerators have not benefited much from it. In this paper, we observe that the unified on-chip memory hierarchy is the key cause of the poor scalability of existing graph accelerators. We present GraphS, a novel graph accelerator with a distributed on-chip memory hierarchy. The on-chip memory of GraphS is divided into many small blocks, each of which is assigned to exactly one Processing Element (PE). PEs are connected through a scalable network-on-chip (NoC). For real-world graphs with power-law degree distributions, a degree-aware preprocessing method is designed to balance the workload across PEs. Our results with various graph algorithms demonstrate that GraphS outperforms the state-of-the-art ForeGraph by up to 21.84×.
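The abstract names the degree-aware preprocessing step but does not detail it. As a minimal, hypothetical sketch (not the paper's actual algorithm), the Python snippet below illustrates one common way degree-aware load balancing can work on power-law graphs: vertices are assigned to PE-local memory blocks greedily, heaviest-degree first, so that the total edge work per PE stays roughly even. The function name degree_aware_partition, the greedy strategy, and the parameter num_pes are assumptions introduced only for illustration.

    # Illustrative sketch only: a generic degree-aware partitioner that greedily
    # assigns each vertex (heaviest degree first) to the PE whose local memory
    # block currently has the smallest accumulated edge load. This is NOT the
    # preprocessing algorithm from the paper; names such as
    # degree_aware_partition and num_pes are hypothetical.
    import heapq
    from collections import defaultdict

    def degree_aware_partition(edges, num_pes):
        """Map each vertex to a PE so total degree (edge work) per PE is balanced."""
        degree = defaultdict(int)
        for src, dst in edges:
            degree[src] += 1
            degree[dst] += 1

        # Min-heap of (accumulated load, pe_id); pop the least-loaded PE each time.
        heap = [(0, pe) for pe in range(num_pes)]
        heapq.heapify(heap)

        placement = {}
        # Place high-degree (power-law "hub") vertices first for a tighter balance.
        for v in sorted(degree, key=degree.get, reverse=True):
            load, pe = heapq.heappop(heap)
            placement[v] = pe
            heapq.heappush(heap, (load + degree[v], pe))
        return placement

    if __name__ == "__main__":
        # Tiny example: a star graph (vertex 0 is a hub) spread over 4 PEs.
        edges = [(0, i) for i in range(1, 9)] + [(1, 2), (3, 4)]
        print(degree_aware_partition(edges, num_pes=4))

Under this sketch, the hub vertex lands on one PE while the remaining low-degree vertices fill the other PEs, which is the general effect a degree-aware scheme aims for; the paper's concrete method may differ.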



References

1. Katz, G., Kider, J.: All-pairs shortest-paths for large graphs on the GPU. In: Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pp. 47–55. ACM (2008)

2. Jamshidi, K., Mahadasa, R., Vora, K.: Peregrine: a pattern-aware graph mining system. In: Proceedings of the 15th European Conference on Computer Systems, pp. 1–16. ACM (2020)

3. Paden, B., Cap, M., Yong, S.Z., Yershov, D., Frazzoli, E.: A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Trans. Intell. Veh. 1(1), 33–55 (2016)

4. Chong, Y., Ding, Y., Yan, Q., Pan, S.: Graph-based semi-supervised learning: a review. Neurocomputing 408(1), 216–230 (2020)

5. Zhang, Y., Liao, X., Jin, H., He, B., Liu, H., Gu, L.: DiGraph: an efficient path-based iterative directed graph processing system on multiple GPUs. In: Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 601–614. ACM (2019)

6. Dhulipala, L., Blelloch, G.E., Shun, J.: Theoretically efficient parallel graph algorithms can be fast and scalable. ACM Trans. Parallel Comput. 8(1), 1–70 (2021)

7. Shun, J.: Practical parallel hypergraph algorithms. In: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 232–249. ACM (2020)

8. Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, pp. 17–30. USENIX Association (2012)

9. Sabet, A.H.N., Zhao, Z., Gupta, R.: Subway: minimizing data transfer during out-of-GPU-memory graph processing. In: Proceedings of the 15th European Conference on Computer Systems, pp. 1–16. ACM (2020)

10. Dai, G., Huang, T., Chi, Y., Xu, N., Wang, Y., Yang, H.: ForeGraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 217–226. ACM (2017)

11. Ma, X., Zhang, D., Chiou, D.: FPGA-accelerated transactional execution of graph workloads. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 227–236. ACM (2017)

12. Ham, T.J., Wu, L., Sundaram, N., Satish, N., Martonosi, M.: Graphicionado: a high-performance and energy-efficient accelerator for graph analytics. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1–13. IEEE (2016)

13. Rahman, S., Abu-Ghazaleh, N., Gupta, R.: GraphPulse: an event-driven hardware accelerator for asynchronous graph processing. In: Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 908–921. IEEE (2020)

14. Yao, P., Zheng, L., Liao, X., Jin, H., He, B.: An efficient graph accelerator with parallel data conflict management. In: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1–12. ACM (2018)

15. Yao, P., et al.: ScalaGraph: a scalable accelerator for massively parallel graph processing. In: Proceedings of the 28th IEEE International Symposium on High-Performance Computer Architecture, pp. 1–13. IEEE (2022)

16. Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146. ACM (2010)

17. Abeydeera, M., Sanchez, D.: Chronos: efficient speculative parallelism for accelerators. In: Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1247–1262. ACM (2020)

18. Cybenko, G.: Dynamic load balancing for distributed memory multiprocessors. J. Parallel Distrib. Comput. 7(2), 279–301 (1989)

19. Li, S., Yang, Z., Reddy, D., Srivastava, A., Jacob, B.: DRAMsim3: a cycle-accurate, thermal-capable DRAM simulator. IEEE Comput. Archit. Lett. 19(2), 106–109 (2020)

20. Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection (2014). http://snap.stanford.edu/data


Acknowledgment

This work is supported by the NSFC (No. 62072195, 61825202, 61832006, and 61929103).

Author information


Corresponding author

Correspondence to Long Zheng.



Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Zheng, R. et al. (2023). An Efficient Graph Accelerator with Distributed On-Chip Memory Hierarchy. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_42


  • DOI: https://doi.org/10.1007/978-3-031-22677-9_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22676-2

  • Online ISBN: 978-3-031-22677-9

  • eBook Packages: Computer Science, Computer Science (R0)
