Abstract
The growing need for inference on edge devices brings with it a necessity for efficient hardware, optimized for particular computational kernels, such as Sparse Matrix-Vector Multiplication (SpMV). With the RISC-V Instruction Set Architecture (ISA) providing unprecedented freedom to hardware designers, there is now a greater opportunity to tailor microarchitectures to both the application requirements and the data they are expected to process. In this paper, we demonstrate how the insights provided by the Cache-Aware Roofline Model (CARM) can be used in the hardware design process to optimize a RISC-V architecture for efficient and performant execution of SpMV. Specifically, we assess how architectural parameters associated with the processor’s cache and floating-point unit affect the performance of both the architecture and SpMV. Following a reparameterization closely guided by the CARM, we demonstrate a \(2.04\times\) improvement in performance and a significant decrease in underused computational resources.
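As an illustration of the kernel and the model named above, the following is a minimal sketch rather than the implementation evaluated in the paper. It assumes the matrix is stored in Compressed Sparse Row (CSR) format, which the abstract does not specify, and the function names, the byte sizes and the simplified per-nonzero traffic count are all hypothetical. The roofline bound follows the usual form: attainable performance is the minimum of the peak floating-point throughput and the product of arithmetic intensity and memory bandwidth, with bytes counted from the core's perspective as in the CARM.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]  # one multiply and one add
        y[row] = acc
    return y


def csr_arithmetic_intensity(nnz, n_rows, value_bytes=8, index_bytes=4):
    """Flops per byte as seen by the core (simplified, hypothetical count).

    Per nonzero: one matrix value, one column index, one element of x.
    Per row: one row pointer read and one element of y written.
    """
    flops = 2 * nnz
    bytes_moved = nnz * (2 * value_bytes + index_bytes) \
        + n_rows * (index_bytes + value_bytes)
    return flops / bytes_moved


def roofline_bound(ai, peak_flops, bandwidth):
    """Attainable performance: min(peak throughput, intensity * bandwidth)."""
    return min(peak_flops, ai * bandwidth)


# 3x3 example matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
ai = csr_arithmetic_intensity(nnz=len(values), n_rows=len(row_ptr) - 1)
```

With this traffic count the arithmetic intensity of SpMV stays well below one flop per byte, which places the kernel in the memory-bound region of the roofline for most realistic peak and bandwidth values. This is consistent with the abstract's focus on cache parameters and on underused floating-point resources.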
Acknowledgement
This project has received funding from the European High Performance Computing Joint Undertaking (JU) under Framework Partnership Agreement No 800928 and Specific Grant Agreement No 101036168 (EPI SGA2) and Grant agreement No 956213 (SparCity). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Croatia, France, Germany, Greece, Italy, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and Turkey. It also received funding from FCT (Fundação para a Ciência e a Tecnologia, Portugal), through the UIDB/50021/2020 project.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rodrigues, A., Sousa, L., Ilic, A. (2023). Performance Modelling-Driven Optimization of RISC-V Hardware for Efficient SpMV. In: Bienz, A., Weiland, M., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13999. Springer, Cham. https://doi.org/10.1007/978-3-031-40843-4_36
DOI: https://doi.org/10.1007/978-3-031-40843-4_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40842-7
Online ISBN: 978-3-031-40843-4
eBook Packages: Computer Science, Computer Science (R0)