Abstract
Loop tiling is a well-known loop transformation that enhances data locality in memory hierarchy. In this paper, we initially reveal two important inefficiencies of current analytical loop tiling models and we provide the theoretical background on how current analytical models can address these inefficiencies. To this end, we propose a new analytical model which is more accurate that the existing ones. We showcase, both theoretically and experimentally, that the proposed model can accurately estimate the number of cache misses for every generated tile size and as a result more efficient tile sizes are opted. Our evaluation results provide high cache misses gains and significant performance gains over gcc compiler and Pluto tool on an x86 platform.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Baskaran, M.M., Hartono, A., Tavarageri, S., Henretty, T., Ramanujam, J., Sadayappan, P.: Parameterized tiling revisited. CGO 2010 (2010)
Bondhugula, U., Bandishti, V., Pananilath, I.: Diamond tiling: tiling techniques to maximize parallelism for stencil computations. In: IEEE TPDS, pp. 1285–1298 (2017)
Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. In: SIGPLAN, vol. 43, no. 6 (2008)
Chatterjee, S., Parker, E., Hanlon, P.J., Lebeck, A.R.: Exact analysis of the cache behavior of nested loops. SIGPLAN Not. 36(5), 286–297 (2001)
Cohen, A., Zhao, J.: Flextended tiles: a flexible extension of overlapped tiles for polyhedral compilation. ACM TACO (2020)
Hammami, E., Slama, Y.: An overview on loop tiling techniques for code generation. In: 2017 IEEE/ACS AICCSA, pp. 280–287 (2017)
Hartono, A., et al.: Parametric multi-level tiling of imperfectly nested loops. In: ICS 2009, NY, USA, p. 147–157. New York (2009)
Hsu, C.H., Kremer, U.: A quantitative analysis of tile size selection algorithms. J. Supercomput. 27(3), 279–294 (2004)
Kelefouras, V., Djemame, K.: A methodology correlating code optimizations with data memory accesses, execution time and energy consumption. J. Supercomput. 75(10), 6710–6745 (2019). https://doi.org/10.1007/s11227-019-02880-z
Kelefouras, V.I., Athanasiou, G.S., Alachiotis, N., Michail, H.E., Kritikakou, A.S., Goutis, C.E.: A methodology for speeding up fast Fourier transform focusing on memory architecture utilization. IEEE Trans. Sig. Process. 59, 6217–6226 (2011)
Kelefouras, V., Georgios, K., Nikolaos, V.: Combining software cache partitioning and loop tiling for effective shared cache management. ACM Trans. Embed. Comput. Syst. 17(3), 72:1-72:25 (2018)
Kelefouras, V., Kritikakou, A., Mporas, I., Kolonias, V.: A high-performance matrix–matrix multiplication methodology for CPU and GPU architectures. J. Supercomput. 72(3), 804–844 (2016)
Kelefouras, V., Kritikakou, A., Papadima, E., Goutis, C.: A methodology for speeding up matrix vector multiplication for single/multi-core architectures. J. Supercomput. 71(7), 2644–2667 (2015). https://doi.org/10.1007/s11227-015-1409-9
Kelefouras, V., Kritikakou, A., Goutis, C.: A matrix–matrix multiplication methodology for single/multi-core architectures using SIMD. J. Supercomput. 68(3), 1418–1440 (2014). https://doi.org/10.1007/s11227-014-1098-9
Li, R., et al.: Analytical cache modeling and tilesize optimization for tensor contractions. In: SC 2019 (2019)
Mehta, S., Beeraka, G., Yew, P.C.: Tile size selection revisited. ACM Trans. Archit. Code Optim. 10(4), 1–27 (2013)
Nethercote, N., Walsh, R., Fitzhardinge, J.: Building workload characterization tools with valgrind. In: IISWC, p. 2. IEEE Computer Society (2006)
POUCHET, L.: Polybench/c. http://web.cse.ohio-state.edu/~pouchet.2/software/polybench/. Accessed 10 Oct 2020
Renganarayanan, L., Kim, D., Strout, M.M., Rajopadhye, S.: Parameterized loop tiling. ACM Trans. Program. Lang. Syst. 34(1), 1–41 (2012)
Sarkar, V., Megiddo, N.: An analytical model for loop tiling and its solution. In: IEEE ISPASS, pp. 146–153 (2000)
Sato, Y., Yuki, T., Endo, T.: An autotuning framework for scalable execution of tiled code via iterative polyhedral compilation. In: ACM TACO (2019)
Shirako, J., et al.: Analytical bounds for optimal tile size selection. In: CC 2012 (2012)
Stoltzfus, L., Hagedorn, B., Steuwer, M., Gorlatch, S., Dubach, C.: Tiling optimizations for stencil computations using rewrite rules in lift. ACM Trans. Archit. Code Optim. 16(4), 1–25 (2019)
Tavarageri, S., Pouchet, L.N., Ramanujam, J., Rountev, A., Sadayappan, P.: Dynamic selection of tile sizes. In: HIPC 2011 (2011)
Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated empirical optimization of software and the ATLAS project. Parallel Comput. 27(1–2), 3–35 (2001)
Zhou, X., Giacalone, J.P., Garzarán, M.J., Kuhn, R.H., Ni, Y., Padua, D.: Hierarchical overlapped tiling. In: CGO 2012 (2012)
Acknowledgements
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No 957210 - XANDAR: X-by-Construction Design framework for Engineering Autonomous & Distributed Real-time Embedded Software Systems.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Kelefouras, V., Djemame, K., Keramidas, G., Voros, N. (2022). An Analytical Model for Loop Tiling Transformation. In: Orailoglu, A., Jung, M., Reichenbach, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2021. Lecture Notes in Computer Science, vol 13227. Springer, Cham. https://doi.org/10.1007/978-3-031-04580-6_7
Download citation
DOI: https://doi.org/10.1007/978-3-031-04580-6_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04579-0
Online ISBN: 978-3-031-04580-6
eBook Packages: Computer ScienceComputer Science (R0)