Abstract
Data partitioning on heterogeneous HPC platforms is formulated as an optimization problem. The algorithm starts from the communication performance models of the processes, which represent their speeds, and outputs a data tiling that minimizes the communication cost. Traditionally, communication volume has been the metric used to guide the partitioning. However, this metric cannot capture the complexities introduced by uneven communication channels or by the variety of communication patterns in the kernels. We discuss Analytical Communication Performance Models as a new metric for partitioning algorithms. They have not been considered in the past for two reasons: their prediction inaccuracy, and the lack of tools to automatically build and solve formal expressions of kernel communications. We show how communication performance models fit the specific kernel and platform, and we present results that match or even improve on previous volume-based strategies.
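The claim that volume cannot see uneven channels can be illustrated with a small Python sketch. It is built on assumptions that do not come from the paper, and it is not the authors' method. The setup is as follows:

- A, B and C share one rectangular tiling of an N×N matrix.
- Each process computes its C block and therefore needs the matching rows of A and columns of B.
- Messages are costed with a simple Hockney-style latency-bandwidth model (α + β·m).
- Each process receives its messages one after another, and the slowest process sets the total cost.

The three-process tiling, the node mappings, the α/β values and the helper names (`received`, `volume`, `model_cost`) are all hypothetical. The models studied in the paper are more detailed than this.

```python
from fractions import Fraction as F

# Hypothetical Hockney parameters: (latency in s, inverse bandwidth in s/byte).
INTRA = (1e-6, 1 / 10e9)   # assumed: processes on the same node
INTER = (5e-6, 1 / 1e9)    # assumed: processes on different nodes

def overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1) and [b0, b1)."""
    return max(F(0), min(a1, b1) - max(a0, b0))

def received(tiles, i, j):
    """Fraction of N*N elements process i receives from process j.

    A tile is (row_start, row_end, col_start, col_end) in the unit square.
    Assumption: C_i needs A[rows_i, :] and B[:, cols_i], with A, B, C
    distributed with the same tiling.
    """
    r0, r1, c0, c1 = tiles[i]
    s0, s1, d0, d1 = tiles[j]
    a_part = overlap(r0, r1, s0, s1) * (d1 - d0)  # rows of A owned by j
    b_part = (s1 - s0) * overlap(c0, c1, d0, d1)  # columns of B owned by j
    return a_part + b_part

def volume(tiles, n):
    """Total communication volume in elements (the traditional metric)."""
    p = len(tiles)
    frac = sum(received(tiles, i, j) for i in range(p) for j in range(p) if i != j)
    return frac * n * n

def model_cost(tiles, node_of, n, elem_bytes=8):
    """Predicted time: processes run concurrently, each receiving its
    messages sequentially, so the slowest process determines the cost."""
    p = len(tiles)
    worst = 0.0
    for i in range(p):
        t_i = 0.0
        for j in range(p):
            if i == j:
                continue
            m = float(received(tiles, i, j) * n * n) * elem_bytes
            if m == 0:
                continue  # no message between i and j
            alpha, beta = INTRA if node_of[i] == node_of[j] else INTER
            t_i += alpha + beta * m
        worst = max(worst, t_i)
    return worst

# Hypothetical tiling: process 0 is twice as fast as... (areas 0.6, 0.2, 0.2).
TILES = [
    (F(0), F(1), F(0), F(3, 5)),       # process 0: left strip, 60% of the area
    (F(0), F(1, 2), F(3, 5), F(1)),    # process 1: top-right block
    (F(1, 2), F(1), F(3, 5), F(1)),    # process 2: bottom-right block
]
N = 4000
m1 = [0, 0, 1]  # processes 0 and 1 share a node; process 2 is alone
m2 = [1, 0, 0]  # processes 1 and 2 share a node; process 0 is alone

print(volume(TILES, N))  # identical for both mappings
print(model_cost(TILES, m1, N), model_cost(TILES, m2, N))
```

Both mappings move the same number of elements. Under these assumed parameters, however, the model predicts roughly 0.064 s for `m1` and 0.051 s for `m2`, because `m1` forces process 2 to receive all of its data over the slower inter-node link. Under the owner-computes assumption above, the volume equals (sum of tile half-perimeters − 2)·N². This is why half-perimeter minimization is a natural volume-based objective, and also why it is blind to which channel each byte crosses.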
Acknowledgements
This work was supported by the European Regional Development Fund ‘A way to achieve Europe’ (ERDF) and the Extremadura Local Government (Ref. IB16118). It was also partially supported by the computing facilities of Extremadura Research Center for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF).
About this article
Cite this article
Rico-Gallego, J.A., Díaz-Martín, J.C., Calvo-Jurado, C. et al. Analytical Communication Performance Models as a metric in the partitioning of data-parallel kernels on heterogeneous platforms. J Supercomput 75, 1654–1669 (2019). https://doi.org/10.1007/s11227-018-2724-8
DOI: https://doi.org/10.1007/s11227-018-2724-8