Analytical Communication Performance Models as a metric in the partitioning of data-parallel kernels on heterogeneous platforms

The Journal of Supercomputing

Abstract

Data partitioning on heterogeneous HPC platforms is formulated as an optimization problem. The algorithm takes as input the communication performance models of the processes, which represent their speeds, and outputs a data tiling that minimizes the communication cost. Traditionally, communication volume has been the metric used to guide the partitioning, but this metric cannot capture the complexities introduced by uneven communication channels and by the variety of communication patterns in the kernel. We discuss Analytical Communication Performance Models as a new metric in partitioning algorithms. They have not been considered in the past for two reasons: prediction inaccuracy and the lack of tools to automatically build and solve the formal expressions of kernel communications. We show how communication performance models fit the specific kernel and platform, and we present results that match or even improve on those of previous volume-based strategies.
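To make the contrast between the two metrics concrete, the following toy sketch compares two candidate tilings under a volume metric and under a simple analytical time estimate. Everything in it is an assumption made for illustration: the four-process platform, the latency and bandwidth figures, the transfer lists, and the linear latency-plus-bandwidth cost with serialized transfers. It is not the model or the algorithm used in the article, only a minimal example of how uneven channels can make the two metrics disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One point-to-point message implied by a tiling (hypothetical)."""
    src: int
    dst: int
    nbytes: int


# Assumed platform: processes 0 and 1 share node 0, processes 2 and 3 share node 1.
NODE_OF = {0: 0, 1: 0, 2: 1, 3: 1}

# Assumed channel parameters as (latency in seconds, bandwidth in bytes/second).
SHARED_MEMORY = (1e-6, 10e9)
NETWORK = (1e-5, 1e9)


def volume(transfers):
    """Volume metric: total bytes moved, blind to which channel carries them."""
    return sum(t.nbytes for t in transfers)


def estimated_time(transfers):
    """Linear cost estimate: latency + bytes / bandwidth per transfer.

    Transfers are assumed to be serialized, so the per-transfer costs add up.
    """
    total = 0.0
    for t in transfers:
        same_node = NODE_OF[t.src] == NODE_OF[t.dst]
        latency, bandwidth = SHARED_MEMORY if same_node else NETWORK
        total += latency + t.nbytes / bandwidth
    return total


# Tiling A moves less data, but all of it crosses the network.
tiling_a = [Transfer(0, 2, 40_000_000), Transfer(1, 3, 40_000_000)]
# Tiling B moves more data, but only through shared memory.
tiling_b = [Transfer(0, 1, 60_000_000), Transfer(2, 3, 60_000_000)]

for name, tiling in (("A", tiling_a), ("B", tiling_b)):
    print(name, volume(tiling), f"{estimated_time(tiling):.6f}")
```

With these assumed numbers, the volume metric prefers tiling A, while the time estimate prefers tiling B, because the extra bytes in B travel over the faster channel.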

Acknowledgements

This work was supported by the European Regional Development Fund ‘A way to achieve Europe’ (ERDF) and the Extremadura Local Government (Ref. IB16118). It was also partially supported by the computing facilities of Extremadura Research Center for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF).

Author information

Corresponding author

Correspondence to Juan A. Rico-Gallego.

About this article

Cite this article

Rico-Gallego, J.A., Díaz-Martín, J.C., Calvo-Jurado, C. et al. Analytical Communication Performance Models as a metric in the partitioning of data-parallel kernels on heterogeneous platforms. J Supercomput 75, 1654–1669 (2019). https://doi.org/10.1007/s11227-018-2724-8

  • DOI: https://doi.org/10.1007/s11227-018-2724-8
