
Performance prediction of parallel applications: a systematic literature review

The Journal of Supercomputing

Abstract

Techniques for estimating the execution time of parallel applications have been studied for the last 25 years. These approaches propose different methods for predicting the performance behaviour of applications. Most of these methods rely on analysing one or more of the following aspects: the system workload, the application structure, the platform system, and the computing resources that the application needs to perform its operations. These elements are exploited by different classes of prediction methods, both analytic and non-analytic. However, no wide-ranging survey of these approaches exists at the time of writing. This paper presents a systematic review of performance prediction methods for parallel applications published in the open literature during the period 2005–2020. We define a classification framework to categorise the reviewed approaches. In addition, we identify directions and trends in performance prediction as well as some unsolved issues.


Notes

  1. This gave researchers greater access to multi-core systems, since Intel processors were cheaper and widely deployed. Hence, the amount of performance prediction research focused on multi-core processors increased from 2005 onwards, as can be seen in the answer to RQ3.

  2. The H-index for journals is based on http://www.scimagojr.com; last access: April, 2020.

  3. This term is also known as “quality assessment” [16].

  4. It should be noted that the simulation environments sub-category is part of the platform category, since we consider a simulation to be an executing platform. This is because a simulation approach is also mapped to a prediction method in our framework (i.e. an analytic or non-analytic method).

  5. Below, we report the number of papers that use each of the methods. Some papers include two or more methods, either because they compare them or because one method complements another.

  6. A timestep is defined as one step of computation followed by inter-process communication to update the data (a minimal sketch of this pattern follows these notes).

  7. A hybrid system has multi-core processors and a many-core architecture as processing units.

  8. Many-core systems can be seen as an evolution of multi-core systems.
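The short sketch below is ours, not taken from any of the reviewed papers; it only illustrates the timestep pattern of note 6, i.e. a computation phase on local data followed by inter-process communication that updates boundary data. It assumes the mpi4py and NumPy packages are available, and the smoothing kernel, data sizes, and variable names are purely illustrative.

# Minimal illustration of a "timestep": computation followed by
# inter-process communication (note 6). Run with, e.g., mpirun -n 4 python timestep.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

local = np.random.rand(1000)                 # this rank's chunk of the data

for step in range(100):                      # each iteration is one timestep
    # computation phase: an arbitrary local smoothing operation
    local = 0.5 * (local + np.roll(local, 1))
    # communication phase: exchange boundary values with neighbouring ranks
    from_left = comm.sendrecv(local[-1], dest=right, source=left)
    from_right = comm.sendrecv(local[0], dest=left, source=right)
    local[0] = 0.5 * (local[0] + from_left)
    local[-1] = 0.5 * (local[-1] + from_right)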

References

  1. Mak VW, Lundstrom SF (1990) Predicting performance of parallel computations. IEEE Trans Parallel Distrib Syst 1(3):257–270. https://doi.org/10.1109/71.80155


  2. Mielke RR, Stoughton JW, Som S (1988) Modeling and performance bounds for concurrent processing. In: 8th International Conference on Distributed Computing Systems, 1988, pp 538–544. https://doi.org/10.1109/DCS.1988.12557

  3. Som S, Mielke RR, Stoughton JW (1993) Prediction of performance and processor requirements in real-time data flow architectures. IEEE Trans Parallel Distrib Syst 4(11):1205–1216. https://doi.org/10.1109/71.250100


  4. Kundu S, Rangaswami R, Dutta K, Zhao M (2010) Application performance modeling in a virtualized environment. In: 2010 IEEE 16th International Symposium on High Performance Computer Architecture (HPCA), pp 1 –10. https://doi.org/10.1109/HPCA.2010.5463058

  5. Oliner A, Ganapathi A, Xu W (2011) Advances and challenges in log analysis: logs contain a wealth of information for help in managing systems. Queue 9(12):30–30:40. https://doi.org/10.1145/2076796.2082137


  6. Zhang Y, Sun W, Inoguchi Y (2008) Predict task running time in grid environments based on CPU load predictions. Future Gener Comput Syst 24(6):489–497. https://doi.org/10.1016/j.future.2007.07.003


  7. Kitchenham B (2004) Procedures for performing systematic reviews. Technical report Keele University and Empirical Software Engineering National ICT Australia Ltd

  8. Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304


  9. Chen L, Ali Babar M, Ali N (2009) Variability management in software product lines: a systematic review. In: Proceedings of the 13th International Software Product Line Conference, SPLC’09, Carnegie Mellon University, Pittsburgh, PA, USA, pp 81–90

  10. Moore SK (2011) Multicore CPUs: processor proliferation. IEEE Spectr 48(1):40–43


  11. Küngas P, Karus S, Vakulenko S, Dumas M, Parra C, Casati F (2013) Reverse-engineering conference rankings: what does it take to make a reputable conference? Scientometrics 96(2):651–665. https://doi.org/10.1007/s11192-012-0938-8


  12. Computing Research and Education Association of Australasia (CORE) (2010) Conference rankings. http://www.core.edu.au/conference-portal. Consulted April 2020

  13. De Silva PUK, Vance CK (2017) Measuring the impact of scientific research. Springer International Publishing, Cham, pp 101–115


  14. Oosthuizen JC, Fenton JE (2014) Alternatives to the impact factor. Surgeon 12(5):239–243. https://doi.org/10.1016/j.surge.2013.08.002


  15. Cánovas Izquierdo JL, Cosentino V, Cabot J (2016) Analysis of co-authorship graphs of CORE-ranked software conferences. Scientometrics 109(3):1665–1693. https://doi.org/10.1007/s11192-016-2136-6


  16. Salleh N, Mendes E, Grundy J (2011) Empirical studies of pair programming for CS/SE teaching in higher education: a systematic literature review. IEEE Trans Softw Eng 37(4):509–525. https://doi.org/10.1109/TSE.2010.59


  17. Shimizu S, Rangaswami R, Duran-Limon HA, Corona-Perez M (2009) Platform-independent modeling and prediction of application resource usage characteristics. J Syst Softw 82(12):2117–2127. https://doi.org/10.1016/j.jss.2009.07.020


  18. Downey AB (1997) A model for speedup of parallel programs. Technical report, USA


  19. Drozdowski M, Wielebski L (2010) Isoefficiency maps for divisible computations. IEEE Trans Parallel Distrib Syst 21(6):872–880. https://doi.org/10.1109/TPDS.2009.128


  20. Grama AY, Gupta A, Kumar V (1993) Isoefficiency: measuring the scalability of parallel algorithms and architectures. IEEE Parallel Distrib Technol Syst Appl 1(3):12–21. https://doi.org/10.1109/88.242438


  21. Collins GW (2003) Fundamental numerical methods and data analysis. http://ads.harvard.edu/books/1990fnmd.book/

  22. Smyth GK (2005) Polynomial approximation. In: Armitage P, Colton T (eds) Encyclopedia of biostatistics. https://doi.org/10.1002/0470011815.b2a14028

  23. Li Y, Ma W (2010) Applications of artificial neural networks in financial economics: a survey. In: 2010 International Symposium on Computational Intelligence and Design (ISCID), vol 1, pp 211–214. https://doi.org/10.1109/ISCID.2010.70

  24. Atkeson CG, Moore AW, Schaal S (1997) Locally weighted learning. Artif Intell Rev 11(1–5):11–73. https://doi.org/10.1023/A:1006559212014


  25. Hein JL (2002) Discrete mathematics, Chap. 10, 2nd edn. Jones and Bartlett Publishers, Inc., Burlington, p 560

  26. Bonate PL (2006) Pharmacokinetic-pharmacodynamic modeling and simulation. Springer, US, New York. https://doi.org/10.1007/b138744


  27. Seber GAF, Wild CJ (2003) Nonlinear regression. Wiley Interscience, Hoboken


  28. Degomme A, Legrand A, Markomanolis GS, Quinson M, Stillwell M, Suter F (2017) Simulating MPI applications: the SMPI approach. IEEE Trans Parallel Distrib Syst 28(8):2387–2400


  29. Yang LT, Ma X, Mueller F (2005) Cross-platform performance prediction of parallel applications using partial execution. In: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, SC’05, IEEE Computer Society, Seattle, WA, USA, p 40. https://doi.org/10.1109/SC.2005.20

  30. Litke A, Tserpes K, Varvarigou T (2005) Computational workload prediction for grid oriented industrial applications: the case of 3D-image rendering. In: IEEE International Symposium on Cluster Computing and the Grid, 2005. CCGrid 2005, vol 2, pp 962–969. https://doi.org/10.1109/CCGRID.2005.1558665

  31. Elmroth E, Tordsson J (2008) Grid resource brokering algorithms enabling advance reservations and resource selection based on performance predictions. Future Gener Comput Syst 24(6):585–593. https://doi.org/10.1016/j.future.2007.06.001


  32. Wu M, Sun X-H (2006) Grid harvest service: a performance system of grid computing. J Parallel Distrib Comput 66(10):1322–1337. https://doi.org/10.1016/j.jpdc.2006.05.008


  33. Cho Y, Oh S, Egger B (2020) Performance modeling of parallel loops on multi-socket platforms using queueing systems. IEEE Trans Parallel Distrib Syst 31(2):318–331


  34. Bhimani J, Mi N, Leeser M, Yang Z (2019) New performance modeling methods for parallel data processing applications. ACM Trans Model Comput Simul 29(3):1. https://doi.org/10.1145/3309684


  35. Heinecke A (2013) Accelerators in scientific computing is it worth the effort? In: 2013 International Conference on High Performance Computing and Simulation (HPCS), 2013, p 504. https://doi.org/10.1109/HPCSim.2013.6641460

  36. El-Khamra Y, Gaffney N, Walling D, Wernert E, Xu W, Zhang H (2013) Performance evaluation of R with Intel Xeon Phi coprocessor. In: 2013 IEEE International Conference on Big Data, pp 23–30. https://doi.org/10.1109/BigData.2013.6691695

  37. Heinecke A, Vaidyanathan K, Smelyanskiy M, Kobotov A, Dubtsov R, Henry G, Shet AG, Chrysos G, Dubey P (2013) Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel ® Xeon Phi coprocessor. In: 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), pp 126–137. https://doi.org/10.1109/IPDPS.2013.113

  38. Misra G, Kurkure N, Das A, Valmiki M, Das S, Gupta A (2013) Evaluation of Rodinia codes on Intel Xeon Phi. In: 2013 4th International Conference on Intelligent Systems Modelling Simulation (ISMS), pp 415–419. https://doi.org/10.1109/ISMS.2013.118

  39. Ramachandran A, Vienne J, Van Der Wijngaart R, Koesterke L, Sharapov I (2013) Performance evaluation of NAS parallel benchmarks on Intel Xeon Phi. In: 2013 42nd International Conference on Parallel Processing (ICPP), pp 736–743. https://doi.org/10.1109/ICPP.2013.87

  40. (2019) Top500 list, November 2019 release. www.top500.org

  41. Michalakes J, Dudhia J, Gill D, Henderson T, Klemp J, Skamarock W, Wang W (2005) The weather research and forecast model: software architecture and performance. In: Zwieflhofer W, Mozdzynski G (eds) Use of high performance computing in meteorology. World Scientific, Reading UK, pp 156–168


  42. Williams S, Waterman A, Patterson D (2009) Roofline: an insightful visual performance model for multicore architectures. Commun ACM 52(4):65–76. https://doi.org/10.1145/1498765.1498785


  43. Haghshenas K, Mohammadi S (2020) Prediction-based underutilized and destination host selection approaches for energy-efficient dynamic VM consolidation in data centers. J Supercomput. https://doi.org/10.1007/s11227-020-03248-4


  44. Farahnakian F, Pahikkala T, Liljeberg P, Plosila J, Tenhunen H (2015) Utilization prediction aware VM consolidation approach for green cloud computing. In: 2015 IEEE 8th International Conference on Cloud Computing, pp 381–388

  45. Murugan M, Du DHC, Kant K (2013) On the interconnect energy efficiency of high end computing systems. Sustain Comput Inform Syst 3(2):49–57. https://doi.org/10.1016/j.suscom.2012.03.002


  46. Jarus M, Oleksiak A, Piontek T, Węglarz J (2014) Runtime power usage estimation of HPC servers for various classes of real-life applications. Future Gener Comput Syst 36:299–310. https://doi.org/10.1016/j.future.2013.07.012


  47. Witkowski M, Oleksiak A, Piontek T, Węglarz J (2013) Practical power consumption estimation for real life HPC applications. Future Gener Comput Syst 29(1):208–217. https://doi.org/10.1016/j.future.2012.06.003


  48. Darling A, Carey L, Feng WC (2003) The design, implementation, and evaluation of mpiBLAST. In Proceedings of the ClusterWorld Conference and Expo and the 4th International Conference on Linux Clusters: The HPC Revolution 2003. http://public.lanl.gov/radiant/pubs/bio/cwce03.pdf

  49. Heroux MA (2015) miniFE: a finite element mini-application. https://asc.llnl.gov/CORAL-benchmarks/#minife

  50. Andrade X, Strubbe DA, Giovannini UD, Larsen AH, Oliveira MJT, Alberdi-Rodriguez J, Varas A, Theophilou I, Helbig N, Verstraete M, Stella L, Nogueira F, Aspuru-Guzik A, Castro A, Marques MAL, Rubio A (2015) Real-space grids and the Octopus code as tools for the development of new simulation approaches for electronic systems. Phys. Chem. Chem. Phys 17:31371–31396. https://doi.org/10.1039/C5CP00351B


  51. Altenbernd P, Gustafsson J, Lisper B, Stappert F (2016) Early execution time-estimation through automatically generated timing models. Real-Time Syst 52(6):731–760

  52. Amaris M, Cordeiro D, Goldman A, Camargo RYd (2015) A simple BSP-based model to predict execution time in GPU applications. In: 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp 285–294

  53. Bauer G, Gottlieb S, Hoefler T (2012) Performance modeling and comparative analysis of the MILC lattice QCD application su3_rmd. In: 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp 652–659. https://doi.org/10.1109/CCGrid.2012.123

  54. Boullón M, Cabaleiro JC, Doallo R, González P, Martínez DR, Martín M, Mouriño JC, Pena TF, Rivera F (2005) Modeling execution time of selected computation and communication kernels on grids. In: Sloot PMA, Hoekstra AG, Priol T, Reinefeld A, Bubak M (eds) Advances in grid computing—EGC 2005, volume 3470 of lecture notes in computer science. Springer, Heidelberg, pp 731–740. https://doi.org/10.1007/11508380_74

  55. Calotoiu A, Hoefler T, Poke M, Wolf F (2013) Using automated performance modeling to find scalability bugs in complex codes. In: SC’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp 1–12. https://doi.org/10.1145/2503210.2503277

  56. Carrington L, Snavely A, Wolter N (2006) A performance prediction framework for scientific applications. Future Gener Comput Syst 22(3):336–346. https://doi.org/10.1016/j.future.2004.11.019


  57. Choi J, Richards DF, Kale LV, Bhatele A (2020) End-to-end performance modeling of distributed GPU applications. In: Proceedings of the 34th ACM International Conference on Supercomputing, pp 1–12

  58. Cornea BF, Bourgeois J (2012) A framework for efficient performance prediction of distributed applications in heterogeneous systems. J Supercomput 62(3):1609–1634. https://doi.org/10.1007/s11227-012-0823-5


  59. Davis JA, Mudalige GR, Hammond SD, Herdman JA, Miller I, Jarvis SA (2011) Predictive analysis of a hydrodynamics application on large-scale CMP clusters. Comput Sci 26(3–4):175–185. https://doi.org/10.1007/s00450-011-0164-2


  60. De Pestel S, Van den Steen S, Akram S, Eeckhout L (2018) RPPM: rapid performance prediction of multithreaded applications on multicore hardware. IEEE Comput Archit Lett 17(2):183–186


  61. Gianni D, Iazeolla G, D’Ambrogio A (2010) A methodology to predict the performance of distributed simulations. In: 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation (PADS), pp 1–9. https://doi.org/10.1109/PADS.2010.5471669

  62. Gualandris A, Zwart SP, Tirado-Ramos A (2007) Performance analysis of direct N-body algorithms for astrophysical simulations on distributed systems. Parallel Comput 33(3):159–173. https://doi.org/10.1016/j.parco.2007.01.001


  63. Guo P, wei Lee C (2016) A performance prediction and analysis integrated framework for SpMV on GPUs. Procedia Comput Sci 80:178–189. International conference on computational science 2016, ICCS 2016, 6–8 June 2016, San Diego, California, USA. https://doi.org/10.1016/j.procs.2016.05.308

  64. Hammer J, Hager G, Eitzinger J, Wellein G (2015) Automatic loop kernel analysis and performance modeling with Kerncraft. In: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, PMBS ‘15, Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2832087.2832092

  65. Hudik M, Hodon M (2014) Modeling, optimization and performance prediction of parallel algorithms. In: 2014 IEEE Symposium on Computers and Communication (ISCC), Workshops, pp 1–7. https://doi.org/10.1109/ISCC.2014.6912632

  66. Ivannikov VP, Gaisaryan SS, Avetisyan AI, Padaryan VA (2006) Estimation of dynamical characteristics of a parallel program on a model. Program Comput Softw 32(4):203–214. https://doi.org/10.1134/S0361768806040037


  67. Jarvis SA, Spooner DP, Keung HNLC, Cao J, Saini S, Nudd GR (2006) Performance prediction and its use in parallel and distributed computing systems. Future Gener Comput Syst 22:745–754. https://doi.org/10.1016/j.future.2006.02.008


  68. Kerbyson DJ, Barker KJ (2011) A performance model of direct numerical simulation for analyzing large-scale systems. In: 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), pp 1824–1830. https://doi.org/10.1109/IPDPS.2011.341

  69. Kestor G, Gioiosa R, Chavarría-Miranda D (2015) Prometheus: scalable and accurate emulation of task-based applications on many-core systems. In: 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp 308–317. https://doi.org/10.1109/ISPASS.2015.7095816

  70. Lee S, Meredith JS, Vetter JS (2015) COMPASS: a framework for automated performance modeling and prediction. In: Proceedings of the 29th ACM on International Conference on Supercomputing, ICS’15. ACM, New York, NY, USA, pp 405–414. https://doi.org/10.1145/2751205.2751220

  71. Li D, Xu C, Cheng B, Xiong M, Gao X, Deng X (2017) Performance modeling and optimization of parallel LU-SGS on many-core processors for 3D high-order CFD simulations. J Supercomput 73(6):2506–2524


  72. Midorikawa ET, de Oliveira HM, Laine JM (2005) PEMPIs: a new methodology for modeling and prediction of MPI programs performance. Int J Parallel Prog 33(5):499–527. https://doi.org/10.1007/s10766-005-7303-y


  73. Mohammed A, Eleliemy A, Ciorba FM, Kasielke F, Banicescu I (2020) An approach for realistically simulating the performance of scientific applications on high performance computing systems. Future Gener Comput Syst 111:617–633


  74. Obaida MA, Liu J, Chennupati G, Santhi N, Eidenbenz S (2018) Parallel application performance prediction using analysis based models and HPC simulations. In: Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp 49–59

  75. Panadero J, Wong A, Rexachs D, Luque E (2013) A tool for selecting the right target machine for parallel scientific applications. Procedia Comput Sci 18:1824–1833. https://doi.org/10.1016/j.procs.2013.05.351. 2013 International Conference on Computational Science


  76. Parakh AK, Balakrishnan M, Paul K (2012) Performance estimation of GPUs with cache. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), pp 2384–2393. https://doi.org/10.1109/IPDPSW.2012.328

  77. Sahuquillo J, Hassan H, Petit S, March JL, Duato J (2015) A dynamic execution time estimation model to save energy in heterogeneous multicores running periodic tasks. Future Gener Comput Syst. https://doi.org/10.1016/j.future.2015.06.011


  78. Saussard R, Bouzid B, Vasiliu M, Reynaud R (2015) Optimal performance prediction of ADAS algorithms on embedded parallel architectures. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, pp 213–218

  79. Seneviratne S, Levy DC (2011) Task profiling model for load profile prediction. Future Gener Comput Syst 27(3):245–255. https://doi.org/10.1016/j.future.2010.09.004


  80. Sharkawi S, DeSota D, Panda R, Stevens S, Taylor V, Wu X (2012) SWAPP: a framework for performance projections of HPC applications using benchmarks. In: 2012 IEEE 26th International parallel and distributed processing symposium workshops PhD forum (IPDPSW), pp 1722–1731. https://doi.org/10.1109/IPDPSW.2012.214

  81. Sun E, Kaeli D (2014) Aggressive value prediction on a GPU. Int J Parallel Program 42(1):30–48


  82. Tallent NR, Hoisie A (2014) Palm: Easing the burden of analytical performance modeling. In: Proceedings of the 28th ACM International Conference on Supercomputing, ICS’14. Association for Computing Machinery, New York, NY, USA, pp 221–230 https://doi.org/10.1145/2597652.2597683

  83. Wang K, Khan MMH (2015) Performance prediction for apache spark platform. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, pp 166–173

  84. Wong A, Rexachs D, Luque E (2015) Parallel application signature for performance analysis and prediction. IEEE Trans Parallel Distrib Syst 26(7):2009–2019. https://doi.org/10.1109/TPDS.2014.2329688


  85. Wu J, Yang X, Zhang Z, Chen G, Mao R (2019) A performance model for GPU architectures that considers on-chip resources: Application to medical image registration. IEEE Trans Parallel Distrib Syst 30(9):1947–1961


  86. Yero EJH, Henriques MAA (2006) Contention-sensitive static performance prediction for parallel distributed applications. Perform Eval 63(4):265–277. https://doi.org/10.1016/j.peva.2005.01.008


  87. Zhai J, Chen W, Zheng W, Li K (2016) Performance prediction for large-scale parallel applications using representative replay. IEEE Trans Comput 65:2184–2198


  88. Achour S, Ammar M, Khmili B, Nasri W (2011) MPI-PERF-SIM: towards an automatic performance prediction tool of MPI programs on hierarchical clusters. In: 2011 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 207–211. https://doi.org/10.1109/PDP.2011.49

  89. Arndt OJ, Lüders M, Riggers C, Blume H (2020) Multicore performance prediction with MPET. J Signal Process Syst 92(9):981–998


  90. Barnes BJ, Rountree B, Lowenthal DK, Reeves J, de Supinski B, Schulz M (2008) A regression-based approach to scalability prediction. In: Proceedings of the 22nd Annual International Conference on Supercomputing, ICS’08. ACM, New York, NY, USA, pp 368–377. https://doi.org/10.1145/1375527.1375580

  91. Czarnul P, Kuchta J, Matuszek M, Proficz J, Rościszewski P, Wójcik M, Szymański J (2017) MERPSYS: an environment for simulation of parallel application execution on large scale HPC systems. Simul Model Pract Theory 77:124–140. https://doi.org/10.1016/j.simpat.2017.05.009

  92. De Sensi D (2016) Predicting performance and power consumption of parallel applications. In: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), pp 200–207

  93. Deshmeh A, Machina J, Sodan A (2010) ADEPT scalability predictor in support of adaptive resource allocation. In: 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS), pp 1–12. https://doi.org/10.1109/IPDPS.2010.5470430

  94. Goldsmith SF, Aiken AS, Wilkerson DS (2007) Measuring empirical computational complexity. In: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC-FSE’07. Association for Computing Machinery, New York, NY, USA, pp 395–404. https://doi.org/10.1145/1287624.1287681

  95. Happe J, Koziolek H, Reussner R (2007) Parametric performance contracts for software components with concurrent behaviour. Electron Notes Theor Comput Sci 182:91–106. https://doi.org/10.1016/j.entcs.2006.09.033


  96. Huh E-N, Welch LR (2006) Adaptive resource management for dynamic distributed real-time applications. J Supercomput 38(2):127–142. https://doi.org/10.1007/s11227-006-7554-4


  97. Khan M, Jin Y, Li M, Xiang Y, Jiang C (2016) Hadoop performance modeling for job estimation and resource provisioning. IEEE Trans Parallel Distrib Syst 27(2):441–454


  98. Lu G, Zhang W, He H, Yang LT (2019) Performance modeling for MPI applications with low overhead fine-grained profiling. Future Gener Comput Syst 90:317–326


  99. Lobachev O, Guthe M, Loogen R (2013) Estimating parallel performance. J Parallel Distrib Comput 73(6):876–887. https://doi.org/10.1016/j.jpdc.2013.01.011


  100. de Mello RF, Yang LT (2009) Prediction of dynamical, nonlinear, and unstable process behavior. J Supercomput 49(1):22–41. https://doi.org/10.1007/s11227-008-0215-z


  101. Pfeiffer W, Wright NJ (2008) Modeling and predicting application performance on parallel computers using HPC challenge benchmarks. In: IEEE International Symposium on Parallel and Distributed Processing, 2008. IPDPS 2008, pp 1–12. https://doi.org/10.1109/IPDPS.2008.4536278

  102. Sadjadi SM, Shimizu S, Figueroa J, Rangaswami R, Delgado J, Duran H, Collazo-Mojica XJ (2008) A modeling approach for estimating execution time of long-running scientific applications. In: 2008 IEEE International Symposium on Parallel and Distributed Processing, pp 1–8

  103. Sanjay HA, Vadhiyar S (2008) Performance modeling of parallel applications for grid scheduling. J Parallel Distrib Comput 68(8):1135–1145. https://doi.org/10.1016/j.jpdc.2008.02.006


  104. Sodhi S, Subhlok J, Xu Q (2008) Performance prediction with skeletons. Clust Comput 11(2):151–165. https://doi.org/10.1007/s10586-007-0039-2


  105. Truchet C, Arbelaez A, Richoux F, Codognet P (2016) Estimating parallel runtimes for randomized algorithms in constraint solving. J Heuristics 22(4):613–648. https://doi.org/10.1007/s10732-015-9292-3


  106. Wu R, Sun J, Chen J (2008) Parallel execution time prediction of the multitask parallel programs. Perform Eval 65(10):701–713. https://doi.org/10.1016/j.peva.2008.04.001


  107. Chen Y, Sun X-H, Wu M (2008) Algorithm-system scalability of heterogeneous computing. J Parallel Distrib Comput 68(11):1403–1412. https://doi.org/10.1016/j.jpdc.2008.06.007


  108. Zhai J, Chen W, Zheng W (2010) PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP’10. ACM, New York, NY, USA, pp 305–314. https://doi.org/10.1145/1693453.1693493

  109. Marin G, Mellor-Crummey J (2004) Cross-architecture performance predictions for scientific applications using parameterized models. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS’04/Performance’04. ACM, New York, NY, USA, pp 2–13. https://doi.org/10.1145/1005686.1005691

  110. Chtepen M, Claeys FHA, Dhoedt B, De Turck F, Fostier J, Demeester P, Vanrolleghem PA (2012) Online execution time prediction for computationally intensive applications with periodic progress updates. J Supercomput 62(2):768–786


  111. Jayakumar A, Murali P, Vadhiyar S (2015) Matching application signatures for performance predictions using a single execution. In: 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 1161–1170. https://doi.org/10.1109/IPDPS.2015.20

  112. Akay MF, Aci CI, Abut F (2015) Predicting the performance measures of a 2-dimensional message passing multiprocessor architecture by using machine learning methods. Neural Netw World 25:241–265


  113. Amarís M, de Camargo RY, Dyab M, Goldman A, Trystram D (2016) A comparison of GPU execution time prediction using machine learning and analytical modeling. In: 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA), pp 326–333

  114. Dao TT, Kim J, Seo S, Egger B, Lee J (2015) A performance model for GPUs with caches. IEEE Trans Parallel Distrib Syst 26(7):1800–1813


  115. Doan T, Kalita J (2017) Predicting run time of classification algorithms using meta-learning. Int J Mach Learn Cybern 8:1929–1943


  116. Dodonov E, de Mello RF (2010) A novel approach for distributed application scheduling based on prediction of communication events. Future Gener Comput Syst 26(5):740–752. https://doi.org/10.1016/j.future.2009.05.004


  117. Hutter F, Xu L, Hoos HH, Leyton-Brown K (2014) Algorithm runtime prediction: methods & evaluation. Artif Intell 206:79–111. https://doi.org/10.1016/j.artint.2013.10.003


  118. Ipek E, de Supinski BR, Schulz M, McKee SA (2005) An approach to performance prediction for parallel applications. In: Cunha JC, Medeiros PD (eds) Euro-par 2005 parallel processing, volume 3648 of lecture notes in computer science. Springer, Berlin, pp 196–205. https://doi.org/10.1007/11549468_24

  119. Li B, Peng L, Ramadass B (2009) Accurate and efficient processor performance prediction via regression tree based modeling. J Syst Archit 55:457–467. https://doi.org/10.1016/j.sysarc.2009.09.004


  120. Ling Y, Liu F, Qiu Y, Zhao J (2016) Prediction of total execution time for MapReduce applications. In: 2016 Sixth International Conference on Information Science and Technology (ICIST), pp 341–345

  121. Oyamada MS, Zschornack F, Wagner FR (2008) Applying neural networks to performance estimation of embedded software. J Syst Archit 54(1–2):224–240. https://doi.org/10.1016/j.sysarc.2007.06.005


  122. Phinjaroenphan P, Bevinakoppa S, Zeephongsekul P (2005) A method for estimating the execution time of a parallel task on a grid node. In: Sloot PMA, Hoekstra AG, Priol T, Reinefeld A, Bubak M (eds) Advances in grid computing—EGC 2005, volume 3470 of lecture notes in computer science. Springer, Berlin, pp 226–236. https://doi.org/10.1007/11508380_24

  123. Prem H, Raghavan NRS (2006) A support vector machine based approach for forecasting of network weather services. J Grid Comput 4(1):89–114. https://doi.org/10.1007/s10723-005-9017-1


  124. Smith W (2007) Prediction services for distributed computing. In: IEEE International Parallel and Distributed Processing Symposium, 2007. IPDPS 2007, pp 1–10. https://doi.org/10.1109/IPDPS.2007.370276

  125. Sun J, Sun G, Zhan S, Zhang J, Chen Y (2020) Automated performance modeling of HPC applications using machine learning. IEEE Trans Comput 69(5):749–763


  126. Zhang W, Hao M, Snir M (2016) Predicting HPC parallel program performance based on LLVM compiler. Clust Comput 20:1179–1192



Acknowledgements

Jesus Flores-Contreras and Sergio H. Almanza-Ruiz would like to thank the Mexican National Council for Science and Technology (CONACyT) for the full-time scholarship of their postgraduate studies.

Author information


Corresponding author

Correspondence to Hector A. Duran-Limon.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Steps to construct the search strings

The following steps were used to construct the search strings:

  1. Extract nouns from the research questions.

  2. With the extracted nouns, define compound nouns semantically related to the research questions.

  3. Group nouns with similar semantics.

  4. Reduce the compound nouns as much as possible without losing the original semantics.

  5. Include synonyms and alternative spellings. If a compound noun has multiple possible synonym terms, reduce the compound noun provided its semantics are preserved; otherwise, split the noun and incorporate the resulting nouns into the existing groups, or create new groups if necessary.

  6. Define the search string by using the Boolean OR to combine all nouns that belong to a group. Then, use the Boolean AND to intersect all groups of nouns.

We extracted nouns from the research questions (step 1) and with these nouns we defined compound nouns semantically related to the research questions (step 2). We obtained the following compound nouns: execution time, parallel application, multiprocessor, prediction method, and performance prediction model.

We then formed three groups of nouns in which the elements of each group were similar or related (step 3). Group 1 contained “prediction method” and “performance prediction model”, which are closely related terms. Group 2 included a single compound noun, namely “execution time”. Group 3 was formed by “parallel application” and “multiprocessor”; these two terms are closely related since both concern running parallel applications. We then simplified the compound nouns (step 4): “performance prediction model” was reduced to “performance prediction” and “performance model”.

We then included synonyms in the groups (step 5). In group 1, we found that the literature uses multiple terms as synonyms of “performance prediction”, such as “prediction method”, “resource prediction”, “prediction of the completion time of an executing task”, “estimate the application runtime”, “execution time estimation”, and “performance modelling and prediction”. Therefore, we further reduced this term to “prediction” to avoid omitting a synonym we might not have considered, and we included “estimation” as a synonym of “prediction”. In group 2, we added “runtime”, “completion time” and “time of execution” as synonyms of “execution time”. Regarding group 3, the term “parallel applications” also has multiple synonyms, such as “parallel tasks”, “parallel algorithms” and “parallel systems”. Hence, to avoid omitting a synonym, we further reduced this term to “parallel” and included the following terms as its synonyms: “distributed”, “grid”, “cluster”, “high performance computing”, “HPC”, “MPI”, and “OpenMP”. As a synonym of “multiprocessor”, we included “multiple-processor”.

The last step in this stage is constructing the search string (step 6). The search string is the phrase used to perform the search in the database engines; it is constructed as the intersection of the groups of nouns, with a Boolean OR applied within each group. The final groups are G1 = {“performance model”, “prediction”, “estimation”}, G2 = {“execution time”, “runtime”, “completion time”, “time of execution”}, and G3 = {“parallel”, “distributed”, “grid”, “cluster”, “high performance computing”, “HPC”, “MPI”, “OpenMP”, “multiprocessor”, “multiple-processor”}.

We obtained the following search string by intersecting the groups as G1 AND G2 AND G3 and applying a Boolean OR to the elements of each group:

(“performance model” OR prediction OR estimation) AND (“execution time” OR runtime OR “completion time” OR “time of execution”) AND (parallel OR distributed OR grid OR cluster OR “high performance computing” OR HPC OR MPI OR OpenMP OR multiprocessor OR multiple-processor)

We applied the search string to the title, abstract, keywords, and full text. Search engines that do not support full-text search returned a very small number of papers; for instance, the ACM library retrieved only 92 papers. Hence, in those cases, we applied G1 AND G3 to the title, abstract, and keywords, and then applied G2 to the full text of the retrieved papers using a PDF editor.
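The search string can be reproduced mechanically from the three groups. The short Python sketch below is ours and purely illustrative; it applies step 6 to G1, G2, and G3, and also builds the reduced query (G1 AND G3) used with engines that do not support full-text search.

# Reproduce the construction of the search string (step 6) from the noun groups.
G1 = ['"performance model"', 'prediction', 'estimation']
G2 = ['"execution time"', 'runtime', '"completion time"', '"time of execution"']
G3 = ['parallel', 'distributed', 'grid', 'cluster',
      '"high performance computing"', 'HPC', 'MPI', 'OpenMP',
      'multiprocessor', 'multiple-processor']

def or_group(terms):
    # Boolean OR over the synonyms of one group
    return '(' + ' OR '.join(terms) + ')'

# Boolean AND over the groups gives the full search string
search_string = ' AND '.join(or_group(g) for g in (G1, G2, G3))

# Reduced query (G1 AND G3) for engines without full-text search;
# G2 is then applied separately to the full text of each retrieved paper.
title_abstract_query = ' AND '.join(or_group(g) for g in (G1, G3))

print(search_string)
print(title_abstract_query)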

Appendix 2: Number of articles obtained per database

Table 6 shows the number of articles that we obtained from each database. Tables 7, 8, 9, 10, and 11 present the number of papers found per conference proceedings together with the corresponding CORE rank, as well as the number of papers obtained per journal together with its H-index value.

Table 6 Total number of articles by databases
Table 7 Number of articles by conference in the ACM library and their CORE ranking
Table 8 Number of articles by proceeding in the IEEE database and their CORE ranking
Table 9 Number of articles by journal in the IEEE database and their H-index value
Table 10 Number of articles by journal in Science Direct database and their H-index value
Table 11 Number of articles by journal in the Springerlink database and their H-index value

Appendix 3: Reviewed approaches

In Table 12, the letter “D” denotes the position in the Prediction Domain category, whereas the letter “M” denotes the position in the Prediction Methods category.

Table 12 Location in the classification framework of the papers considered in the review


About this article


Cite this article

Flores-Contreras, J., Duran-Limon, H.A., Chavoya, A. et al. Performance prediction of parallel applications: a systematic literature review. J Supercomput 77, 4014–4055 (2021). https://doi.org/10.1007/s11227-020-03417-5
