Abstract
Techniques for estimating the execution time of parallel applications have been studied for the last 25 years, and many methods have been proposed for predicting the performance behaviour of applications. Most of these methods rely on analysing one or more of the following aspects: system workload, application structure, platform system, and the computing resources that the application needs to perform its operations. These aspects are exploited by both analytic and non-analytic methods. However, no wide-ranging survey of these approaches exists at the time of writing. This paper presents a systematic review of performance prediction methods for parallel applications published in the open literature during the period 2005–2020. We define a classification framework to categorise the reviewed approaches. In addition, we identify some directions and trends in performance prediction as well as some unsolved issues.
Notes
This gave researchers greater access to multi-core systems, since Intel processors were cheaper and widely deployed. Hence, the amount of performance prediction research focused on multi-core processors increased from 2005 onwards, as can be seen in the answer to RQ3.
The H-index for journals is based on http://www.scimagojr.com; last access: April, 2020.
This term is also known as “quality assessment” [16].
It should be noted that the simulation environments sub-category is part of the platform category, since we consider a simulation to be an execution platform. This is because a simulation approach is also mapped to a prediction method in our framework (i.e. an analytic or non-analytic method).
Below, we report the number of papers that use each method. Some papers include two or more methods, either because they compare them or because some methods complement others.
A timestep is defined as one step of computation followed by inter-process communication to update the data.
A hybrid system has multi-core processors and a many-core architecture as processing units.
Many-core systems can be seen as an evolution of multi-core systems.
References
Mak VW, Lundstrom SF (1990) Predicting performance of parallel computations. IEEE Trans Parallel Distrib Syst 1(3):257–270. https://doi.org/10.1109/71.80155
Mielke RR, Stoughton JW, Som S (1988) Modeling and performance bounds for concurrent processing. In: 8th International Conference on Distributed Computing Systems, 1988, pp 538–544. https://doi.org/10.1109/DCS.1988.12557
Som S, Mielke RR, Stoughton JW (1993) Prediction of performance and processor requirements in real-time data flow architectures. IEEE Trans Parallel Distrib Syst 4(11):1205–1216. https://doi.org/10.1109/71.250100
Kundu S, Rangaswami R, Dutta K, Zhao M (2010) Application performance modeling in a virtualized environment. In: 2010 IEEE 16th International Symposium on High Performance Computer Architecture (HPCA), pp 1–10. https://doi.org/10.1109/HPCA.2010.5463058
Oliner A, Ganapathi A, Xu W (2011) Advances and challenges in log analysis: logs contain a wealth of information for help in managing systems. Queue 9(12):30–30:40. https://doi.org/10.1145/2076796.2082137
Zhang Y, Sun W, Inoguchi Y (2008) Predict task running time in grid environments based on CPU load predictions. Future Gener Comput Syst 24(6):489–497. https://doi.org/10.1016/j.future.2007.07.003
Kitchenham B (2004) Procedures for performing systematic reviews. Technical report Keele University and Empirical Software Engineering National ICT Australia Ltd
Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304
Chen L, Ali Babar M, Ali N (2009) Variability management in software product lines: a systematic review. In: Proceedings of the 13th International Software Product Line Conference, SPLC’09, Carnegie Mellon University, Pittsburgh, PA, USA, pp 81–90
Moore SK (2011) Multicore CPUs: processor proliferation. IEEE Spectr 48(1):40–43
Küngas P, Karus S, Vakulenko S, Dumas M, Parra C, Casati F (2013) Reverse-engineering conference rankings: what does it take to make a reputable conference? Scientometrics 96(2):651–665. https://doi.org/10.1007/s11192-012-0938-8
Computing Research and Education Association of Australasia (CORE) (2010) Conference rankings. http://www.core.edu.au/conference-portal. Consulted April 2020
De Silva PUK, Vance CK (2017) Measuring the impact of scientific research. Springer International Publishing, Cham, pp 101–115
Oosthuizen JC, Fenton JE (2014) Alternatives to the impact factor. Surgeon 12(5):239–243. https://doi.org/10.1016/j.surge.2013.08.002
Cánovas Izquierdo JL, Cosentino V, Cabot J (2016) Analysis of co-authorship graphs of CORE-ranked software conferences. Scientometrics 109(3):1665–1693. https://doi.org/10.1007/s11192-016-2136-6
Salleh N, Mendes E, Grundy J (2011) Empirical studies of pair programming for CS/SE teaching in higher education: a systematic literature review. IEEE Trans Softw Eng 37(4):509–525. https://doi.org/10.1109/TSE.2010.59
Shimizu S, Rangaswami R, Duran-Limon HA, Corona-Perez M (2009) Platform-independent modeling and prediction of application resource usage characteristics. J Syst Softw 82(12):2117–2127. https://doi.org/10.1016/j.jss.2009.07.020
Downey AB (1997) A model for speedup of parallel programs. Technical report, USA
Drozdowski M, Wielebski L (2010) Isoefficiency maps for divisible computations. IEEE Trans Parallel Distrib Syst 21(6):872–880. https://doi.org/10.1109/TPDS.2009.128
Grama AY, Gupta A, Kumar V (1993) Isoefficiency: measuring the scalability of parallel algorithms and architectures. IEEE Parallel Distrib Technol Syst Appl 1(3):12–21. https://doi.org/10.1109/88.242438
Collins GW (2003) Fundamental numerical methods and data analysis. http://ads.harvard.edu/books/1990fnmd.book/
Smyth GK (2005) Polynomial approximation. In: Armitage P, Colton T (eds) Encyclopedia of biostatistics. https://doi.org/10.1002/0470011815.b2a14028
Li Y, Ma W (2010) Applications of artificial neural networks in financial economics: a survey. In: 2010 International Symposium on Computational Intelligence and Design (ISCID), vol 1, pp 211–214. https://doi.org/10.1109/ISCID.2010.70
Atkeson CG, Moore AW, Schaal S (1997) Locally weighted learning. Artif Intell Rev 11(1–5):11–73. https://doi.org/10.1023/A:1006559212014
Hein JL (2002) Discrete mathematics, Chap. 10, 2nd edn. Jones and Bartlett Publishers, Inc., Burlington, p 560
Bonate PL (2006) Pharmacokinetic-pharmacodynamic modeling and simulation. Springer, US, New York. https://doi.org/10.1007/b138744
Seber GAF, Wild CJ (2003) Nonlinear regression. Wiley Interscience, Hoboken
Degomme A, Legrand A, Markomanolis GS, Quinson M, Stillwell M, Suter F (2017) Simulating MPI applications: the SMPI approach. IEEE Trans Parallel Distrib Syst 28(8):2387–2400
Yang LT, Ma X, Mueller F (2005) Cross-platform performance prediction of parallel applications using partial execution. In: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, SC’05, IEEE Computer Society, Seattle, WA, USA, p 40. https://doi.org/10.1109/SC.2005.20
Litke A, Tserpes K, Varvarigou T (2005) Computational workload prediction for grid oriented industrial applications: the case of 3D-image rendering. In: IEEE International Symposium on Cluster Computing and the Grid, 2005. CCGrid 2005, vol 2, pp 962–969. https://doi.org/10.1109/CCGRID.2005.1558665
Elmroth E, Tordsson J (2008) Grid resource brokering algorithms enabling advance reservations and resource selection based on performance predictions. Future Gener Comput Syst 24(6):585–593. https://doi.org/10.1016/j.future.2007.06.001
Wu M, Sun X-H (2006) Grid harvest service: a performance system of grid computing. J Parallel Distrib Comput 66(10):1322–1337. https://doi.org/10.1016/j.jpdc.2006.05.008
Cho Y, Oh S, Egger B (2020) Performance modeling of parallel loops on multi-socket platforms using queueing systems. IEEE Trans Parallel Distrib Syst 31(2):318–331
Bhimani J, Mi N, Leeser M, Yang Z (2019) New performance modeling methods for parallel data processing applications. ACM Trans Model Comput Simul 29(3):1. https://doi.org/10.1145/3309684
Heinecke A (2013) Accelerators in scientific computing is it worth the effort? In: 2013 International Conference on High Performance Computing and Simulation (HPCS), 2013, p 504. https://doi.org/10.1109/HPCSim.2013.6641460
El-Khamra Y, Gaffney N, Walling D, Wernert E, Xu W, Zhang H (2013) Performance evaluation of R with Intel Xeon Phi coprocessor. In: 2013 IEEE International Conference on Big Data, pp 23–30. https://doi.org/10.1109/BigData.2013.6691695
Heinecke A, Vaidyanathan K, Smelyanskiy M, Kobotov A, Dubtsov R, Henry G, Shet AG, Chrysos G, Dubey P (2013) Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor. In: 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), pp 126–137. https://doi.org/10.1109/IPDPS.2013.113
Misra G, Kurkure N, Das A, Valmiki M, Das S, Gupta A (2013) Evaluation of Rodinia codes on Intel Xeon Phi. In: 2013 4th International Conference on Intelligent Systems Modelling Simulation (ISMS), pp 415–419. https://doi.org/10.1109/ISMS.2013.118
Ramachandran A, Vienne J, Van Der Wijngaart R, Koesterke L, Sharapov I (2013) Performance evaluation of NAS parallel benchmarks on Intel Xeon Phi. In: 2013 42nd International Conference on Parallel Processing (ICPP), pp 736–743. https://doi.org/10.1109/ICPP.2013.87
(2019) Top500 list, November 2019 release. www.top500.org
Michalakes J, Dudhia J, Gill D, Henderson T, Klemp J, Skamarock W, Wang W (2005) The weather research and forecast model: software architecture and performance. In: Zwieflhofer W, Mozdzynski G (eds) Use of high performance computing in meteorology. World Scientific, Reading UK, pp 156–168
Williams S, Waterman A, Patterson D (2009) Roofline: an insightful visual performance model for multicore architectures. Commun ACM 52(4):65–76. https://doi.org/10.1145/1498765.1498785
Haghshenas K, Mohammadi S (2020) Prediction-based underutilized and destination host selection approaches for energy-efficient dynamic VM consolidation in data centers. J Supercomput. https://doi.org/10.1007/s11227-020-03248-4
Farahnakian F, Pahikkala T, Liljeberg P, Plosila J, Tenhunen H (2015) Utilization prediction aware VM consolidation approach for green cloud computing. In: 2015 IEEE 8th International Conference on Cloud Computing, pp 381–388
Murugan M, Du DHC, Kant K (2013) On the interconnect energy efficiency of high end computing systems. Sustain Comput Inform Syst 3(2):49–57. https://doi.org/10.1016/j.suscom.2012.03.002
Jarus M, Oleksiak A, Piontek T, Węglarz J (2014) Runtime power usage estimation of HPC servers for various classes of real-life applications. Future Gener Comput Syst 36:299–310. https://doi.org/10.1016/j.future.2013.07.012
Witkowski M, Oleksiak A, Piontek T, Węglarz J (2013) Practical power consumption estimation for real life HPC applications. Future Gener Comput Syst 29(1):208–217. https://doi.org/10.1016/j.future.2012.06.003
Darling A, Carey L, Feng WC (2003) The design, implementation, and evaluation of mpiBLAST. In: Proceedings of the ClusterWorld Conference and Expo and the 4th International Conference on Linux Clusters: The HPC Revolution 2003. http://public.lanl.gov/radiant/pubs/bio/cwce03.pdf
Heroux MA (2015) miniFE a finite element mini-application. https://asc.llnl.gov/CORAL-benchmarks/#minife
Andrade X, Strubbe DA, Giovannini UD, Larsen AH, Oliveira MJT, Alberdi-Rodriguez J, Varas A, Theophilou I, Helbig N, Verstraete M, Stella L, Nogueira F, Aspuru-Guzik A, Castro A, Marques MAL, Rubio A (2015) Real-space grids and the Octopus code as tools for the development of new simulation approaches for electronic systems. Phys. Chem. Chem. Phys 17:31371–31396. https://doi.org/10.1039/C5CP00351B
Altenbernd P, Gustafsson J, Lisper B, Stappert F (2016) Early execution time-estimation through automatically generated timing models. Real-Time Syst 52(6):731–760
Amaris M, Cordeiro D, Goldman A, Camargo RYd (2015) A simple BSP-based model to predict execution time in GPU applications. In: 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp 285–294
Bauer G, Gottlieb S, Hoefler T (2012) Performance modeling and comparative analysis of the MILC lattice QCD application su3_rmd. In: 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp 652–659. https://doi.org/10.1109/CCGrid.2012.123
Boullón M, Cabaleiro JC, Doallo R, González P, Martínez DR, Martín M, Mouriño JC, Pena TF, Rivera F (2005) Modeling execution time of selected computation and communication kernels on grids. In: Sloot PMA, Hoekstra AG, Priol T, Reinefeld A, Bubak M (eds) Advances in grid computing—EGC 2005, volume 3470 of lecture notes in computer science. Springer, Heidelberg, pp 731–740. https://doi.org/10.1007/11508380_74
Calotoiu A, Hoefler T, Poke M, Wolf F (2013) Using automated performance modeling to find scalability bugs in complex codes. In: SC’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp 1–12. https://doi.org/10.1145/2503210.2503277
Carrington L, Snavely A, Wolter N (2006) A performance prediction framework for scientific applications. Future Gener Comput Syst 22(3):336–346. https://doi.org/10.1016/j.future.2004.11.019
Choi J, Richards DF, Kale LV, Bhatele A (2020) End-to-end performance modeling of distributed GPU applications. In: Proceedings of the 34th ACM International Conference on Supercomputing, pp 1–12
Cornea BF, Bourgeois J (2012) A framework for efficient performance prediction of distributed applications in heterogeneous systems. J Supercomput 62(3):1609–1634. https://doi.org/10.1007/s11227-012-0823-5
Davis JA, Mudalige GR, Hammond SD, Herdman JA, Miller I, Jarvis SA (2011) Predictive analysis of a hydrodynamics application on large-scale CMP clusters. Comput Sci 26(3–4):175–185. https://doi.org/10.1007/s00450-011-0164-2
De Pestel S, Van den Steen S, Akram S, Eeckhout L (2018) RPPM: rapid performance prediction of multithreaded applications on multicore hardware. IEEE Comput Archit Lett 17(2):183–186
Gianni D, Iazeolla G, D’Ambrogio A (2010) A methodology to predict the performance of distributed simulations. In: 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation (PADS), pp 1–9. https://doi.org/10.1109/PADS.2010.5471669
Gualandris A, Zwart SP, Tirado-Ramos A (2007) Performance analysis of direct N-body algorithms for astrophysical simulations on distributed systems. Parallel Comput 33(3):159–173. https://doi.org/10.1016/j.parco.2007.01.001
Guo P, wei Lee C (2016) A performance prediction and analysis integrated framework for SpMV on GPUs. Procedia Comput Sci 80:178–189. International conference on computational science 2016, ICCS 2016, 6–8 June 2016, San Diego, California, USA. https://doi.org/10.1016/j.procs.2016.05.308
Hammer J, Hager G, Eitzinger J, Wellein G (2015) Automatic loop kernel analysis and performance modeling with Kerncraft. In: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, PMBS’15, Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2832087.2832092
Hudik M, Hodon M (2014) Modeling, optimization and performance prediction of parallel algorithms. In: 2014 IEEE Symposium on Computers and Communication (ISCC), Workshops, pp 1–7. https://doi.org/10.1109/ISCC.2014.6912632
Ivannikov VP, Gaisaryan SS, Avetisyan AI, Padaryan VA (2006) Estimation of dynamical characteristics of a parallel program on a model. Program Comput Softw 32(4):203–214. https://doi.org/10.1134/S0361768806040037
Jarvis SA, Spooner DP, Keung HNLC, Cao J, Saini S, Nudd GR (2006) Performance prediction and its use in parallel and distributed computing systems. Future Gener Comput Syst 22:745–754. https://doi.org/10.1016/j.future.2006.02.008
Kerbyson DJ, Barker KJ (2011) A performance model of direct numerical simulation for analyzing large-scale systems. In: 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), pp 1824–1830. https://doi.org/10.1109/IPDPS.2011.341
Kestor G, Gioiosa R, Chavarría-Miranda D (2015) Prometheus: scalable and accurate emulation of task-based applications on many-core systems. In: 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp 308–317. https://doi.org/10.1109/ISPASS.2015.7095816
Lee S, Meredith JS, Vetter JS (2015) COMPASS: a framework for automated performance modeling and prediction. In: Proceedings of the 29th ACM on International Conference on Supercomputing, ICS’15. ACM, New York, NY, USA, pp 405–414. https://doi.org/10.1145/2751205.2751220
Li D, Xu C, Cheng B, Xiong M, Gao X, Deng X (2017) Performance modeling and optimization of parallel LU-SGS on many-core processors for 3D high-order CFD simulations. J Supercomput 73(6):2506–2524
Midorikawa ET, de Oliveira HM, Laine JM (2005) PEMPIs: a new methodology for modeling and prediction of MPI programs performance. Int J Parallel Prog 33(5):499–527. https://doi.org/10.1007/s10766-005-7303-y
Mohammed A, Eleliemy A, Ciorba FM, Kasielke F, Banicescu I (2020) An approach for realistically simulating the performance of scientific applications on high performance computing systems. Future Gener Comput Syst 111:617–633
Obaida MA, Liu J, Chennupati G, Santhi N, Eidenbenz S (2018) Parallel application performance prediction using analysis based models and HPC simulations. In: Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp 49–59
Panadero J, Wong A, Rexachs D, Luque E (2013) A tool for selecting the right target machine for parallel scientific applications. Procedia Comput Sci 18:1824–1833. https://doi.org/10.1016/j.procs.2013.05.351. 2013 International Conference on Computational Science
Parakh AK, Balakrishnan M, Paul K (2012) Performance estimation of GPUs with cache. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), pp 2384–2393. https://doi.org/10.1109/IPDPSW.2012.328
Sahuquillo J, Hassan H, Petit S, March JL, Duato J (2015) A dynamic execution time estimation model to save energy in heterogeneous multicores running periodic tasks. Future Gener Comput Syst. https://doi.org/10.1016/j.future.2015.06.011
Saussard R, Bouzid B, Vasiliu M, Reynaud R (2015) Optimal performance prediction of ADAS algorithms on embedded parallel architectures. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, pp 213–218
Seneviratne S, Levy DC (2011) Task profiling model for load profile prediction. Future Gener Comput Syst 27(3):245–255. https://doi.org/10.1016/j.future.2010.09.004
Sharkawi S, DeSota D, Panda R, Stevens S, Taylor V, Wu X (2012) SWAPP: a framework for performance projections of HPC applications using benchmarks. In: 2012 IEEE 26th International parallel and distributed processing symposium workshops PhD forum (IPDPSW), pp 1722–1731. https://doi.org/10.1109/IPDPSW.2012.214
Sun E, Kaeli D (2014) Aggressive value prediction on a GPU. Int J Parallel Program 42(1):30–48
Tallent NR, Hoisie A (2014) Palm: easing the burden of analytical performance modeling. In: Proceedings of the 28th ACM International Conference on Supercomputing, ICS’14. Association for Computing Machinery, New York, NY, USA, pp 221–230. https://doi.org/10.1145/2597652.2597683
Wang K, Khan MMH (2015) Performance prediction for apache spark platform. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, pp 166–173
Wong A, Rexachs D, Luque E (2015) Parallel application signature for performance analysis and prediction. IEEE Trans Parallel Distrib Syst 26(7):2009–2019. https://doi.org/10.1109/TPDS.2014.2329688
Wu J, Yang X, Zhang Z, Chen G, Mao R (2019) A performance model for GPU architectures that considers on-chip resources: Application to medical image registration. IEEE Trans Parallel Distrib Syst 30(9):1947–1961
Yero EJH, Henriques MAA (2006) Contention-sensitive static performance prediction for parallel distributed applications. Perform Eval 63(4):265–277. https://doi.org/10.1016/j.peva.2005.01.008
Zhai J, Chen W, Zheng W, Li K (2016) Performance prediction for large-scale parallel applications using representative replay. IEEE Trans Comput 65:2184–2198
Achour S, Ammar M, Khmili B, Nasri W (2011) MPI-PERF-SIM: towards an automatic performance prediction tool of MPI programs on hierarchical clusters. In: 2011 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 207–211. https://doi.org/10.1109/PDP.2011.49
Arndt OJ, Lüders M, Riggers C, Blume H (2020) Multicore performance prediction with MPET. J Signal Process Syst 92(9):981–998
Barnes BJ, Rountree B, Lowenthal DK, Reeves J, de Supinski B, Schulz M (2008) A regression-based approach to scalability prediction. In: Proceedings of the 22nd Annual International Conference on Supercomputing, ICS’08. ACM, New York, NY, USA, pp 368–377. https://doi.org/10.1145/1375527.1375580
Czarnul P, Kuchta J, Matuszek M, Proficz J, Rościszewski P, Wójcik M, Szymański J (2017) MERPSYS: an environment for simulation of parallel application execution on large scale HPC systems. Simul Model Pract Theory 77:124–140. https://doi.org/10.1016/j.simpat.2017.05.009
De Sensi D (2016) Predicting performance and power consumption of parallel applications. In: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), pp 200–207
Deshmeh A, Machina J, Sodan A (2010) ADEPT scalability predictor in support of adaptive resource allocation. In: 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS), pp 1–12. https://doi.org/10.1109/IPDPS.2010.5470430
Goldsmith SF, Aiken AS, Wilkerson DS (2007) Measuring empirical computational complexity. In: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC-FSE’07. Association for Computing Machinery, New York, NY, USA, pp 395–404. https://doi.org/10.1145/1287624.1287681
Happe J, Koziolek H, Reussner R (2007) Parametric performance contracts for software components with concurrent behaviour. Electron Notes Theor Comput Sci 182:91–106. https://doi.org/10.1016/j.entcs.2006.09.033
Huh E-N, Welch LR (2006) Adaptive resource management for dynamic distributed real-time applications. J Supercomput 38(2):127–142. https://doi.org/10.1007/s11227-006-7554-4
Khan M, Jin Y, Li M, Xiang Y, Jiang C (2016) Hadoop performance modeling for job estimation and resource provisioning. IEEE Trans Parallel Distrib Syst 27(2):441–454
Lu G, Zhang W, He H, Yang LT (2019) Performance modeling for MPI applications with low overhead fine-grained profiling. Future Gener Comput Syst 90:317–326
Lobachev O, Guthe M, Loogen R (2013) Estimating parallel performance. J Parallel Distrib Comput 73(6):876–887. https://doi.org/10.1016/j.jpdc.2013.01.011
de Mello RF, Yang LT (2009) Prediction of dynamical, nonlinear, and unstable process behavior. J Supercomput 49(1):22–41. https://doi.org/10.1007/s11227-008-0215-z
Pfeiffer W, Wright NJ (2008) Modeling and predicting application performance on parallel computers using HPC challenge benchmarks. In: IEEE International Symposium on Parallel and Distributed Processing, 2008. IPDPS 2008, pp 1–12. https://doi.org/10.1109/IPDPS.2008.4536278
Sadjadi SM, Shimizu S, Figueroa J, Rangaswami R, Delgado J, Duran H, Collazo-Mojica XJ (2008) A modeling approach for estimating execution time of long-running scientific applications. In: 2008 IEEE International Symposium on Parallel and Distributed Processing, pp 1–8
Sanjay HA, Vadhiyar S (2008) Performance modeling of parallel applications for grid scheduling. J Parallel Distrib Comput 68(8):1135–1145. https://doi.org/10.1016/j.jpdc.2008.02.006
Sodhi S, Subhlok J, Xu Q (2008) Performance prediction with skeletons. Clust Comput 11(2):151–165. https://doi.org/10.1007/s10586-007-0039-2
Truchet C, Arbelaez A, Richoux F, Codognet P (2016) Estimating parallel runtimes for randomized algorithms in constraint solving. J Heuristics 22(4):613–648. https://doi.org/10.1007/s10732-015-9292-3
Wu R, Sun J, Chen J (2008) Parallel execution time prediction of the multitask parallel programs. Perform Eval 65(10):701–713. https://doi.org/10.1016/j.peva.2008.04.001
Chen Y, Sun X-H, Wu M (2008) Algorithm-system scalability of heterogeneous computing. J Parallel Distrib Comput 68(11):1403–1412. https://doi.org/10.1016/j.jpdc.2008.06.007
Zhai J, Chen W, Zheng W (2010) PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP’10. ACM, New York, NY, USA, pp 305–314. https://doi.org/10.1145/1693453.1693493
Marin G, Mellor-Crummey J (2004) Cross-architecture performance predictions for scientific applications using parameterized models. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS’04/Performance’04. ACM, New York, NY, USA, pp 2–13. https://doi.org/10.1145/1005686.1005691
Chtepen M, Claeys FHA, Dhoedt B, De Turck F, Fostier J, Demeester P, Vanrolleghem PA (2012) Online execution time prediction for computationally intensive applications with periodic progress updates. J Supercomput 62(2):768–786
Jayakumar A, Murali P, Vadhiyar S (2015) Matching application signatures for performance predictions using a single execution. In: 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 1161–1170. https://doi.org/10.1109/IPDPS.2015.20
Akay MF, Aci CI, Abut F (2015) Predicting the performance measures of a 2-dimensional message passing multiprocessor architecture by using machine learning methods. Neural Netw World 25:241–265
Amarís M, de Camargo RY, Dyab M, Goldman A, Trystram D (2016) A comparison of GPU execution time prediction using machine learning and analytical modeling. In: 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA), pp 326–333
Dao TT, Kim J, Seo S, Egger B, Lee J (2015) A performance model for GPUs with caches. IEEE Trans Parallel Distrib Syst 26(7):1800–1813
Doan T, Kalita J (2017) Predicting run time of classification algorithms using meta-learning. Int J Mach Learn Cybern 8:1929–1943
Dodonov E, de Mello RF (2010) A novel approach for distributed application scheduling based on prediction of communication events. Future Gener Comput Syst 26(5):740–752. https://doi.org/10.1016/j.future.2009.05.004
Hutter F, Xu L, Hoos HH, Leyton-Brown K (2014) Algorithm runtime prediction: methods & evaluation. Artif Intell 206:79–111. https://doi.org/10.1016/j.artint.2013.10.003
Ipek E, de Supinski BR, Schulz M, McKee SA (2005) An approach to performance prediction for parallel applications. In: Cunha JC, Medeiros PD (eds) Euro-par 2005 parallel processing, volume 3648 of lecture notes in computer science. Springer, Berlin, pp 196–205. https://doi.org/10.1007/11549468_24
Li B, Peng L, Ramadass B (2009) Accurate and efficient processor performance prediction via regression tree based modeling. J Syst Archit 55:457–467. https://doi.org/10.1016/j.sysarc.2009.09.004
Ling Y, Liu F, Qiu Y, Zhao J (2016) Prediction of total execution time for MapReduce applications. In: 2016 Sixth International Conference on Information Science and Technology (ICIST), pp 341–345
Oyamada MS, Zschornack F, Wagner FR (2008) Applying neural networks to performance estimation of embedded software. J Syst Archit 54(1–2):224–240. https://doi.org/10.1016/j.sysarc.2007.06.005
Phinjaroenphan P, Bevinakoppa S, Zeephongsekul P (2005) A method for estimating the execution time of a parallel task on a grid node. In: Sloot PMA, Hoekstra AG, Priol T, Reinefeld A, Bubak M (eds) Advances in grid computing—EGC 2005, volume 3470 of lecture notes in computer science. Springer, Berlin, pp 226–236. https://doi.org/10.1007/11508380_24
Prem H, Raghavan NRS (2006) A support vector machine based approach for forecasting of network weather services. J Grid Comput 4(1):89–114. https://doi.org/10.1007/s10723-005-9017-1
Smith W (2007) Prediction services for distributed computing. In: IEEE International Parallel and Distributed Processing Symposium, 2007. IPDPS 2007, pp 1–10. https://doi.org/10.1109/IPDPS.2007.370276
Sun J, Sun G, Zhan S, Zhang J, Chen Y (2020) Automated performance modeling of HPC applications using machine learning. IEEE Trans Comput 69(5):749–763
Zhang W, Hao M, Snir M (2016) Predicting HPC parallel program performance based on LLVM compiler. Clust Comput 20:1179–1192
Acknowledgements
Jesus Flores-Contreras and Sergio H. Almanza-Ruiz would like to thank the Mexican National Council for Science and Technology (CONACyT) for the full-time scholarship of their postgraduate studies.
Appendices
Appendix 1: Steps to construct the search strings
The following steps were used to construct the search strings:
1. Extract nouns from the research questions.
2. With the extracted nouns, define compound nouns semantically related to the research questions.
3. Group nouns with similar semantics.
4. Reduce compound nouns as much as possible without losing the original semantics.
5. Include synonyms and alternative spellings. If a compound noun has multiple possible synonym terms, reduce the compound noun if the semantics is not lost. Otherwise, split the noun and incorporate the resulting nouns into the existing groups of nouns, or create new groups if necessary.
6. Define the search string by using the Boolean OR to incorporate all nouns that belong to a group. Then, use the Boolean AND to intersect all groups of nouns.
We extracted nouns from the research questions (step 1) and with these nouns we defined compound nouns semantically related to the research questions (step 2). We obtained the following compound nouns: execution time, parallel application, multiprocessor, prediction method, and performance prediction model.
We then found three groups of nouns in which the elements of a group were similar or related (step 3). Group 1 contained “prediction method” and “performance prediction model”, which are closely related terms. Group 2 included one compound noun, namely “execution time”. Group 3 was formed by “parallel application” and “multiprocessor”. These two terms are closely related, since both concern running parallel applications. We then proceeded to simplify compound nouns (step 4): “performance prediction model” was reduced to “performance prediction” and “performance model”.
We included synonyms in the groups (step 5). In group 1, we found that multiple terms are used as synonyms of “performance prediction” in the literature, such as “prediction method”, “resource prediction”, “prediction of the completion time of an executing task”, “estimate the application runtime”, “execution time estimation”, and “performance modelling and prediction”. Therefore, we further reduced this term to “prediction” to avoid omitting a synonym we might not have considered. We then included “estimation” as a synonym of “prediction”. In group 2, we added “runtime”, “completion time” and “time of execution” as synonyms of “execution time”. Regarding group 3, we found that the term “parallel applications” has multiple synonyms, such as “parallel tasks”, “parallel algorithms” and “parallel systems”. Hence, to avoid omitting a synonym, we further reduced this term to “parallel”. We then included the following terms as synonyms of “parallel”: “distributed”, “grid”, “cluster”, “high performance computing”, “HPC”, “MPI”, and “OpenMP”. As a synonym of “multiprocessor”, we included “multiple-processor”.
The last step in this stage is constructing the search string (step 6). The search string represents the phrase that is used to perform the search in the database engines. The search string was constructed as the intersection of the groups of nouns, whereby within a group a Boolean OR is used. We have that group 1 is G1 = {“performance model”, “prediction”, “estimation”}, group 2 is G2 = {“execution time”, “runtime”, “completion time”, “time of execution”} and group 3 is G3 = {“parallel”, “distributed”, “grid”, “cluster”, “high performance computing”, “HPC”, “MPI”, “OpenMP”, “multiprocessor”, “multiple-processor”}.
We obtained the following search string by intersecting the groups, i.e. G1 AND G2 AND G3, and by applying a Boolean OR to the elements of each group:
(“performance model” OR prediction OR estimation) AND (“execution time” OR runtime OR “completion time” OR “time of execution”) AND (parallel OR distributed OR grid OR cluster OR “high performance computing” OR HPC OR MPI OR OpenMP OR multiprocessor OR multiple-processor)
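Step 6 can be illustrated with a minimal Python sketch. It takes the groups G1, G2 and G3 exactly as listed above, joins the terms of each group with OR, and intersects the groups with AND. The function names (`quote`, `or_group`, `build_search_string`) are hypothetical and were chosen only for this illustration. Straight double quotes are used for phrases, which is an assumption about what a search engine accepts.

```python
# Term groups as listed in step 6 of this appendix.
G1 = ["performance model", "prediction", "estimation"]
G2 = ["execution time", "runtime", "completion time", "time of execution"]
G3 = ["parallel", "distributed", "grid", "cluster", "high performance computing",
      "HPC", "MPI", "OpenMP", "multiprocessor", "multiple-processor"]


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join the terms of one group with Boolean OR, inside parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_search_string(groups):
    """Intersect all groups with Boolean AND."""
    return " AND ".join(or_group(g) for g in groups)


search_string = build_search_string([G1, G2, G3])
print(search_string)
```

Keeping the groups as plain lists makes it straightforward to add a synonym to one group and regenerate the whole string.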
We applied the search string to the title, abstract, keywords, and the full text. Search engines that do not support full-text search returned very few papers. For instance, the ACM library retrieved only 92 papers. Hence, in such cases, we applied G1 AND G3 to the title, abstract, and keywords, and then applied G2 to the full text with a PDF editor.
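The two-stage procedure for engines without full-text search can be sketched as follows. This is only an illustration under stated assumptions: in the review, the second stage was carried out with a PDF editor, not with code. The paper records, their field names, and the helpers `matches_any` and `two_stage_filter` are hypothetical. Matching is done on whole words and is case-insensitive, which is a simplification of what a real search engine does (no stemming, for instance).

```python
import re

# Term groups as listed in step 6 of this appendix.
G1 = ["performance model", "prediction", "estimation"]
G2 = ["execution time", "runtime", "completion time", "time of execution"]
G3 = ["parallel", "distributed", "grid", "cluster", "high performance computing",
      "HPC", "MPI", "OpenMP", "multiprocessor", "multiple-processor"]


def matches_any(text, terms):
    """Return True if any term occurs in text as a whole word or phrase (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)
        for t in terms
    )


def two_stage_filter(papers):
    """Stage 1: G1 AND G3 on title, abstract and keywords. Stage 2: G2 on the full text."""
    selected = []
    for paper in papers:
        metadata = " ".join(
            [paper["title"], paper["abstract"], " ".join(paper["keywords"])]
        )
        if matches_any(metadata, G1) and matches_any(metadata, G3):
            if matches_any(paper["full_text"], G2):
                selected.append(paper)
    return selected


# Hypothetical records, invented purely for illustration.
papers = [
    {"title": "Runtime prediction for MPI codes",
     "abstract": "We predict performance.",
     "keywords": ["HPC"],
     "full_text": "The execution time of each kernel is modelled."},
    {"title": "Estimation of cluster energy",
     "abstract": "We study power draw.",
     "keywords": [],
     "full_text": "Power draw is measured on each node."},
    {"title": "Notes on how to compile programs",
     "abstract": "A tutorial.",
     "keywords": [],
     "full_text": "The runtime library is described."},
]

selected = two_stage_filter(papers)
print([p["title"] for p in selected])
```

Whole-word matching is used so that a short term such as “MPI” does not match inside an unrelated word such as “compile”.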
Appendix 2: Number of articles obtained per database
Table 6 shows the number of articles that we obtained from each database. Tables 7, 8, 9, 10, and 11 present the number of papers found per conference proceedings, together with each conference's rank in the CORE ranking, as well as the number of papers obtained per journal, together with each journal's H-index value.
Appendix 3: Reviewed approaches
In this table, the letter “D” denotes the position in the Prediction Domain category, whereas the letter “M” denotes the position in the Prediction Methods category (Table 12).
Cite this article
Flores-Contreras, J., Duran-Limon, H.A., Chavoya, A. et al. Performance prediction of parallel applications: a systematic literature review. J Supercomput 77, 4014–4055 (2021). https://doi.org/10.1007/s11227-020-03417-5
DOI: https://doi.org/10.1007/s11227-020-03417-5