Abstract
High-performance systems have complex, diverse, and rapidly evolving architectures, and the span of applications, workloads, and resource-use patterns is rapidly diversifying. Adapting applications for efficient execution across this spectrum of execution environments requires substantial effort. Many performance optimization tools implement one or several aspects of the full performance optimization task, but almost none is comprehensive across architectures, environments, applications, and workloads. This paper presents, illustrates, and applies a modular infrastructure that enables the composition of multiple open-source tools and analyses into a set of workflows. These workflows implement comprehensive end-to-end optimization of a diverse spectrum of HPC applications on multiple architectures and for multiple resource types and parallel environments. The paper gives results from an implementation on the Stampede HPC system at the Texas Advanced Computing Center. There, a user can submit an application for optimization with a single command line and get back an at least partially optimized program for two different chips, without manually modifying the program. Currently, only a subset of the possible optimizations is fully automated, but this subset is growing rapidly. Case studies applying the workflows are presented. The implementation, currently available for download as version 4.0 of the PerfExpert tool, supports both Intel Sandy Bridge and Intel Xeon Phi chips.
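The central idea of the abstract is that independent tools and analyses are composed into optimization workflows. The Python sketch below shows one way such composition could be structured. It is a hypothetical illustration, not PerfExpert 4.0's actual interface. The `Module` and `Workflow` classes, the four phase names (measurement, analysis, recommendation, transformation), and the stub modules are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical shared state passed from module to module (not PerfExpert's data model).
Context = Dict[str, object]

# Assumed phase order for this sketch; a real infrastructure may define different phases.
PHASES = ["measurement", "analysis", "recommendation", "transformation"]


@dataclass
class Module:
    """A hypothetical pluggable step wrapping one tool or analysis."""
    name: str
    phase: str
    run: Callable[[Context], Context]


@dataclass
class Workflow:
    """Composes modules and runs them phase by phase."""
    modules: List[Module] = field(default_factory=list)

    def add(self, module: Module) -> "Workflow":
        if module.phase not in PHASES:
            raise ValueError(f"unknown phase: {module.phase}")
        self.modules.append(module)
        return self  # allow chained add() calls

    def execute(self, ctx: Context) -> Context:
        # sorted() is stable, so modules in the same phase keep their insertion order.
        ordered = sorted(self.modules, key=lambda m: PHASES.index(m.phase))
        for module in ordered:
            ctx = module.run(ctx)
            # Record which modules ran, in execution order.
            ctx.setdefault("trace", []).append(module.name)
        return ctx


if __name__ == "__main__":
    # Stub modules stand in for real tools; they pass the context through unchanged.
    wf = Workflow()
    wf.add(Module("transform-stub", "transformation", lambda c: c))
    wf.add(Module("measure-stub", "measurement", lambda c: c))
    result = wf.execute({"program": "app.c"})
    print(result["trace"])  # ['measure-stub', 'transform-stub']
```

In this design, swapping one measurement or transformation tool for another means registering a different `Module`. The workflow driver itself does not change.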
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Fialho, L., Browne, J. (2014). Framework and Modular Infrastructure for Automation of Architectural Adaptation and Performance Optimization for HPC Systems. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_17
DOI: https://doi.org/10.1007/978-3-319-07518-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science, Computer Science (R0)