Abstract
Simulators and empirical profiling data are often used to understand how suitable a specific hardware architecture is for an application. However, simulators can be slow, and empirical profiling-based methods can only provide insights about the existing hardware on which the applications are executed. While the insights obtained in this way are valuable, such methods cannot be used to evaluate a large number of system designs efficiently. Analytical performance evaluation models are fast alternatives, particularly well-suited for system design-space exploration. However, to be truly application-specific, they need to be combined with a workload model that captures relevant application characteristics. In this paper we introduce PISA, a framework based on the LLVM infrastructure that is able to generate such a model for sequential and parallel applications by performing hardware-independent characterization. Characteristics such as instruction-level parallelism, memory access patterns and branch behavior are analyzed per thread or process during application execution. To illustrate the potential of the framework, we provide a detailed characterization of a representative benchmark for graph-based analytics, Graph 500. Finally, we analyze how the properties extracted with PISA across Graph 500 and SPEC CPU2006 applications compare to measurements performed on x86 and POWER8 processors.
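As an illustration of what a hardware-independent memory characteristic can look like, the following minimal sketch computes reuse distances over an address trace: for each access, the number of distinct addresses touched since the previous access to the same address. It is not PISA's implementation. The function name `reuse_distances` and the trace format (a plain list of address labels) are assumptions made for this example only.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return one reuse distance per access in `trace`.

    The distance is the number of distinct addresses accessed since the
    previous access to the same address, or None for a first access.
    Hypothetical helper for illustration; quadratic in the worst case.
    """
    # Keys are kept ordered from least to most recently used.
    stack = OrderedDict()
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Count the addresses used more recently than `addr`.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


if __name__ == "__main__":
    # Toy trace of symbolic addresses (an assumption, not measured data).
    print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
```

Because the metric depends only on the order of addresses in the trace, it does not encode any cache size or hierarchy, which is what makes this kind of property usable as input to an analytical hardware model.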




















Acknowledgments
This work is conducted in the context of the joint ASTRON and IBM DOME project and is funded by the Netherlands Organisation for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe. We would like to thank Evelina Dumitrescu for running part of the OpenMP and MPI PISA characterizations.
Cite this article
Anghel, A., Vasilescu, L.M., Mariani, G. et al. An Instrumentation Approach for Hardware-Agnostic Software Characterization. Int J Parallel Prog 44, 924–948 (2016). https://doi.org/10.1007/s10766-016-0410-0
DOI: https://doi.org/10.1007/s10766-016-0410-0