Abstract
The component model of PEPPHER, an EU FP7 project, defines the notions of component, interface, and meta-data for homogeneous and heterogeneous parallel systems. In this paper, we describe and evaluate the PEPPHER composition tool, which explores the application’s components and their implementation variants, generates the low-level code needed to interact with the runtime system, and coordinates the native compilation and linking of the various code units to compose the overall application code with the goal of optimizing performance. We discuss the concept of smart containers and its benefits for reducing dispatch overhead, exploiting implicit parallelism across component invocations, and optimizing data transfers at runtime. In an experimental evaluation with several applications, we demonstrate that the composition tool provides a high-level programming front-end while effectively utilizing the task-based PEPPHER runtime system (StarPU) underneath for different usage scenarios on GPU-based systems.
Notes
The PEPPHER runtime system is C-based, and the C language does not permit calling functions with varying types depending on the actual task being run.
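The constraint in this note can be illustrated with a minimal sketch of a uniform, untyped task signature plus a wrapper that restores the static types before calling a typed implementation. All names here (task_fn, vector_scale, VectorScaleArgs, vector_scale_wrapper, run_task) are hypothetical and chosen for illustration; they are not the PEPPHER or StarPU API, and the signature is only loosely modeled on C-based task runtimes.

```cpp
#include <cassert>
#include <cstddef>

// Assumed uniform task signature: the runtime sees only untyped buffers
// and one opaque argument pointer, whatever the task actually computes.
using task_fn = void (*)(void* buffers[], void* args);

// Typed component implementation, as a programmer would write it.
void vector_scale(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}

// Scalar arguments packed into one struct so they fit behind a void*.
struct VectorScaleArgs {
    std::size_t n;
    float factor;
};

// Wrapper of the kind a code generator could emit: it has the uniform
// signature, casts the untyped pointers back, and calls the typed code.
void vector_scale_wrapper(void* buffers[], void* args) {
    float* data = static_cast<float*>(buffers[0]);
    const VectorScaleArgs* a = static_cast<const VectorScaleArgs*>(args);
    vector_scale(data, a->n, a->factor);
}

// Stand-in for the runtime: it invokes any task through the one signature.
void run_task(task_fn fn, void* buffers[], void* args) { fn(buffers, args); }

// Usage sketch: scale {1, 2, 3} by 2 and return the sum of the result.
float demo_scale() {
    float data[3] = {1.0f, 2.0f, 3.0f};
    void* buffers[1] = {data};
    VectorScaleArgs args{3, 2.0f};
    run_task(vector_scale_wrapper, buffers, &args);
    return data[0] + data[1] + data[2];
}
```

The typed function never appears in the runtime-facing interface; only the wrapper does, which is why one wrapper per implementation variant is needed under this assumption.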
The read and write accesses to container data are distinguished by implementing proxy classes for element data in C++ [11].
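As a rough illustration of the proxy technique mentioned in this note, the sketch below shows a container whose element access returns a proxy object, so that a conversion to the element type counts as a read and an assignment counts as a write. TrackedVector, ElementProxy, and the access counters are hypothetical names introduced here; the actual PEPPHER smart containers may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container that counts element reads and writes separately.
template <typename T>
class TrackedVector {
public:
    // Proxy returned by operator[]; the access kind is decided by how
    // the proxy is used, not by the subscript operation itself.
    class ElementProxy {
    public:
        ElementProxy(TrackedVector& owner, std::size_t index)
            : owner_(owner), index_(index) {}
        ElementProxy(const ElementProxy&) = default;

        // Read access: the proxy is converted to the element type.
        operator T() const {
            ++owner_.reads_;
            return owner_.data_[index_];
        }

        // Write access: a value is assigned through the proxy.
        ElementProxy& operator=(const T& value) {
            ++owner_.writes_;
            owner_.data_[index_] = value;
            return *this;
        }

        // Proxy-to-proxy assignment: one read of the source, one write here.
        ElementProxy& operator=(const ElementProxy& other) {
            return *this = static_cast<T>(other);
        }

    private:
        TrackedVector& owner_;
        std::size_t index_;
    };

    explicit TrackedVector(std::size_t n) : data_(n) {}

    ElementProxy operator[](std::size_t i) { return ElementProxy(*this, i); }

    std::size_t reads() const { return reads_; }
    std::size_t writes() const { return writes_; }

private:
    std::vector<T> data_;
    std::size_t reads_ = 0;
    std::size_t writes_ = 0;
};

// Usage sketch: two writes and one read are recorded.
inline void demo_proxy() {
    TrackedVector<double> v(4);
    v[0] = 1.5;        // write to element 0
    double x = v[0];   // read of element 0
    v[1] = x;          // write to element 1
    assert(v.reads() == 1 && v.writes() == 2);
}
```

In a real container, the counters would presumably be replaced by whatever bookkeeping the runtime needs, for example marking a copy as modified on a write; that substitution is an assumption, not something stated in the note.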
References
Benkner S, Pllana S, Träff JL, Tsigas P, Dolinsky U, Augonnet C, Bachmayer B, Kessler C, Moloney D, Osipov V (2011) PEPPHER: efficient and productive usage of hybrid computing systems. IEEE Micro 31(5):28–41
Augonnet C, Thibault S, Namyst R, Wacrenier PA (2011) StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr Comput Pract Exper 23(2):187–198
Che S, Boyer M, Meng J, Tarjan D, Sheaffer JW, Lee SH, Skadron K (2009) Rodinia: a benchmark suite for heterogeneous computing. In: IEEE international symposium on workload characterization (IISWC), pp 44–54
NVIDIA Corporation (2012) CUBLAS library: NVIDIA CUDA basic linear algebra subroutines. http://developer.nvidia.com/cublas/
Bell N, Garland M (2012) CUSP library v0.2: generic parallel algorithms for sparse matrix and graph computations. http://code.google.com/p/cusp-library/
Asanovic K et al (2009) A view of the parallel computing landscape. Commun ACM 52(10):56–67
Kessler CW, Löwe W (2012) Optimized composition of performance-aware parallel components. Concurr Comput Pract Exper 24(5):481–498
Li L, Dastgeer U, Kessler C (2013) Adaptive off-line tuning for optimized composition of components for heterogeneous many-core systems. In: Seventh international workshop on automatic performance tuning (iWAPT-2012), Proc. VECPAR-2012 conference, pp 329–345
Kicherer M, Buchty R, Karl W (2011) Cost-aware function migration in heterogeneous systems. In: Proceedings conference on High Perf. and Emb. Arch. and Comp. (HiPEAC), pp 137–145
Kicherer M, Nowak F, Buchty R, Karl W (2012) Seamlessly portable applications: managing the diversity of modern heterogeneous systems. ACM Trans Archit Code Optim 8(4):42:1–42:20
Alexandrescu A (2001) Modern C++ design: generic programming and design patterns applied. Addison-Wesley, Reading
Park R (1992) Software size measurement: a framework for counting source statements. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, Tech. rep
Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1:1–1:25
Ng R, Levoy M, Brédif M, Duval G, Horowitz M, Hanrahan P (2005) Light field photography with a hand-held plenoptic camera. Stanford University, Stanford, Tech. rep
Augonnet C (2011) Scheduling tasks over multicore machines enhanced with accelerators: a runtime system’s perspective. PhD thesis, Université Bordeaux 1
Ansel J, Chan C, Wong YL, Olszewski M, Zhao Q, Edelman A, Amarasinghe S (2009) PetaBricks: a language and compiler for algorithmic choice. Proc Conf on Prog Lang Design and Impl (PLDI)
Wang PH, Collins JD, Chinya GN, Jiang H, Tian X, Girkar M, Yang NY, Lueh GY, Wang H (2007) EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system. In: Proceedings of conference on programming language design and implementation (PLDI), pp 156–166
Linderman MD, Collins JD, Wang H, Meng THY (2008) Merge: a programming model for heterogeneous multi-core systems. In: Proceedings of international conference on architecture support for programming language and Operating Systems, (ASPLOS 2008), pp 287–296
Huang SS, Hormati A, Bacon DF, Rabbah R (2008) Liquid metal: object-oriented programming across the hardware/software boundary. In: Proceedings of 22nd European conference on object-oriented programming (ECOOP), pp 76–103
Wernsing JR, Stitt G (2010) Elastic computing: a framework for transparent, portable, and adaptive multi-core heterogeneous computing. In: Proceedings of conference on languages, compilers, and tools for embedded systems (LCTES), pp 115–124
Chafi H, Sujeeth AK, Brown KJ, Lee H, Atreya AR, Olukotun K (2011) A domain-specific approach to heterogeneous parallelism. In: 16th symposium on principles and practice of parallel programming (PPoPP), pp 35–46
Acknowledgments
This work was funded by EU FP7, project PEPPHER, grant #248481 (http://www.pep-pher.eu) and by SeRC. We would like to thank the University of Vienna for providing access to their machine.
Cite this article
Dastgeer, U., Li, L. & Kessler, C. The PEPPHER composition tool: performance-aware composition for GPU-based systems. Computing 96, 1195–1211 (2014). https://doi.org/10.1007/s00607-013-0371-8
DOI: https://doi.org/10.1007/s00607-013-0371-8