The PEPPHER composition tool: performance-aware composition for GPU-based systems | Computing



Abstract

The PEPPHER (EU FP7 project) component model defines the notions of component, interface, and meta-data for homogeneous and heterogeneous parallel systems. In this paper, we describe and evaluate the PEPPHER composition tool, which explores the application’s components and their implementation variants, generates the low-level code that interacts with the runtime system, and coordinates the native compilation and linking of the various code units to compose the overall application with optimized performance. We discuss the concept of smart containers and their benefits for reducing dispatch overhead, exploiting implicit parallelism across component invocations, and optimizing data transfers at runtime. In an experimental evaluation with several applications, we demonstrate that the composition tool provides a high-level programming front-end while effectively utilizing the task-based PEPPHER runtime system (StarPU) underneath for different usage scenarios on GPU-based systems.


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


Notes

  1. For demonstration purposes, we have used CUBLAS [4] and CUSP [5] components for CUDA implementations, as shown in Sect. 5.

  2. This is because the PEPPHER runtime system is C-based, and C does not allow calling functions whose argument types vary depending on the actual task being run.

  3. The read and write accesses to container data are distinguished by implementing proxy classes for element data in C++ [11].

References

  1. Benkner S, Pllana S, Träff JL, Tsigas P, Dolinsky U, Augonnet C, Bachmayer B, Kessler C, Moloney D, Osipov V (2011) PEPPHER: efficient and productive usage of hybrid computing systems. IEEE Micro 31(5):28–41

  2. Augonnet C, Thibault S, Namyst R, Wacrenier PA (2011) StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr Comput Pract Exper 23(2):187–198

  3. Che S, Boyer M, Meng J, Tarjan D, Sheaffer JW, Lee SH, Skadron K (2009) Rodinia: a benchmark suite for heterogeneous computing. In: IEEE international symposium on workload characterization (IISWC), pp 44–54

  4. NVIDIA Corporation (2012) CUBLAS library: NVIDIA CUDA basic linear algebra subroutines. http://developer.nvidia.com/cublas/

  5. Bell N, Garland M (2012) CUSP library v0.2: generic parallel algorithms for sparse matrix and graph computations. http://code.google.com/p/cusp-library/

  6. Asanovic K et al (2009) A view of the parallel computing landscape. Commun ACM 52(10):56–67

  7. Kessler CW, Löwe W (2012) Optimized composition of performance-aware parallel components. Concurr Comput Pract Exper 24(5):481–498

  8. Li L, Dastgeer U, Kessler C (2013) Adaptive off-line tuning for optimized composition of components for heterogeneous many-core systems. In: Seventh international workshop on automatic performance tuning (iWAPT-2012), Proc. VECPAR-2012 conference, pp 329–345

  9. Kicherer M, Buchty R, Karl W (2011) Cost-aware function migration in heterogeneous systems. In: Proceedings conference on High Perf. and Emb. Arch. and Comp. (HiPEAC), pp 137–145

  10. Kicherer M, Nowak F, Buchty R, Karl W (2012) Seamlessly portable applications: managing the diversity of modern heterogeneous systems. ACM Trans Archit Code Optim 8(4):42:1–42:20

  11. Alexandrescu A (2001) Modern C++ design: generic programming and design patterns applied. Addison-Wesley, Reading

  12. Park R (1992) Software size measurement: a framework for counting source statements. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, Tech. rep

  13. Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1:1–1:25

  14. Ng R, Levoy M, Brédif M, Duval G, Horowitz M, Hanrahan P (2005) Light field photography with a hand-held plenoptic camera. Stanford University, Stanford, Tech. rep

  15. Augonnet C (2011) Scheduling tasks over multicore machines enhanced with accelerators: a runtime system’s perspective. PhD thesis, Université Bordeaux 1

  16. Ansel J, Chan C, Wong YL, Olszewski M, Zhao Q, Edelman A, Amarasinghe S (2009) PetaBricks: a language and compiler for algorithmic choice. In: Proceedings of conference on programming language design and implementation (PLDI)

  17. Wang PH, Collins JD, Chinya GN, Jiang H, Tian X, Girkar M, Yang NY, Lueh GY, Wang H (2007) EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system. In: Proceedings of conference on programming language design and implementation (PLDI), pp 156–166

  18. Linderman MD, Collins JD, Wang H, Meng THY (2008) Merge: a programming model for heterogeneous multi-core systems. In: Proceedings of international conference on architectural support for programming languages and operating systems (ASPLOS 2008), pp 287–296

  19. Huang SS, Hormati A, Bacon DF, Rabbah R (2008) Liquid metal: object-oriented programming across the hardware/software boundary. In: Proceedings of 22nd European conference on object-oriented programming (ECOOP), pp 76–103

  20. Wernsing JR, Stitt G (2010) Elastic computing: a framework for transparent, portable, and adaptive multi-core heterogeneous computing. In: Proceedings of conference on languages, compilers, and tools for embedded systems (LCTES), pp 115–124

  21. Chafi H, Sujeeth AK, Brown KJ, Lee H, Atreya AR, Olukotun K (2011) A domain-specific approach to heterogeneous parallelism. In: 16th symposium on principles and practice of parallel programming (PPoPP), pp 35–46

Acknowledgments

This work was funded by EU FP7, project PEPPHER, grant #248481 (http://www.peppher.eu), and by SeRC. We would like to thank the University of Vienna for providing access to their machine.

Author information

Correspondence to Usman Dastgeer.

About this article

Cite this article

Dastgeer, U., Li, L. & Kessler, C. The PEPPHER composition tool: performance-aware composition for GPU-based systems. Computing 96, 1195–1211 (2014). https://doi.org/10.1007/s00607-013-0371-8

  • DOI: https://doi.org/10.1007/s00607-013-0371-8