Performance of CPU/GPU compiler directives on ISO/TTI kernels

Abstract

GPUs are slowly becoming ubiquitous in High Performance Computing, as their ability to improve the performance per watt of compute-intensive algorithms relative to multicore CPUs has been recognized. The primary shortcoming of a GPU is usability: vendor-specific APIs differ considerably from existing programming languages, and optimizing applications requires substantial knowledge of the device and its programming interface. Hence, a growing number of higher-level programming models now target GPUs to alleviate this problem. The ultimate goal of a high-level model is to expose an easy-to-use interface that lets the user offload compute-intensive portions of code (kernels) to the GPU and tune the code for the target accelerator, maximizing overall performance with reduced development effort. In this paper, we share our experiences with three notable high-level directive-based GPU programming models: PGI, CAPS and OpenACC (as implemented by CAPS and PGI), on an Nvidia M2090 GPU. We analyze their performance and programmability on Isotropic (ISO) and Tilted Transversely Isotropic (TTI) finite difference kernels, which are primary components of the Reverse Time Migration (RTM) application used in oil and gas exploration for seismic imaging of the sub-surface. When ported to a single GPU using these directives, both ISO and TTI kernels show an average 1.5–1.8x performance improvement over optimized multi-threaded CPU implementations using OpenMP.
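To make the idea of directive-based offloading concrete, the following is a minimal sketch of one time step of an isotropic finite difference kernel, annotated with an OpenACC directive for the GPU build and an OpenMP directive for the multi-threaded CPU build. It is not the kernel evaluated in the paper: the function name `iso_step`, the second-order 7-point stencil, the array layout, and the precomputed coefficient array `c2` (holding (v·dt/h)² per cell) are all assumptions chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened index for an nx*ny*nz grid stored with k (z) varying fastest. */
#define IDX(i, j, k, ny, nz) (((size_t)(i) * (ny) + (j)) * (nz) + (k))

/*
 * Hypothetical isotropic (ISO) update, second order in time and space:
 *   next = 2*curr - prev + c2 * laplacian(curr)
 * Only interior points are updated; boundary cells of `next` are untouched.
 */
void iso_step(int nx, int ny, int nz,
              const float *restrict prev,
              const float *restrict curr,
              float *restrict next,
              const float *restrict c2)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    const size_t n = (size_t)nx * ny * nz;
    (void)n; /* only referenced by the OpenACC data clauses below */

    /* The same loop nest is offloaded to the GPU or threaded on the CPU,
     * depending on which directive the compiler is asked to honor. */
#if defined(_OPENACC)
#pragma acc parallel loop collapse(3) copyin(prev[0:n], curr[0:n], c2[0:n]) copy(next[0:n])
#elif defined(_OPENMP)
#pragma omp parallel for collapse(2)
#endif
    for (int i = 1; i < nx - 1; ++i) {
        for (int j = 1; j < ny - 1; ++j) {
            for (int k = 1; k < nz - 1; ++k) {
                const size_t p = IDX(i, j, k, ny, nz);
                /* 7-point Laplacian, grid spacing folded into c2 */
                const float lap =
                    curr[IDX(i - 1, j, k, ny, nz)] + curr[IDX(i + 1, j, k, ny, nz)] +
                    curr[IDX(i, j - 1, k, ny, nz)] + curr[IDX(i, j + 1, k, ny, nz)] +
                    curr[IDX(i, j, k - 1, ny, nz)] + curr[IDX(i, j, k + 1, ny, nz)] -
                    6.0f * curr[p];
                next[p] = 2.0f * curr[p] - prev[p] + c2[p] * lap;
            }
        }
    }
}
```

Compiled without OpenACC or OpenMP support, the directives are skipped and the loop nest runs serially, which is the main programmability appeal of the directive approach. In a real RTM time loop the arrays would normally be kept resident on the device across steps (for example with an enclosing data region) rather than copied on every call as in this sketch.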

Acknowledgments

We wish to thank Georges-Emmanuel Moulard of CAPS enterprise, Matthew Colgrove of PGI, and Philippe Thierry of Intel Corp., who helped us immensely by answering our questions and suggesting improvements. This work would not have been possible without their guidance. Finally, we would like to thank TOTAL for granting permission to publish this paper.

Author information

Corresponding author

Correspondence to Sayan Ghosh.

About this article

Cite this article

Ghosh, S., Liao, T., Calandra, H. et al. Performance of CPU/GPU compiler directives on ISO/TTI kernels. Computing 96, 1149–1162 (2014). https://doi.org/10.1007/s00607-013-0367-4

  • DOI: https://doi.org/10.1007/s00607-013-0367-4
