Abstract
GPUs are becoming ubiquitous in High Performance Computing, as they can deliver better performance per watt than multicore CPUs for compute-intensive algorithms. The primary shortcoming of GPUs is usability: vendor-specific APIs differ considerably from existing programming languages, and optimizing applications requires substantial knowledge of both the device and the programming interface. Hence, a growing number of higher-level programming models now target GPUs to alleviate this problem. The ultimate goal of a high-level model is to expose an easy-to-use interface through which the user can offload compute-intensive portions of code (kernels) to the GPU and tune them for the target accelerator, maximizing overall performance with reduced development effort. In this paper, we share our experiences with three notable high-level, directive-based GPU programming models, PGI, CAPS, and OpenACC (from both CAPS and PGI), on an NVIDIA M2090 GPU. We analyze their performance and programmability on Isotropic (ISO) and Tilted Transversely Isotropic (TTI) finite difference kernels, which are primary components of the Reverse Time Migration (RTM) application used in oil and gas exploration for seismic imaging of the subsurface. When the kernels are ported to a single GPU using these directives, we observe an average 1.5–1.8x performance improvement for both ISO and TTI kernels compared with optimized multi-threaded CPU implementations using OpenMP.
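To illustrate the directive-based offload style the paper evaluates, the following is a minimal sketch of an OpenACC-annotated time-step loop. It is not taken from the paper: the second-order isotropic wave-equation stencil, the grid sizes, the function name iso_step, and the data clauses are illustrative assumptions; the actual ISO/TTI RTM kernels studied are considerably more involved.

```c
/* Minimal sketch of directive-based GPU offload (OpenACC), in the style
 * evaluated in the paper. The second-order isotropic stencil, array names,
 * grid sizes, and data clauses below are illustrative assumptions, not the
 * paper's ISO/TTI kernels. */
#include <stddef.h>

#define NX 256
#define NY 256
#define NZ 256
#define IDX(i, j, k) ((size_t)(i) * NY * NZ + (size_t)(j) * NZ + (k))

/* One explicit time step: p_next = 2*p_cur - p_prev + vel^2 * laplacian(p_cur).
 * The triple loop is offloaded by annotating it with directives; the copy
 * clauses move the wavefields to and from the device for this single call
 * (a real time-stepping loop would keep them resident with a data region). */
void iso_step(const float *restrict p_prev, const float *restrict p_cur,
              float *restrict p_next, const float *restrict vel)
{
#pragma acc parallel loop collapse(2) \
    copyin(p_prev[0:NX*NY*NZ], p_cur[0:NX*NY*NZ], vel[0:NX*NY*NZ]) \
    copyout(p_next[0:NX*NY*NZ])
    for (int i = 1; i < NX - 1; ++i)
        for (int j = 1; j < NY - 1; ++j)
#pragma acc loop vector
            for (int k = 1; k < NZ - 1; ++k) {
                float lap = p_cur[IDX(i+1,j,k)] + p_cur[IDX(i-1,j,k)]
                          + p_cur[IDX(i,j+1,k)] + p_cur[IDX(i,j-1,k)]
                          + p_cur[IDX(i,j,k+1)] + p_cur[IDX(i,j,k-1)]
                          - 6.0f * p_cur[IDX(i,j,k)];
                p_next[IDX(i,j,k)] = 2.0f * p_cur[IDX(i,j,k)]
                                   - p_prev[IDX(i,j,k)]
                                   + vel[IDX(i,j,k)] * vel[IDX(i,j,k)] * lap;
            }
}
```

The appeal of this approach, as discussed in the abstract, is that the loop nest remains ordinary C: removing the pragmas yields the sequential (or OpenMP-annotated) CPU version, and tuning for the accelerator is done through the directive clauses rather than by rewriting the kernel in a vendor-specific API.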
Acknowledgments
We wish to thank Georges-Emmanuel Moulard of CAPS enterprise, Matthew Colgrove of PGI, and Philippe Thierry of Intel Corp., who helped us immensely by answering our questions and suggesting improvements. This work would not have been possible without their guidance. Finally, we would like to thank TOTAL for granting permission to publish this paper.
Cite this article
Ghosh, S., Liao, T., Calandra, H. et al. Performance of CPU/GPU compiler directives on ISO/TTI kernels. Computing 96, 1149–1162 (2014). https://doi.org/10.1007/s00607-013-0367-4