Overlapping Computations with Communications and I/O Explicitly Using OpenMP Based Heterogeneous Threading Models

  • Conference paper
OpenMP in a Heterogeneous World (IWOMP 2012)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7312)

Abstract

Holistic tuning and optimization of hybrid MPI and OpenMP applications is becoming a focus for parallel code developers as the number of cores and hardware threads in the processing nodes of high-end systems continues to increase. For example, a Cray XE6 node with Interlagos processors supports 32 hardware threads, while an IBM Blue Gene/Q node supports up to 64 hardware threads. Note that, by default, OpenMP threads and MPI tasks are pinned to processor cores on these high-end systems, and throughout the paper we assume fixed bindings of threads to physical cores. A number of OpenMP runtimes also support user-specified bindings of threads to physical cores. The parallel and node efficiencies of hybrid MPI and OpenMP applications on these systems depend largely on balancing and overlapping computation and communication workloads. This issue is intensified when the nodes have a non-uniform memory access (NUMA) architecture and I/O or accelerator devices. In these environments, where access to I/O devices (GPUs for code acceleration, and network interfaces for MPI communication and parallel file I/O) is managed and scheduled by a host CPU, application developers can introduce innovative solutions that overlap CPU and I/O operations to improve node and parallel efficiencies. For example, the developers of BigDFT, a production-level application, have introduced a master-slave model that explicitly overlaps blocking collective communication operations with local multi-threaded computation. Similarly, some applications parallelized with MPI, OpenMP, and GPU acceleration could assign one management thread to GPU data and control orchestration and one MPI control thread to communication management, while the remaining CPU threads perform overlapping calculations; a background thread could also be set aside for file-I/O-based fault tolerance.
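The master-slave overlap described above can be sketched in OpenMP as follows. This is not BigDFT's implementation. It is a minimal sketch under stated assumptions: `blocking_comm` is a hypothetical stand-in for a blocking collective such as `MPI_Allreduce`, and it only sums a local buffer so the example has no MPI dependency. `overlap_step` is likewise a hypothetical name. Thread 0 performs the "communication" while threads 1..T-1 split an independent update. Work is partitioned by hand because an `omp for` construct must be encountered by every thread of the team.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds and runs without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical stand-in for a blocking collective such as MPI_Allreduce:
   it only sums a local buffer so the sketch has no MPI dependency. */
static double blocking_comm(const double *buf, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += buf[i];
    return s;
}

/* Master-slave overlap: thread 0 runs the blocking "communication" while
   threads 1..T-1 share the independent update y[i] = a * x[i]. With a
   single thread, thread 0 performs both steps one after the other. */
static double overlap_step(const double *comm_buf, size_t ncomm,
                           const double *x, double *y, size_t n, double a) {
    assert(n == 0 || (x != NULL && y != NULL));
    double comm_result = 0.0; /* shared by default inside the region */

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nth = omp_get_num_threads();

        if (tid == 0)
            comm_result = blocking_comm(comm_buf, ncomm);

        /* Manual partitioning: an "omp for" must be reached by every thread
           of the team, so it cannot be used by the workers alone. */
        int nworkers = (nth > 1) ? nth - 1 : 1;
        int wid = (nth > 1) ? tid - 1 : 0; /* -1 for the master thread */
        if (wid >= 0) {
            size_t chunk = (n + (size_t)nworkers - 1) / (size_t)nworkers;
            size_t lo = (size_t)wid * chunk;
            size_t hi = (lo + chunk < n) ? lo + chunk : n;
            for (size_t i = lo; i < hi; ++i)
                y[i] = a * x[i];
        }
    } /* implicit barrier: communication and computation are both complete */

    return comm_result;
}
```

In a hybrid code, the stand-in would be replaced by the real collective. Because only the master thread of the outermost region (the initial thread) would call MPI, an MPI library initialized with `MPI_THREAD_FUNNELED` should suffice. The cost of this design is that one core leaves the computation for the duration of the step.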
Considering these emerging application design needs, we would like to motivate the OpenMP standards committee, through examples and empirical results, to introduce thread and task heterogeneity into the language specification. This would allow code developers, especially those programming large-scale distributed-memory HPC systems and accelerator devices, to design and develop portable solutions with overlapping control and data flow for their applications without resorting to custom solutions.
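Without language support for heterogeneous threads, the role assignment described in the abstract must be hand-coded. That assignment uses an MPI control thread, a GPU management thread, a background I/O thread, and compute threads. A common approach is to branch on `omp_get_thread_num()`, which is exactly the kind of custom solution the authors want to avoid. The sketch below assumes hypothetical role names (`thread_role`, `role_of`, `run_roles`) and a fixed policy: threads 0 to 2 take the control roles only when at least one thread remains for computation. The per-role work loops are left out.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds and runs without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical role names: OpenMP has no built-in notion of thread roles. */
typedef enum { ROLE_COMM, ROLE_GPU, ROLE_IO, ROLE_COMPUTE, ROLE_COUNT } thread_role;

/* Dedicate threads 0-2 to control work only when at least one thread is
   left for computation; with fewer threads every thread computes and the
   control work would have to run inline, without overlap. */
static thread_role role_of(int tid, int nthreads) {
    assert(tid >= 0 && tid < nthreads);
    if (nthreads < 4) return ROLE_COMPUTE;
    switch (tid) {
    case 0:  return ROLE_COMM;    /* MPI control thread (funneled calls) */
    case 1:  return ROLE_GPU;     /* GPU data and control orchestration */
    case 2:  return ROLE_IO;      /* background checkpoint / file I/O */
    default: return ROLE_COMPUTE; /* overlapping CPU calculation */
    }
}

/* Records how many threads of one parallel region took each role. */
static void run_roles(int counts[ROLE_COUNT]) {
    for (int r = 0; r < ROLE_COUNT; ++r) counts[r] = 0;

    #pragma omp parallel
    {
        thread_role role = role_of(omp_get_thread_num(), omp_get_num_threads());
        /* A real code would dispatch to the role's work loop here. */
        #pragma omp atomic
        counts[role]++;
    }
}
```

The roles are tied to thread numbers, and under fixed thread-to-core bindings thread numbers are tied to cores. The mapping therefore decides which physical cores, and on a NUMA node which memory domains, serve communication, accelerator control, and I/O.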

References

  1. BigDFT code, http://inac.cea.fr/L_Sim/BigDFT/

  2. Cray XE6 system, http://www.cray.com/Products/XE/CrayXE6System.aspx

  3. Cray XK6 system, http://www.cray.com/Products/XK6/XK6.aspx

  4. Ayguade, E., Badia, R.M., Cabrera, D., Duran, A., Gonzalez, M., Igual, F., Jimenez, D., Labarta, J., Martorell, X., Mayo, R., Perez, J.M., Quintana-Ortí, E.S.: A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures. In: Müller, M.S., de Supinski, B.R., Chapman, B.M. (eds.) IWOMP 2009. LNCS, vol. 5568, pp. 154–167. Springer, Heidelberg (2009)

  5. Beyer, J.C., Stotzer, E.J., Hart, A., de Supinski, B.R.: OpenMP for Accelerators. In: Chapman, B.M., Gropp, W.D., Kumaran, K., Müller, M.S. (eds.) IWOMP 2011. LNCS, vol. 6665, pp. 108–121. Springer, Heidelberg (2011)

  6. Fatica, M.: Accelerating Linpack with CUDA on heterogeneous clusters. In: GPGPU-2 Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. ACM, New York (2009)

  7. Genovese, L., Neelov, A., Goedecker, S., Deutsch, T., Ghasemi, A., Zilberberg, O., Bergman, Rayson, M., Schneider, R.: Daubechies wavelets as a basis set for density functional pseudopotential calculations. J. Chem. Phys. 129, 14109 (2008)

  8. Jones, W.M., Daly, J.T., DeBardeleben, N.A.: Application Resilience: Making Progress in Spite of Failure. In: Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), pp. 789–794 (2008)

  9. Park, B.H., Naughton, T.J., Agarwal, P.K., Bernholdt, D.E., Geist, A., Tippens, J.L.: Realization of User Level Fault Tolerant Policy Management through a Holistic Approach for Fault Correlation. In: IEEE Symp. on Policies for Distributed Systems and Networks (2011)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alam, S.R., Fourestey, G., Videau, B., Genovese, L., Goedecker, S., Dugan, N. (2012). Overlapping Computations with Communications and I/O Explicitly Using OpenMP Based Heterogeneous Threading Models. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds) OpenMP in a Heterogeneous World. IWOMP 2012. Lecture Notes in Computer Science, vol 7312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30961-8_23

  • DOI: https://doi.org/10.1007/978-3-642-30961-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30960-1

  • Online ISBN: 978-3-642-30961-8

  • eBook Packages: Computer Science, Computer Science (R0)
