Abstract
With the rise in the complexity of parallel applications, the need for computational power keeps growing. Recent trends in High-Performance Computing (HPC) show that improvements in single-core performance will not be sufficient to meet the challenges of an exascale machine: we expect an enormous growth in the number of cores as well as a multiplication of the data volume exchanged across compute nodes. To scale applications up to exascale, the communication layer has to minimize the time spent waiting for network messages. This paper presents a message-progression strategy based on Collaborative Polling, which enables efficient, auto-adaptive overlap of communication phases with computation. The approach is novel in that it increases the overlap potential of an application without incurring the overhead of a threaded message progression. We implemented Collaborative Polling for InfiniBand inside MPC, a thread-based MPI runtime. We evaluate the gains from Collaborative Polling on the NAS Parallel Benchmarks and three scientific applications, showing improvements in communication time of up to a factor of 2.
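To make the overlap problem concrete, the sketch below (not taken from the paper) shows the standard non-blocking MPI pattern whose progression Collaborative Polling targets: messages are posted early, computation runs in between, and the wait only pays off if the runtime actually progressed the transfers during the compute phase. Buffer sizes and the compute_on_interior() routine are illustrative placeholders.

#include <mpi.h>
#include <stdlib.h>

#define N (1 << 20)

/* Placeholder computation that is independent of the in-flight messages. */
static void compute_on_interior(double *buf, int n) {
    for (int i = 0; i < n; i++)
        buf[i] = buf[i] * 0.5 + 1.0;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *send = malloc(N * sizeof(double));
    double *recv = malloc(N * sizeof(double));
    double *work = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) { send[i] = rank; work[i] = i; }

    MPI_Request req[2];

    /* Post a ring exchange early... */
    MPI_Irecv(recv, N, MPI_DOUBLE, (rank + size - 1) % size, 0,
              MPI_COMM_WORLD, &req[0]);
    MPI_Isend(send, N, MPI_DOUBLE, (rank + 1) % size, 0,
              MPI_COMM_WORLD, &req[1]);

    /* ...then compute on data that does not depend on the messages.
     * Ideally the transfers complete during this phase, so the wait
     * below returns immediately; how well that works depends on how
     * the MPI runtime progresses messages while the application computes. */
    compute_on_interior(work, N);

    /* Wait only when the exchanged data is actually needed. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    free(send); free(recv); free(work);
    MPI_Finalize();
    return 0;
}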
Acknowledgments
This paper is the result of work performed in the Exascale Computing Research Lab with support provided by CEA, GENCI, INTEL, and UVSQ. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the CEA, GENCI, INTEL or UVSQ. We acknowledge that the results in this paper have been achieved using the PRACE Research Infrastructure resource Curie, based in France at Bruyères-le-Châtel.
Cite this article
Didelot, S., Carribault, P., Pérache, M. et al. Improving MPI communication overlap with collaborative polling. Computing 96, 263–278 (2014). https://doi.org/10.1007/s00607-013-0327-z