Abstract
With the rise in the complexity of parallel applications, the need for computational power keeps growing. Recent trends in High-Performance Computing (HPC) show that improvements in single-core performance will not be sufficient to meet the challenges of an exascale machine: we expect an enormous growth in the number of cores as well as a multiplication of the data volume exchanged across compute nodes. To scale applications up to exascale, the communication layer has to minimize the time spent waiting for network messages. This paper presents a message-progression strategy based on Collaborative Polling, which enables efficient, auto-adaptive overlap of communication phases with computation. The approach is novel in that it increases the overlap potential of an application without incurring the overhead of a threaded message progression. We implemented Collaborative Polling for InfiniBand inside MPC, a thread-based MPI runtime. We evaluate the gains from Collaborative Polling on the NAS Parallel Benchmarks and three scientific applications, showing improvements in communication time of up to a factor of 2.
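To make the overlap problem concrete, the sketch below (not taken from the paper) shows the standard non-blocking MPI pattern whose progression Collaborative Polling targets: messages are posted early, computation runs in between, and the wait only pays off if the runtime actually progressed the transfers during the compute phase. Buffer sizes and the compute_on_interior() routine are illustrative placeholders.

#include <mpi.h>
#include <stdlib.h>

#define N (1 << 20)

/* Placeholder computation that is independent of the in-flight messages. */
static void compute_on_interior(double *buf, int n) {
    for (int i = 0; i < n; i++)
        buf[i] = buf[i] * 0.5 + 1.0;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *send = malloc(N * sizeof(double));
    double *recv = malloc(N * sizeof(double));
    double *work = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) { send[i] = rank; work[i] = i; }

    MPI_Request req[2];

    /* Post a ring exchange early... */
    MPI_Irecv(recv, N, MPI_DOUBLE, (rank + size - 1) % size, 0,
              MPI_COMM_WORLD, &req[0]);
    MPI_Isend(send, N, MPI_DOUBLE, (rank + 1) % size, 0,
              MPI_COMM_WORLD, &req[1]);

    /* ...then compute on data that does not depend on the messages.
     * Ideally the transfers complete during this phase, so the wait
     * below returns immediately; how well that works depends on how
     * the MPI runtime progresses messages while the application computes. */
    compute_on_interior(work, N);

    /* Wait only when the exchanged data is actually needed. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    free(send); free(recv); free(work);
    MPI_Finalize();
    return 0;
}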
Acknowledgments
This paper is the result of work performed in the Exascale Computing Research Lab with support provided by CEA, GENCI, INTEL, and UVSQ. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the CEA, GENCI, INTEL or UVSQ. We acknowledge that the results in this paper have been achieved using the PRACE Research Infrastructure resource Curie, based in France at Bruyères-le-Châtel.
Cite this article
Didelot, S., Carribault, P., Pérache, M. et al. Improving MPI communication overlap with collaborative polling. Computing 96, 263–278 (2014). https://doi.org/10.1007/s00607-013-0327-z