Abstract
Parallel programming is a complex task and, since the dawn of the multi-core era, an increasingly common one; tools that support application development and porting can ease it considerably. The Message Passing Interface (MPI) is widely used to write message-passing parallel programs, but it does not guarantee portability between different MPI implementations. When an application runs without problems on one platform but crashes or gives wrong results on another, developers tend to blame the compiler, the architecture, or the MPI implementation. In many cases the problem is a subtle programming error in the application that went undetected on the platforms used previously. Finding such a bug can be a strenuous and difficult task. In this paper we present Marmot, a tool that automatically checks the correctness of MPI applications at runtime. Examples of violations of the MPI standard are irreproducibility, deadlocks, incorrect management of resources such as communicators, groups, and datatypes, and the use of non-portable constructs. To cover different aspects of correctness debugging in a user-friendly environment, including hybrid applications that use both MPI and OpenMP, we are also working on coupling Marmot with a parallel debugger (DDT) or a threading tool (Intel® Thread Checker). Some experiences with real-world applications are reported.
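To illustrate the deadlock case: if two processes each issue a blocking send to the other before posting a receive, the program completes only when the MPI implementation happens to buffer the messages, so it may run on one platform and hang on another. The Python sketch below is a toy model of how such a situation can be recognized as a cycle in a wait-for graph. It is not Marmot's implementation; the function name `find_wait_cycle` and the `waits_for` mapping are hypothetical and introduced only for this illustration.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the ranks forming a cycle in a wait-for graph, or None.

    waits_for maps each blocked rank to the single rank it is waiting on
    (a simplifying assumption: one pending blocking call per rank).
    """
    for start in waits_for:
        path: List[int] = []
        seen = set()
        rank = start
        # Follow the chain of blocked ranks until it ends or repeats.
        while rank in waits_for and rank not in seen:
            seen.add(rank)
            path.append(rank)
            rank = waits_for[rank]
        if rank in seen:
            # The chain closed on itself: report only the cyclic part.
            return path[path.index(rank):]
    return None


# Ranks 0 and 1 are both blocked in an unbuffered send to each other.
head_to_head = {0: 1, 1: 0}
# Rank 0 waits on rank 1, which is not blocked, so progress is possible.
no_deadlock = {0: 1}

cycle = find_wait_cycle(head_to_head)
free = find_wait_cycle(no_deadlock)
```

Here `cycle` holds the ranks involved in the mutual wait, while `free` is `None` because the chain of waiting ranks ends at a rank that can still make progress.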
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Krammer, B., Hilbrich, T., Himmler, V., Czink, B., Dichev, K., Müller, M.S. (2008). MPI Correctness Checking with Marmot. In: Resch, M., Keller, R., Himmler, V., Krammer, B., Schulz, A. (eds) Tools for High Performance Computing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68564-7_5
DOI: https://doi.org/10.1007/978-3-540-68564-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68561-6
Online ISBN: 978-3-540-68564-7
eBook Packages: Computer Science, Computer Science (R0)