Abstract
This paper describes the motivation, design, and performance of EventSpace, a configurable data collection, management, and observation system for monitoring the low-level synchronization and communication behavior of parallel applications on clusters and multi-clusters. Event collectors detect events, create virtual events by recording timestamped data about them, and store the virtual events in a virtual event space. Event scopes provide different views of the application by combining and pre-processing the extracted virtual events. Online monitors are implemented as consumers of one or more event scopes. Event collectors, event scopes, and the virtual event space can be configured and mapped to the available resources to improve monitoring performance or reduce perturbation. Experiments demonstrate that a wind-tunnel application instrumented with event collectors suffers insignificant slowdown due to data collection, and that monitors can reconfigure event scopes to trade off monitoring performance against perturbation.
This research was supported in part by the Norwegian Science Foundation project “NOTUR”, sub-project “Emerging Technologies – Cluster”.
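The pipeline outlined in the abstract (collectors record timestamped virtual events, an event space stores them, and scopes combine them into a view for a monitor) can be illustrated with a minimal single-process sketch. This is not the EventSpace implementation: every name here (`VirtualEvent`, `EventSpace`, `EventCollector`, `EventScope`, `mean_duration`) is hypothetical, the in-memory per-collector lists merely stand in for the distributed virtual event space, and the mean-duration view is one assumed example of pre-processing.

```python
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualEvent:
    """Timestamped record of one observed operation (hypothetical layout)."""
    collector_id: str
    operation: str
    start: float
    end: float


class EventSpace:
    """In-memory stand-in for the virtual event space: one list per collector."""

    def __init__(self):
        self._buffers = defaultdict(list)

    def store(self, event):
        self._buffers[event.collector_id].append(event)

    def read(self, collector_id):
        # Return a copy so consumers cannot mutate the stored events.
        return list(self._buffers[collector_id])


class EventCollector:
    """Wraps an operation, timestamps it, and stores a virtual event."""

    def __init__(self, collector_id, space, clock=time.perf_counter):
        self._id = collector_id
        self._space = space
        self._clock = clock

    def record(self, operation, func, *args, **kwargs):
        start = self._clock()
        result = func(*args, **kwargs)
        end = self._clock()
        self._space.store(VirtualEvent(self._id, operation, start, end))
        return result


class EventScope:
    """Combines events from several collectors into one view for a monitor."""

    def __init__(self, space, collector_ids):
        self._space = space
        self._ids = list(collector_ids)

    def mean_duration(self):
        events = [e for cid in self._ids for e in self._space.read(cid)]
        if not events:
            return 0.0
        return sum(e.end - e.start for e in events) / len(events)
```

A monitor in this sketch would construct an `EventScope` over the collectors it is interested in and call `mean_duration()` periodically. Choosing which collectors a scope reads, and how often, is the kind of knob the abstract describes as trading monitoring performance against perturbation.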
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bongo, L.A., Anshus, O.J., Bjørndalen, J.M. (2003). EventSpace – Exposing and Observing Communication Behavior of Parallel Cluster Applications. In: Kosch, H., Böszörményi, L., Hellwagner, H. (eds) Euro-Par 2003 Parallel Processing. Euro-Par 2003. Lecture Notes in Computer Science, vol 2790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45209-6_10
DOI: https://doi.org/10.1007/978-3-540-45209-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40788-1
Online ISBN: 978-3-540-45209-6
eBook Packages: Springer Book Archive