{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,18]],"date-time":"2024-06-18T00:14:59Z","timestamp":1718669699107},"reference-count":14,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,3,22]],"date-time":"2024-03-22T00:00:00Z","timestamp":1711065600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,22]],"date-time":"2024-03-22T00:00:00Z","timestamp":1711065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"European High-Performance Computing Joint Undertaking","award":["956137"]},{"name":"University of Innsbruck and Medical University of Innsbruck"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Parallel Prog"],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Collective communication APIs equip MPI vendors with the necessary context to optimize cluster-wide operations on the basis of theoretical complexity models and characteristics of the involved interconnects. Modern HPC runtime systems with a programmability focus can perform dependency analysis to eliminate the need for manual communication entirely. Profiting from optimized collective routines in this context often requires global analysis of the implicit point-to-point communication pattern or tight constraints on the data access patterns allowed inside kernels. The Celerity API provides a high degree of freedom for both runtime implementors and application developers by tying transparent work assignment to data access patterns through user-defined range-mapper functions. Canonically, data dependencies are resolved through an intra-node coherence model and inter-node point-to-point communication. 
This paper presents Collective Pattern Discovery (CPD), a fully distributed, coordination-free method for detecting collective communication patterns on parallelized task graphs. Through extensive scheduling and communication microbenchmarks as well as a strong scaling experiment on a compute-intensive application, we demonstrate that CPD can achieve substantial performance gains in the Celerity model.<\/jats:p>","DOI":"10.1007\/s10766-024-00767-y","type":"journal-article","created":{"date-parts":[[2024,3,22]],"date-time":"2024-03-22T08:17:08Z","timestamp":1711095428000},"page":"171-186","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs"],"prefix":"10.1007","volume":"52","author":[{"given":"Fabian","family":"Knorr","sequence":"first","affiliation":[]},{"given":"Philip","family":"Salzmann","sequence":"additional","affiliation":[]},{"given":"Peter","family":"Thoman","sequence":"additional","affiliation":[]},{"given":"Thomas","family":"Fahringer","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,3,22]]},"reference":[{"key":"767_CR1","doi-asserted-by":"crossref","unstructured":"Denis, A., Jeannot, E., Swartvagher, P., Thibault, S.: Using dynamic broadcasts to improve task-based runtime performances. In: Euro-Par 2020, Warsaw, Poland, August 24\u201328, 2020, Proceedings 26. pp. 443\u2013457. Springer (2020)","DOI":"10.1007\/978-3-030-57675-2_28"},{"key":"767_CR2","doi-asserted-by":"publisher","unstructured":"Grasso, I., Pellegrini, S., Cosenza, B., Fahringer, T.: libWater: Heterogeneous distributed computing made easy. In: Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, pp. 161\u2013172. ICS \u201913, ACM, New York, NY, USA (2013). 
https:\/\/doi.org\/10.1145\/2464996.2465008","DOI":"10.1145\/2464996.2465008"},{"issue":"2","key":"767_CR3","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1145\/971697.602266","volume":"14","author":"A Guttman","year":"1984","unstructured":"Guttman, A.: R-trees: a dynamic index structure for spatial searching. ACM SIGMOD Rec. 14(2), 47\u201357 (1984). https:\/\/doi.org\/10.1145\/971697.602266","journal-title":"ACM SIGMOD Rec."},{"key":"767_CR4","doi-asserted-by":"publisher","unstructured":"Hoefler, T., Schneider, T.: Runtime detection and optimization of collective communication patterns. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 263\u2013272. PACT \u201912, ACM\/doi, New York, NY, USA (2012). https:\/\/doi.org\/10.1145\/2370816.2370856","DOI":"10.1145\/2370816.2370856"},{"key":"767_CR5","doi-asserted-by":"publisher","unstructured":"Kielmann, T., Hofman, R.F.H., Bal, H.E., Plaat, A., Bhoedjang, R.A.F.: MagPIe: MPI\u2019s collective communication operations for clustered wide area systems. In: Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 131\u2013140. PPoPP \u201999, ACM, New York, NY, USA (1999). https:\/\/doi.org\/10.1145\/301104.301116","DOI":"10.1145\/301104.301116"},{"key":"767_CR6","doi-asserted-by":"crossref","unstructured":"Knorr, F., Thoman, P., Fahringer, T.: Declarative data flow in a graph-based distributed memory runtime system. Int. J. Parallel Programm. 1\u201322 (2022)","DOI":"10.21203\/rs.3.rs-2045925\/v1"},{"key":"767_CR7","doi-asserted-by":"crossref","unstructured":"Kn\u00fcpfer, A., Kranzlm\u00fcller, D., Nagel, W.E.: Detection of collective MPI operation patterns. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface: 11th European PVM\/MPI Users\u2019 Group Meeting Budapest, Hungary, September 19-22, 2004. Proceedings 11, pp. 259\u2013267. 
Springer (2004)","DOI":"10.1007\/978-3-540-30218-6_38"},{"key":"767_CR8","unstructured":"Majeed, M., Dastgeer, U., Kessler, C.: Cluster-SkePU: A multi-backend skeleton programming library for GPU clusters. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), p.\u00a0468. Citeseer (2013)"},{"key":"767_CR9","doi-asserted-by":"crossref","unstructured":"Mamidala, A.R., Kumar, R., De, D., Panda, D.K.: MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics. In: 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), pp. 130\u2013137. IEEE (2008)","DOI":"10.1109\/CCGRID.2008.87"},{"key":"767_CR10","unstructured":"Message Passing Interface Forum: MPI: A Message-Passing Interface Standard Version 4.0. https:\/\/www.mpi-forum.org\/docs\/mpi-4.0\/mpi40-report.pdf"},{"key":"767_CR11","unstructured":"Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G.E., Gabriel, E., Dongarra, J.J.: Performance analysis of MPI collective operations. In: 19th IEEE International Parallel and Distributed Processing Symposium, pp. 8\u2013pp. IEEE (2005)"},{"key":"767_CR12","doi-asserted-by":"crossref","unstructured":"Salzmann, P., Knorr, F., Thoman, P., Gschwandtner, P., Cosenza, B., Fahringer, T.: An asynchronous dataflow-driven execution model for distributed accelerator computing. In: 2023 23rd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid). p. (to appear). IEEE (2023)","DOI":"10.1109\/CCGrid57682.2023.00018"},{"issue":"1","key":"767_CR13","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1177\/1094342005051521","volume":"19","author":"R Thakur","year":"2005","unstructured":"Thakur, R., Rabenseifner, R., Gropp, W.: Optimization of collective communication operations in MPICH. The Int. J. High Perf. Comput. Appl. 19(1), 49\u201366 (2005)","journal-title":"The Int. J. High Perf. Comput. 
Appl."},{"key":"767_CR14","doi-asserted-by":"crossref","unstructured":"Thoman, P., Salzmann, P., Cosenza, B., Fahringer, T.: Celerity: High-level C++ for accelerator clusters. In: Euro-Par 2019: Parallel Processing: 25th International Conference on Parallel and Distributed Computing, G\u00f6ttingen, Germany, August 26\u201330, 2019, Proceedings 25, pp. 291\u2013303. Springer (2019)","DOI":"10.1007\/978-3-030-29400-7_21"}],"container-title":["International Journal of Parallel Programming"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10766-024-00767-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10766-024-00767-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10766-024-00767-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,17]],"date-time":"2024-06-17T15:34:32Z","timestamp":1718638472000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10766-024-00767-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,22]]},"references-count":14,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["767"],"URL":"https:\/\/doi.org\/10.1007\/s10766-024-00767-y","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3647738\/v1","asserted-by":"object"}]},"ISSN":["0885-7458","1573-7640"],"issn-type":[{"value":"0885-7458","type":"print"},{"value":"1573-7640","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,22]]},"assertion":[{"value":"22 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 February 
2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 March 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no financial or non-financial interests to disclose that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}