Abstract
Since large parallel machines are typically clusters of multicore nodes, parallel programs should be able to deal with both shared memory and distributed memory. This paper proposes a hybrid work stealing scheme, which combines the lifeline-based variant of distributed task pools with the node-internal load balancing of Java’s Fork/Join framework. We implemented our scheme by extending the APGAS library for Java, which is a branch of the X10 project. APGAS programmers can now spawn locality-flexible tasks with a new asyncAny construct. These tasks are transparently mapped to any resource in the overall system, so that the load is balanced over both nodes and cores. Unprocessed asyncAny-tasks can also be cancelled. In performance measurements with up to 144 workers on up to 12 nodes, we observed near linear speedups for four benchmarks and a low overhead for cancellation-related bookkeeping.
Similar content being viewed by others
References
Applegate DL, Bixby RE, Chvatal V, Cook WJ (2007) The traveling salesman problem. Princeton University Press, Princeton
Diaz J, Munoz-Caro C, Nino A (2012) A survey of parallel programming models and tools in the multi and many-core era. IEEE Trans Parallel Distrib Syst 23:1369–1386. https://doi.org/10.1109/tpds.2011.308
Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35. https://doi.org/10.2307/3033543
Gendron B, Crainic TG (1994) Parallel branch-and-branch algorithms: survey and synthesis. Oper Res 42(6):1042–1066. https://doi.org/10.1287/opre.42.6.1042
Gik J (1987) Schach und Mathematik. Deutsch Harri GmbH, Frankfurt a. M
Guo Y, Zhao J, Cave V, Sarkar V (2010) SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems. In: Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. https://doi.org/10.1145/1693453.1693504
IBM: Core implementation of X10 programming language including compiler, runtime, class libraries, sample programs and test suite. https://github.com/x10-lang/x10 (2017)
IBM: The APGAS library for fault-tolerant distributed programming in Java 8. https://github.com/x10-lang/apgas (2017)
Kestor G, Krishnamoorthy S, Ma W (2017) Localized fault recovery for nested fork-join programs. In: Proceedings of IEEE International Symposium on Parallel and Distributed Processing. https://doi.org/10.1109/ipdps.2017.75
Kolesnichenko A, Nanz S, Meyer B (2013) How to cancel a task. Springer, Berlin, pp 61–72. https://doi.org/10.1007/978-3-642-39955-8_6
Kumar V, Murthy K, Sarkar V, Zheng Y (2016) Optimized distributed work-stealing. In: Proceedings of Workshop on Irregular Applications: Architectures and Algorithms, pp 74–77. https://doi.org/10.1109/IA3.2016.19
Olivier S, Huan J, Liu J, Prins J, Dinan J, Sadayappan P, Tseng CW (2006) UTS: an unbalanced tree search benchmark. In: Languages and Compilers for Parallel Computing, pp 235–250. Springer LNCS 4382. https://doi.org/10.1007/978-3-540-72521-3_18
OpenMP ARB: OpenMP specifications. http://www.openmp.org/specifications/ (2017)
Oracle: Class ForkJoinPool. https://docs.oracle.com/javase/9/docs/api/java/util/concurrent/ForkJoinPool.html (2017)
Paudel J, Tardieu O, Amaral JN (2013) Hybrid parallel task placement in X10. In: Proceedings of ACM SIGPLAN Workshop on X10. https://doi.org/10.1145/2481268.2481277
Paudel J, Tardieu O, Amaral JN (2013) On the merits of distributed work-stealing on selective locality-aware tasks. In: Proceedings of International Conference on Parallel Processing. https://doi.org/10.1109/icpp.2013.19
Posner J, Fohry C (2016) Cooperation versus coordination for lifeline-based global load balancing in APGAS. In: Proceedings of ACM SIGPLAN Workshop on X10. https://doi.org/10.1145/2931028.2931029
Rice University: HabaneroUPC++: a Compiler-free PGAS Library. https://github.com/habanero-rice/habanero-upc (2017)
STEllAR-GROUP: HPX: The C++ standards library for parallelism and concurrency (2017). https://github.com/STEllAR-GROUP/hpx
Tardieu O (2015) The APGAS library: resilient parallel and distributed programming in Java 8. In: Proceedings of ACM SIGPLAN Workshop on X10. https://doi.org/10.1145/2771774.2771780
Thoman P, et al (2018) A taxonomy of task-based technologies for high-performance computing. In: Proceedings of International Conference Parallel Processing and Applied Mathematics (To appear)
University of Kassel: Scientific data processing. https://www.uni-kassel.de/its-handbuch/en/daten-dienste/wissenschaftliche-datenverarbeitung.html (2017)
Yamashita K, Kamada T (2016) Introducing a multithread and multistage mechanism for the global load balancing library of X10. J Inf Process 24(2):416–424. https://doi.org/10.2197/ipsjjip.24.416
Zhang W, Tardieu O, Grove D, Herta B, Kamada T, Saraswat V, Takeuchi M (2014) GLB lifeline-based global load balancing library in X10. In: Proceedings of ACM Workshop on Parallel Programming for Analytics Applications. https://doi.org/10.1145/2567634.2567639
Acknowledgements
This work is supported by the Deutsche Forschungsgemeinschaft, under Grant FO 1035/5-1.
Author information
Authors and Affiliations
Corresponding author
Additional information
This paper is an extended version of Jonas Posner and Claudia Fohry: A Combination of Intra- and Inter-Place Work Stealing for the APGAS Library. Parallel Processing and Applied Mathematics Workshops (WLPP), 2017.
Rights and permissions
About this article
Cite this article
Posner, J., Fohry, C. Hybrid work stealing of locality-flexible and cancelable tasks for the APGAS library. J Supercomput 74, 1435–1448 (2018). https://doi.org/10.1007/s11227-018-2234-8
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-018-2234-8