{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,1,29]],"date-time":"2024-01-29T16:06:20Z","timestamp":1706544380208},"reference-count":47,"publisher":"Cambridge University Press (CUP)","issue":"4-5","license":[{"start":{"date-parts":[[2012,8,15]],"date-time":"2012-08-15T00:00:00Z","timestamp":1344988800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Funct. Prog."],"published-print":{"date-parts":[[2012,9]]},"abstract":"Abstract<\/jats:title>Nested data-parallelism (NDP) is a language mechanism that supports programming irregular parallel applications in a declarative style. In this paper, we describe the implementation of NDP in Parallel ML (PML), which is a part of the Manticore system. One of the main challenges of implementing NDP is managing the parallel decomposition of work. If we have too many small chunks of work, the overhead will be too high, but if we do not have enough chunks of work, processors will be idle. Recently, the technique of Lazy Binary Splitting was proposed to address this problem for nested parallel loops over flat arrays. We have adapted this technique to our implementation of NDP, which uses binary trees to represent parallel arrays. This new technique, which we callLazy Tree Splitting<\/jats:italic>(LTS), has the key advantage ofperformance robustness<\/jats:italic>, i.e., it does not require tuning to get the best performance for each program. We describe the implementation of the standard NDP operations using LTS and present experimental data that demonstrate the scalability of LTS across a range of benchmarks.<\/jats:p>","DOI":"10.1017\/s0956796812000172","type":"journal-article","created":{"date-parts":[[2012,8,15]],"date-time":"2012-08-15T12:45:18Z","timestamp":1345034718000},"page":"382-438","source":"Crossref","is-referenced-by-count":10,"title":["Lazy tree splitting"],"prefix":"10.1017","volume":"22","author":[{"given":"LARS","family":"BERGSTROM","sequence":"first","affiliation":[]},{"given":"MATTHEW","family":"FLUET","sequence":"additional","affiliation":[]},{"given":"MIKE","family":"RAINEY","sequence":"additional","affiliation":[]},{"given":"JOHN","family":"REPPY","sequence":"additional","affiliation":[]},{"given":"ADAM","family":"SHAW","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2012,8,15]]},"reference":[{"key":"S0956796812000172_ref2","volume-title":"Compiling with Continuations","author":"Appel","year":"1992"},{"key":"S0956796812000172_ref32","first-page":"287","volume-title":"Conference Record of the 35th Annual ACM Symposium on Principles of Programming Languages","author":"McBride","year":"2008"},{"key":"S0956796812000172_ref27","unstructured":"Keller G. (1999) Transformation-Based Implementation of Nested Data Parallelism for Distributed Memory Machines. PhD thesis, Technische Universit\u00e4t Berlin, Berlin, Germany."},{"key":"S0956796812000172_ref9","doi-asserted-by":"publisher","DOI":"10.1145\/324133.324234"},{"key":"S0956796812000172_ref37","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89330-1_10"},{"key":"S0956796812000172_ref41","volume-title":"Proceedings of the IEEE International Symposium on Parallel and Distributed Processing","author":"Robison","year":"2008"},{"key":"S0956796812000172_ref35","doi-asserted-by":"publisher","DOI":"10.1145\/314602.314607"},{"key":"S0956796812000172_ref21","first-page":"212","volume-title":"Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (PLDI '98), Montreal, Canada","author":"Frigo","year":"1998"},{"key":"S0956796812000172_ref20","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1145\/1411204.1411239","volume-title":"Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming, Victoria, BC, Canada","author":"Fluet","year":"2008"},{"key":"S0956796812000172_ref43","volume-title":"First Workshop on Software Tools for Multi-Core Systems","author":"So","year":"2006"},{"key":"S0956796812000172_ref24","doi-asserted-by":"publisher","DOI":"10.1017\/S0956796805005769"},{"key":"S0956796812000172_ref47","doi-asserted-by":"crossref","unstructured":"Weeks S. (2006, Sep) Whole Program Compilation in MLton. Invited talk at ML'06 workshop. Accessed January 2011. Slides available at: http:\/\/mlton.org\/pages\/References\/attachments\/060916-mlton.pdf.","DOI":"10.1145\/1159876.1159877"},{"key":"S0956796812000172_ref23","doi-asserted-by":"publisher","DOI":"10.1145\/800055.802017"},{"key":"S0956796812000172_ref1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380190206"},{"key":"S0956796812000172_ref25","doi-asserted-by":"publisher","DOI":"10.1017\/S0956796897002864"},{"key":"S0956796812000172_ref40","volume-title":"Semantics Engineering with Plt Redex","author":"Rainey","year":"2009"},{"key":"S0956796812000172_ref30","first-page":"8","volume-title":"Proceedings of the Glasgow Workshop on Functional Programming","author":"Loidl","year":"1995"},{"key":"S0956796812000172_ref3","doi-asserted-by":"publisher","DOI":"10.1038\/324446a0"},{"key":"S0956796812000172_ref5","volume-title":"Vector Models for Data-Parallel Computing","author":"Blelloch","year":"1990"},{"key":"S0956796812000172_ref11","first-page":"187","volume-title":"Functional Programming Languages and Computer Architecture","author":"Burton","year":"1981"},{"key":"S0956796812000172_ref28","doi-asserted-by":"crossref","first-page":"522","DOI":"10.1145\/1629911.1630048","volume-title":"Proceedings of the 46th Annual Design Automation Conference","author":"Leiserson","year":"2009"},{"key":"S0956796812000172_ref38","doi-asserted-by":"publisher","DOI":"10.1093\/mnras\/71.5.460"},{"key":"S0956796812000172_ref46","first-page":"179","volume-title":"Proceedings of the 2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","author":"Tzannes","year":"2010"},{"key":"S0956796812000172_ref19","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1145\/1411204.1411224","volume-title":"Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming, Victoria, BC, Canada","author":"Fluet","year":"2008"},{"key":"S0956796812000172_ref14","first-page":"10","volume-title":"Proceedings of the ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming","author":"Chakravarty","year":"2007"},{"key":"S0956796812000172_ref7","first-page":"213","volume-title":"Proceedings of the 1996 ACM SIGPLAN International Conference on Functional Programming","author":"Blelloch","year":"1996"},{"key":"S0956796812000172_ref16","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.31"},{"key":"S0956796812000172_ref15","doi-asserted-by":"publisher","DOI":"10.1145\/169683.174152"},{"key":"S0956796812000172_ref29","unstructured":"Leshchinskiy R. (2005) Higher-Order Nested Data Parallelism: Semantics and Implementation. PhD thesis, Technische Universit\u00e4t Berlin, Berlin, Germany."},{"key":"S0956796812000172_ref8","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1994.1038"},{"key":"S0956796812000172_ref26","unstructured":"Intel (2008) Intel Threading Building Blocks Reference Manual. Santa Clara, CA: Intel Corporation. Available at: http:\/\/www.threadingbuildingblocks.org\/."},{"key":"S0956796812000172_ref31","doi-asserted-by":"publisher","DOI":"10.1006\/jsco.1996.0038"},{"key":"S0956796812000172_ref33","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/2319.001.0001","volume-title":"The Definition of Standard ML (revised)","author":"Milner","year":"1997"},{"key":"S0956796812000172_ref12","unstructured":"Carver T. (2010, Mar) Magny-Cours and Direct Connect Architecture 2.0. Accessed January 2012. Available at: http:\/\/developer.amd.com\/documentation\/articles\/pages\/Magny-Cours-Direct-Connect-Architecture-2.0.aspx."},{"key":"S0956796812000172_ref6","doi-asserted-by":"publisher","DOI":"10.1145\/227234.227246"},{"key":"S0956796812000172_ref18","first-page":"15","volume-title":"Proceedings of the 2007 ACM SIGPLAN Workshop on ML, Victoria, BC, Canada","author":"Fluet","year":"2007"},{"key":"S0956796812000172_ref45","doi-asserted-by":"publisher","DOI":"10.1017\/S0956796897002967"},{"key":"S0956796812000172_ref4","unstructured":"Blelloch Guy E. (1990a, Nov.) Prefix Sums and Their Applications. Tech. rept. CMU-CS-90-190. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA."},{"key":"S0956796812000172_ref42","unstructured":"Scandal Project (n.d.) A Library of Parallel Algorithms Written NESL. Accessed January 2012. Available at: http:\/\/www.cs.cmu.edu\/~scandal\/nesl\/algorithms.html."},{"key":"S0956796812000172_ref13","volume-title":"Proceedings of the ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming","author":"Chakravarty","year":"2008"},{"key":"S0956796812000172_ref44","first-page":"271","volume-title":"Selected Papers of the International Conference on Fifth Generation Computer Systems","author":"Tick","year":"1993"},{"key":"S0956796812000172_ref22","unstructured":"Ghuloum A. , Sprangle E. , Fang J. , Wu G. & Zhou X. (2007, Oct) Ct: A Flexible Parallel Programming Model for Tera-Scale Architectures. Tech. rept. Intel. Accessed January 2012. Available at: http:\/\/techresearch.intel.com\/UserFiles\/en-us\/File\/terascale\/Whitepaper-Ct.pdf."},{"key":"S0956796812000172_ref34","unstructured":"MLton (n.d.) The MLton Standard ML Compiler. Accessed January 2011. Available at: http:\/\/mlton.org."},{"key":"S0956796812000172_ref10","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380251203"},{"key":"S0956796812000172_ref36","volume-title":"ID Language Reference Manual","author":"Nikhil","year":"1991"},{"key":"S0956796812000172_ref39","unstructured":"Rainey M. (2007, Jan) The Manticore Runtime Model. M.Phil. thesis, University of Chicago, Illinois, USA. Available at: http:\/\/manticore.cs.uchicago.edu."},{"key":"S0956796812000172_ref17","first-page":"37","volume-title":"Proceedings of the ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming","author":"Fluet","year":"2007"}],"container-title":["Journal of Functional Programming"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0956796812000172","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,1,26]],"date-time":"2022-01-26T16:30:11Z","timestamp":1643214611000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0956796812000172\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,8,15]]},"references-count":47,"journal-issue":{"issue":"4-5","published-print":{"date-parts":[[2012,9]]}},"alternative-id":["S0956796812000172"],"URL":"https:\/\/doi.org\/10.1017\/s0956796812000172","relation":{},"ISSN":["0956-7968","1469-7653"],"issn-type":[{"value":"0956-7968","type":"print"},{"value":"1469-7653","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,8,15]]}}}