{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,2]],"date-time":"2023-01-02T05:18:44Z","timestamp":1672636724473},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T00:00:00Z","timestamp":1570752000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"SERB core research","award":["CRG\/2018\/002488"]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2019,12,31]]},"abstract":"\n X10 is a partitioned global address space programming language that supports the notion of\n places<\/jats:italic>\n ; a place consists of some data and some lightweight tasks called activities. Each activity runs at a place and may invoke a place-change operation (using the at-construct) to synchronously perform some computation at another place. These place-change operations can be very expensive, as they need to copy all the required data from the current place to the remote place. However, identifying the necessary number of place-change operations and the required data during each place-change operation are non-trivial tasks, especially in the context of irregular applications (like graph applications) that contain complex code with large amounts of cross-referencing objects\u2014not all of those objects may be actually required, at the remote place. In this article, we present AT-Com, a scheme to optimize X10 code with place-change operations.\n <\/jats:p>\n \n AT-Com consists of two inter-related new optimizations: (i) AT-Opt, which minimizes the amount of data serialized and communicated during place-change operations, and (ii) AT-Pruning, which identifies\/elides redundant place-change operations and does parallel execution of place-change operations. AT-Opt uses a novel abstraction, called\n abstract-place-tree<\/jats:italic>\n , to capture place-change operations in the program. For each place-change operation, AT-Opt uses a novel inter-procedural analysis to precisely identify the data required at the remote place in terms of the variables in the current scope. AT-Opt then emits the appropriate code to copy the identified data-items to the remote place. AT-Pruning introduces a set of program transformation techniques to emit optimized code such that it avoids the redundant place-change operations. We have implemented AT-Com in the x10v2.6.0 compiler and tested it over the IMSuite benchmark kernels. Compared to the current X10 compiler, the AT-Com optimized code achieved a geometric mean speedup of 18.72\u00d7 and 17.83\u00d7 on a four-node (32 cores per node) Intel and two-node (16 cores per node) AMD system, respectively.\n <\/jats:p>","DOI":"10.1145\/3345558","type":"journal-article","created":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T14:53:33Z","timestamp":1570805613000},"page":"1-26","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Optimizing Remote Communication in X10"],"prefix":"10.1145","volume":"16","author":[{"given":"Arun","family":"Thangamani","sequence":"first","affiliation":[{"name":"IIT Madras, Chennai"}]},{"given":"V. Krishna","family":"Nandivada","sequence":"additional","affiliation":[{"name":"IIT Madras, Chennai"}]}],"member":"320","published-online":{"date-parts":[[2019,10,11]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"S. Agarwal R. Barik V. K. Nandivada R. K. Shyamasundar and P. Varma. 2008. Static detection of place locality and elimination of runtime checks. In APLAS G. Ramalingam (Ed.). San Francisco CA 53--74. S. Agarwal R. Barik V. K. Nandivada R. K. Shyamasundar and P. Varma. 2008. Static detection of place locality and elimination of runtime checks. In APLAS G. Ramalingam (Ed.). San Francisco CA 53--74.","DOI":"10.1007\/978-3-540-89330-1_5"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"S. Agarwal R. Barik V. Sarkar and R. K. Shyamasundar. March 2007. May-happen-in-parallel analysis of X10 programs. In PPoPP. 183--193. DOI:https:\/\/doi.org\/10.1145\/1229428.1229471 S. Agarwal R. Barik V. Sarkar and R. K. Shyamasundar. March 2007. May-happen-in-parallel analysis of X10 programs. In PPoPP. 183--193. DOI:https:\/\/doi.org\/10.1145\/1229428.1229471","DOI":"10.1145\/1229428.1229471"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"M. Alvanos M. Farreras E. Tiotto J. N. Amaral and X. Martorell. 2013. Improving communication in PGAS environments: Static and dynamic coalescing in UPC. In ICS. 129--138. DOI:https:\/\/doi.org\/10.1145\/2464996.2465006 M. Alvanos M. Farreras E. Tiotto J. N. Amaral and X. Martorell. 2013. Improving communication in PGAS environments: Static and dynamic coalescing in UPC. In ICS. 129--138. DOI:https:\/\/doi.org\/10.1145\/2464996.2465006","DOI":"10.1145\/2464996.2465006"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"R. Barik and V. Sarkar. September 2009. Interprocedural load elimination for dynamic optimization of parallel programs. In PACT. 41--52. DOI:https:\/\/doi.org\/10.1109\/PACT.2009.32 R. Barik and V. Sarkar. September 2009. Interprocedural load elimination for dynamic optimization of parallel programs. In PACT. 41--52. DOI:https:\/\/doi.org\/10.1109\/PACT.2009.32","DOI":"10.1109\/PACT.2009.32"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"R. Barik J. Zhao D. Grove I. Peshansky Z. Budimlic and V. Sarkar. 2011. Communication optimizations for distributed-memory X10 programs. In IPDPS. 1101--1113. DOI:https:\/\/doi.org\/10.1109\/IPDPS.2011.105 R. Barik J. Zhao D. Grove I. Peshansky Z. Budimlic and V. Sarkar. 2011. Communication optimizations for distributed-memory X10 programs. In IPDPS. 1101--1113. DOI:https:\/\/doi.org\/10.1109\/IPDPS.2011.105","DOI":"10.1109\/IPDPS.2011.105"},{"key":"e_1_2_1_7_1","volume-title":"Legion: Expressing locality and independence with logical regions","author":"Bauer M.","year":"2012"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342007078442"},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"S. Chandra V. Saraswat V. Sarkar and R. Bodik. 2008. Type inference for locality analysis of distributed data structures. In PPoPP. 11--22. DOI:https:\/\/doi.org\/10.1145\/1345206.1345211 S. Chandra V. Saraswat V. Sarkar and R. Bodik. 2008. Type inference for locality analysis of distributed data structures. In PPoPP. 11--22. DOI:https:\/\/doi.org\/10.1145\/1345206.1345211","DOI":"10.1145\/1345206.1345211"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"W. Chen C. Iancu and K. Yelick. 2005. Communication optimizations for fine-grained UPC applications. In PACT. 267--278. DOI:https:\/\/doi.org\/10.1109\/PACT.2005.13 W. Chen C. Iancu and K. Yelick. 2005. Communication optimizations for fine-grained UPC applications. In PACT. 267--278. DOI:https:\/\/doi.org\/10.1109\/PACT.2005.13","DOI":"10.1109\/PACT.2005.13"},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"J. Choi M. Gupta M. Serrano V. C. Sreedhar and S. Midkiff. Nov 1999. Escape analysis for Java. In OOPSLA. 1--19. DOI:https:\/\/doi.org\/10.1145\/320384.320386 J. Choi M. Gupta M. Serrano V. C. Sreedhar and S. Midkiff. Nov 1999. Escape analysis for Java. In OOPSLA. 1--19. DOI:https:\/\/doi.org\/10.1145\/320384.320386","DOI":"10.1145\/320385.320386"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"A. Georges D. Buytaert and L. Eeckhout. Oct 2007. Statistically rigorous Java performance evaluation. In OOPSLA. 57--76. DOI:https:\/\/doi.org\/10.1145\/1297027.1297033 A. Georges D. Buytaert and L. Eeckhout. Oct 2007. Statistically rigorous Java performance evaluation. In OOPSLA. 57--76. DOI:https:\/\/doi.org\/10.1145\/1297027.1297033","DOI":"10.1145\/1297105.1297033"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"R. Ghiya and L. J. Hendren. 1996. Is it a tree a DAG or a cyclic graph? A shape analysis for heap-directed pointers in C. In POPL. 1--15. DOI:https:\/\/doi.org\/10.1145\/237721.237724 R. Ghiya and L. J. Hendren. 1996. Is it a tree a DAG or a cyclic graph? A shape analysis for heap-directed pointers in C. In POPL. 1--15. DOI:https:\/\/doi.org\/10.1145\/237721.237724","DOI":"10.1145\/237721.237724"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.10.010"},{"key":"e_1_2_1_15_1","unstructured":"Habanero. 2009. Habanero Java. Retrieved from http:\/\/habanero.rice.edu\/hj. Habanero. 2009. Habanero Java. Retrieved from http:\/\/habanero.rice.edu\/hj."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/135226.135230"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"S. Hiranandani K. Kennedy and C. Tseng. November 1991. Compiler optimizations for fortran D on MIMD distributed-memory machines. In SC. 86--100. DOI:https:\/\/doi.org\/10.1145\/125826.125886 S. Hiranandani K. Kennedy and C. Tseng. November 1991. Compiler optimizations for fortran D on MIMD distributed-memory machines. In SC. 86--100. DOI:https:\/\/doi.org\/10.1145\/125826.125886","DOI":"10.1145\/125826.125886"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/330643.330647"},{"key":"e_1_2_1_19_1","volume-title":"Advanced Compiler Design and Implementation","author":"Muchnick S. S."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2450136.2450138"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"J. Paudel O. Tardieu and J. N. Amarai. 2014. Optimizing shared data accesses in distributed-memory X10 systems. In HiPC. 1--10. DOI:https:\/\/doi.org\/10.1109\/HiPC.2014.7116889 J. Paudel O. Tardieu and J. N. Amarai. 2014. Optimizing shared data accesses in distributed-memory X10 systems. In HiPC. 1--10. DOI:https:\/\/doi.org\/10.1109\/HiPC.2014.7116889","DOI":"10.1109\/HiPC.2014.7116889"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"S. Pellegrini T. Hoefler and T. Fahringer. 2012. Exact dependence analysis for increased communication overlap. In Recent Advances in the Message Passing Interface J. L. Tr\u00e4ff S. Benkner and J. J. Dongarra (Eds.). 89--99. S. Pellegrini T. Hoefler and T. Fahringer. 2012. Exact dependence analysis for increased communication overlap. In Recent Advances in the Message Passing Interface J. L. Tr\u00e4ff S. Benkner and J. J. Dongarra (Eds.). 89--99.","DOI":"10.1007\/978-3-642-33518-1_14"},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"N. Rinetzky and S. Sagiv. 2001. Interprocedural shape analysis for recursive programs. In CC. 133--149. http:\/\/dl.acm.org\/citation.cfm?id=647477.727768. N. Rinetzky and S. Sagiv. 2001. Interprocedural shape analysis for recursive programs. In CC. 133--149. http:\/\/dl.acm.org\/citation.cfm?id=647477.727768.","DOI":"10.1007\/3-540-45306-7_10"},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","unstructured":"A. Salcianu and M. Rinard. 2001. Pointer and escape analysis for multithreaded programs. In PPoPP. 12--23. DOI:https:\/\/doi.org\/10.1145\/379539.379553 A. Salcianu and M. Rinard. 2001. Pointer and escape analysis for multithreaded programs. In PPoPP. 12--23. DOI:https:\/\/doi.org\/10.1145\/379539.379553","DOI":"10.1145\/568014.379553"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"A. Sanz R. Asenjo J. Lopez R. Larrosa A. Navarro V. Litvinov S. Choi and B. L. Chamberlain. 2012. Global data re-allocation via communication aggregation in chapel. In SBAC-PAD. 235--242. DOI:https:\/\/doi.org\/10.1109\/SBAC-PAD.2012.18 A. Sanz R. Asenjo J. Lopez R. Larrosa A. Navarro V. Litvinov S. Choi and B. L. Chamberlain. 2012. Global data re-allocation via communication aggregation in chapel. In SBAC-PAD. 235--242. DOI:https:\/\/doi.org\/10.1109\/SBAC-PAD.2012.18","DOI":"10.1109\/SBAC-PAD.2012.18"},{"key":"e_1_2_1_26_1","unstructured":"V. Saraswat B. Bloom I. Peshansky O. Tardieu and D. Grove. 2016. X10 Language Specification Version 2.6.0. Retrieved from http:\/\/x10.sourceforge.net\/documentation\/languagespec\/x10-260.pdf. V. Saraswat B. Bloom I. Peshansky O. Tardieu and D. Grove. 2016. X10 Language Specification Version 2.6.0. Retrieved from http:\/\/x10.sourceforge.net\/documentation\/languagespec\/x10-260.pdf."},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"A. Sharma D. Smith J. Koehler R. Barua and M. Ferguson. 2014. Affine loop optimization based on modulo unrolling in chapel. In PGAS. Article 13 12 pages. DOI:https:\/\/doi.org\/10.1145\/2676870.2676877 A. Sharma D. Smith J. Koehler R. Barua and M. Ferguson. 2014. Affine loop optimization based on modulo unrolling in chapel. In PGAS. Article 13 12 pages. DOI:https:\/\/doi.org\/10.1145\/2676870.2676877","DOI":"10.1145\/2676870.2676877"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018. 1--15","author":"Thangamani A."},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"J Whaley and M Rinard. Nov 1999. Compositional pointer and escape analysis for Java programs. In OOPSLA. 187--206. DOI:https:\/\/doi.org\/10.1145\/320384.320400 J Whaley and M Rinard. Nov 1999. Compositional pointer and escape analysis for Java programs. In OOPSLA. 187--206. DOI:https:\/\/doi.org\/10.1145\/320384.320400","DOI":"10.1145\/320385.320400"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3345558","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T07:17:20Z","timestamp":1672557440000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3345558"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,11]]},"references-count":28,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,12,31]]}},"alternative-id":["10.1145\/3345558"],"URL":"https:\/\/doi.org\/10.1145\/3345558","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,11]]},"assertion":[{"value":"2019-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}